From jameskasskrv at gmail.com  Thu Sep 10 07:59:40 2020
From: jameskasskrv at gmail.com (James Kass)
Date: Thu, 10 Sep 2020 12:59:40 +0000
Subject: Grantha encoding question
Message-ID:

In preliminary Grantha encoding proposals there were characters which don't seem to have been accepted into the Standard.

What is the recommended encoding for the Grantha letters NNNA, RRA, and LLLA/ZHA?  (Originally proposed for U+11329, U+11331, and U+11334 respectively.)

Here's a link to one of the earlier proposals:
https://www.unicode.org/L2/L2010/10426-grantha-proposal.pdf

A graphic is attached showing the three characters.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Grantha_20200910.jpg
Type: image/jpeg
Size: 84587 bytes
Desc: not available
URL:

From cibucj at gmail.com  Thu Sep 10 08:21:01 2020
From: cibucj at gmail.com (Cibu)
Date: Thu, 10 Sep 2020 14:21:01 +0100
Subject: Grantha encoding question
In-Reply-To:
References:
Message-ID:

I am curious whether you would see them in Grantha texts. If yes, could you provide some examples?

From jameskasskrv at gmail.com  Fri Sep 11 05:58:01 2020
From: jameskasskrv at gmail.com (James Kass)
Date: Fri, 11 Sep 2020 10:58:01 +0000
Subject: Grantha encoding question
In-Reply-To:
References:
Message-ID: <8dc91000-7b61-d27d-1b04-be6d85667fe0@gmail.com>

Thank you.  Your question seems to have answered mine.

If nobody can produce exhibits, then it makes sense that these characters weren't included in the Standard.

Meanwhile I've removed these glyphs and their associated ligatures from the font I'm trying to update.  If they ever get proposed and accepted, the glyphs can be reinstated.

From samjnaa at gmail.com  Fri Sep 11 08:09:46 2020
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Fri, 11 Sep 2020 18:39:46 +0530
Subject: Grantha encoding question
In-Reply-To: <8dc91000-7b61-d27d-1b04-be6d85667fe0@gmail.com>
References: <8dc91000-7b61-d27d-1b04-be6d85667fe0@gmail.com>
Message-ID:

On 9/11/20, James Kass via Unicode wrote:
> Thank you. Your question seems to have answered mine.
>
> If nobody can produce exhibits, then it makes sense that these
> characters weren't included in the Standard.

Please see L2/12-039. These aren't used often, but they do appear in texts, if rarely, and may be considered a borrowing. A mechanism that allows characters from the Tamil and Grantha blocks to be judiciously intermixed is required. As demonstrated by L2/20-119, there is occasional intermixing of letters from other scripts.

The normal Unicode approach is to encode what has been attested. Occasionally some other factors recommend against encoding. Nevertheless, a mechanism to represent them in plain text is necessary.

--
Shriramana Sharma

From costello at mitre.org  Mon Sep 14 14:46:07 2020
From: costello at mitre.org (Roger L Costello)
Date: Mon, 14 Sep 2020 19:46:07 +0000
Subject: A file contains text data and binary data ... Is it a text file or a binary file?
Message-ID:

Hi Folks,

A file contains a long series of text data and at the end is binary data. The binary data is not encoded as base64 text or anything like that. It is raw, unfiltered, unencoded binary data.

Is it a text file or a binary file?

A colleague argues that it may be legitimately treated as a text file. After all, it can be opened in a text editor. The text editor might display odd-looking characters such as this:

??? T

But that is harmless.

Is there a practical, real-world problem with treating it as a text file?

/Roger

From markus.icu at gmail.com  Mon Sep 14 14:55:37 2020
From: markus.icu at gmail.com (Markus Scherer)
Date: Mon, 14 Sep 2020 12:55:37 -0700
Subject: A file contains text data and binary data ... Is it a text file or a binary file?
In-Reply-To:
References:
Message-ID:

I think many people would call that a binary file with an internal structure where the first part is text and the second part is binary. I suspect there is a "magic sequence" of some kind as a separator.

On Mon, Sep 14, 2020 at 12:48 PM Roger L Costello via Unicode <unicode at unicode.org> wrote:

> Is there a practical, real-world problem with treating it as a text file?

Depends on what you do. For some stuff, it will "work" or be harmless. In other cases, tools will barf at you because they tried to validate it as, say, UTF-8 text and found errors. Or the result may not be useful. What if you count the number of lines of "text" and the tool gives you arbitrary results based on what bytes it happens to find in the binary part?

Why do you need to give it a single attribute of "text" or "binary"? It's a bit like asking about the single language of a text that contains paragraphs in different languages.

markus

From doug at ewellic.org  Mon Sep 14 15:08:23 2020
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 14 Sep 2020 14:08:23 -0600
Subject: A file contains text data and binary data ... Is it a text file or a binary file?
In-Reply-To:
References:
Message-ID: <000701d68ad2$cd687ab0$68397010$@ewellic.org>

Roger L Costello wrote:

> A file contains a long series of text data and at the end is binary
> data. The binary data is not encoded as base64 text or anything like
> that. It is raw, unfiltered, unencoded binary data.
>
> Is it a text file or a binary file?

I seem to remember a question about this distinction some months ago. IIRC it devolved into a discussion about LF versus CRLF and "text mode" versus "binary mode" file transfers, which of course is not what is (or was) being asked.

> A colleague argues that it may be legitimately treated as a text file.
> After all, it can be opened in a text editor.

You can try to open any file inside a text editor. What the text editor then does (display it in a meaningful way, show binary garbage, or decline to open it) is another matter.

> The text editor might display odd-looking characters such as this:
>
> ??? T
>
> But that is harmless.

As long as any changes the text editor might make are not saved back to the file. Many text editors reformat tabs into spaces or vice versa, remove trailing spaces in lines, convert between LF and CRLF, and so forth. If the file is not intended to be text for human consumption, these can be breaking changes; if the file has arbitrary binary content, they will be.

> Is there a practical, real-world problem with treating it as a text
> file?

Depends on what "treating" means. And I still think this is a false dichotomy.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From harjitmoe at outlook.com  Mon Sep 14 15:13:00 2020
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Mon, 14 Sep 2020 20:13:00 +0000
Subject: A file contains text data and binary data ... Is it a text file or a binary file?
In-Reply-To:
References:
Message-ID:

In answer to your second question:

Loading and saving as a text file is only guaranteed, per the current C spec, to round-trip it if it contains only printing characters, native line breaks, horizontal tabs, and spaces which are not immediately followed by line breaks. Some editors will round-trip, some might truncate at a NUL (e.g. leafpad) and/or ^Z when loading, some will corrupt it by newline conversion (e.g. reading both CRLF and LF as LF), and so forth.

(^Z is normally SUB, but DOS encodings by their IBM/ICU mappings pivot FS/SUB/DEL for some reason, so it arguably counts as FS here?)

So if you intend to preserve the binary data, loading it as text is not advised.
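
As a rough illustration of the newline-conversion hazard mentioned above, here is a minimal Python sketch. It is not taken from any particular editor: the function name `naive_text_roundtrip` and the sample bytes are made up for the example, and it assumes an editor that reads CRLF and LF alike as LF and writes LF back out.

```python
def naive_text_roundtrip(raw: bytes) -> bytes:
    """Simulate loading a file as text and saving it again.

    Assumption for this sketch: the editor treats CRLF and LF alike
    and writes every line break back as a bare LF.
    """
    # latin-1 maps each byte to one code point, so decoding never fails
    # and any difference in the result is due to newline handling alone.
    text = raw.decode("latin-1")
    text = text.replace("\r\n", "\n")
    return text.encode("latin-1")


# A text line followed by raw binary that happens to contain 0x0D 0x0A.
payload = b"header line\r\n" + bytes([0x89, 0x0D, 0x0A, 0x1A, 0x00])

damaged = naive_text_roundtrip(payload)
print(damaged == payload)  # False: the binary tail lost a 0x0D byte
```

Pure LF text survives this round trip unchanged; only data containing the byte pair 0x0D 0x0A is altered, which is exactly why arbitrary binary content is at risk.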

(It may also confound encoding detection, or throw an error in a strict-mode UTF-8 reader, et cetera, and as such potentially complicate the ability to read non-ASCII text, depending on the details of your case.)

And yes, a binary file can contain text file segments while still being a binary file, but not the other way around (a tar archive is always a binary file, and most ar archives are also, but an ar (.a) archive of only UTF-8 text files is itself arguably a UTF-8 text file, though it usually isn't treated as such since this would only work in that special case, while treating ar as a binary format like tar always works).

--Har.

From costello at mitre.org  Mon Sep 14 16:19:22 2020
From: costello at mitre.org (Roger L Costello)
Date: Mon, 14 Sep 2020 21:19:22 +0000
Subject: A file contains text data and binary data ... Is it a text file or a binary file?
In-Reply-To:
References:
Message-ID:

Hi Folks,

Thank you for your outstanding responses!

I am a bit confused about something that Markus said:

Why do you need to give it a single attribute of "text" or "binary"?
It's a bit like asking about the single language of a text that contains paragraphs in different languages.

And Doug said:

I think this is a false dichotomy.

--------------------------------------------
I realize there are binary files that contain text. For example, EXE files contain binary data with islands of text scattered here and there. But the key point, I think, is that EXE is categorized as a binary file and not a text file. And thus EXE files should be displayed/edited using an appropriate hex editor, not a text editor.

Based on the many potential problems you described with using a text editor to display a file that contains text data and binary data, I draw this conclusion: "If a file contains binary data, it is a binary file; only if the file contains purely text data is it a text file." Do you agree with this conclusion?

/Roger

From steffen at sdaoden.eu  Mon Sep 14 16:38:05 2020
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Mon, 14 Sep 2020 23:38:05 +0200
Subject: A file contains text data and binary data ... Is it a text file or a binary file?
In-Reply-To:
References:
Message-ID: <20200914213805.qZjqm%steffen@sdaoden.eu>

Roger L Costello wrote in :
 |Thank you for your outstanding responses!
 |
 |I am a bit confused about something that Marcus said:
 |
 |Why do you need to give it a single attribute of "text" or "binary"? \
 |It's a bit like asking about the single language of a text that contains \
 |paragraphs in different languages.
 |
 |And Doug said:
 |
 |I think this is a false dichotomy.
 |--------------------------------------------
 |I realize there are binary files that contain text. For example, EXE \
 |files contain binary data with islands of text scattered here and there. \
 |But the key point, I think, is that EXE is categorized as a binary \
 |file and not a text file. And thus EXE files should be displayed/edited \
 |using an appropriate hex editor, not a text editor.
 |
 |Based on the many potential problems you described with using a text \
 |editor to display a file that contains text data and binary data, I \
 |draw this conclusion: "If a file contains binary data, it is a binary \
 |file; only if the file contains purely text data is it a text file." \
 |Do you agree with this conclusion?

POSIX has the definitions (re-sorted)

  3.403 Text File
  A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2017 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify ``text files'' in their STDIN or INPUT FILES sections.

  3.288 Printable File
  A text file consisting only of the characters included in the print and space character classifications of the LC_CTYPE category and the <backspace>, all in the current locale.
  Note: The LC_CTYPE category is defined in detail in Section 7.3.1 (on page 139).

For the mailer I maintain, the MIME classification checks for NUL and other control characters, but does not treat the following as binary:

   /* If there is a escape sequence in reverse solidus notation defined
    * for this in ANSI X3.159-1989 (ANSI C89), do not treat it as
    * a control for real.  I.e., \a=\x07=BEL, \b=\x08=BS, \t=\x09=HT.
    * Do not follow libmagic(1) in respect to \v=\x0B=VT.  \f=\x0C=NP; do
    * ignore \e=\x1B=ESC */
   if((c >= '\x07' && c <= '\x0D') || c == '\x1B')
      continue;

Plus carriage-return and newline.  The above is only for 8-bit, ASCII-compatible data.
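
For readers who want to experiment, the check above can be approximated in Python. This is only a sketch of the idea, not the mailer's actual code: the names `TEXT_CONTROLS` and `looks_binary` are hypothetical, and flagging every other byte below 0x20 (including NUL) as "binary" is an assumption drawn from the description, not from the source.

```python
# Control bytes tolerated as text in this sketch: BEL through CR
# (0x07-0x0D) and ESC (0x1B), mirroring the range test in the C fragment.
TEXT_CONTROLS = frozenset(range(0x07, 0x0E)) | {0x1B}


def looks_binary(data: bytes) -> bool:
    """Return True if data holds NUL or another C0 control byte
    that is not in TEXT_CONTROLS (assumed rule, see lead-in)."""
    for byte in data:
        if byte in TEXT_CONTROLS:
            continue
        if byte < 0x20:
            return True
    return False


print(looks_binary(b"plain text\r\n\twith a tab"))  # False
print(looks_binary(b"text then raw data\x00\x01"))  # True
```

Bytes at 0x80 and above are let through untouched here, so UTF-8 or other 8-bit, ASCII-compatible text is not flagged; a stricter check would also have to validate the encoding.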

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

From doug at ewellic.org  Mon Sep 14 17:32:50 2020
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 14 Sep 2020 16:32:50 -0600
Subject: A file contains text data and binary data ... Is it a text file or a binary file?
In-Reply-To:
References:
Message-ID: <000c01d68ae6$fabbc850$f03358f0$@ewellic.org>

Roger L Costello wrote:

> Based on the many potential problems you described with using a text
> editor to display a file that contains text data and binary data, I
> draw this conclusion: "If a file contains binary data, it is a binary
> file; only if the file contains purely text data is it a text file."
> Do you agree with this conclusion?

Sure. I mean, it's as close as you're probably going to get.

I used a hex editor on a known text file just this morning. And POSIX definitions notwithstanding, there are text files with lines of arbitrary length, or in encodings that are not bytewise extensions of ASCII.

So if you want a solid, one-sentence definition of "text" versus "binary" that covers all scenarios and doesn't take into account what humans would consider "text" and "not text," prepare for that one sentence to have a lot of clauses.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From costello at mitre.org  Wed Sep 16 05:43:41 2020
From: costello at mitre.org (Roger L Costello)
Date: Wed, 16 Sep 2020 10:43:41 +0000
Subject: Why is the "<" symbol named the "less-than sign"?
Message-ID:

Hi Folks,

Just curious ...

I realize that mathematicians use the "<" symbol to denote the less-than relation. Who decided that that symbol would denote the less-than relation?

The "<" symbol looks like a "v" turned sideways. How does a sideways "v" symbol connote less-than?

The "<" symbol is used in other places where it has nothing to do with less-than.
For example, it is used in HTML and XML such as this:

I suspect there are many places where it is used and has nothing to do with less-than.

So why is the "<" symbol named the "less-than sign"?

/Roger

From richard.wordingham at ntlworld.com  Wed Sep 16 06:33:43 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 16 Sep 2020 12:33:43 +0100
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID: <20200916123343.47a6da76@JRWUBU2>

On Wed, 16 Sep 2020 10:43:41 +0000
Roger L Costello via Unicode wrote:

> I realize that mathematicians use the "<" symbol to denote the
> less-than relation. Who decided that that symbol would denote the
> less-than relation?

Wikipedia says Thomas Harriot, who died in 1621, possibly killed by tobacco. (The book containing the symbol was published in 1631.) You may find
https://en.wikipedia.org/wiki/Table_of_mathematical_symbols_by_introduction_date
interesting.

> So why is the "<" symbol named the "less-than sign"?

Because that was the use of the symbol that the namers learnt in school. What symbol did you first learn for that meaning?

Richard.

From marius.spix at web.de  Wed Sep 16 06:35:57 2020
From: marius.spix at web.de (Marius Spix)
Date: Wed, 16 Sep 2020 13:35:57 +0200
Subject: Aw: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From costello at mitre.org  Wed Sep 16 07:01:51 2020
From: costello at mitre.org (Roger L Costello)
Date: Wed, 16 Sep 2020 12:01:51 +0000
Subject: Why is the "<" symbol named the "less-than sign"?
Message-ID:

Séamas made a really interesting remark:

I always thought it was obvious, or at least fairly intuitive. A < B: A is less than B; A > B: A is greater than B.

Me too! I've been taught (brainwashed?) from childhood to interpret "<" as less-than.
But yesterday when I reflected on it, it occurred to me that there is nothing at all obvious or intuitive about using "<" to denote less-than. Why would two non-parallel lines terminating at the meeting point denote less-than? Stated another way, why would a "v" turned sideways denote less-than? I guess someone (French mathematician Pierre Bouguer, British logician John Wallis) just made it up.

/Roger

From mark at kli.org  Wed Sep 16 07:10:19 2020
From: mark at kli.org (Mark E. Shoulson)
Date: Wed, 16 Sep 2020 08:10:19 -0400
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID: <1348977e-ffc2-61a3-a0c1-51064ef53f64@kli.org>

An HTML attachment was scrubbed...
URL:

From samjnaa at gmail.com  Wed Sep 16 07:13:53 2020
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Wed, 16 Sep 2020 17:43:53 +0530
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

When I was a child and first taught < and > at school, I figured that they were derived from the equals sign =, except that the bigger number has the bigger separation between the lines and the smaller number has the smaller separation becoming none. So I do see meaning in it.

And it was obviously named the less than sign long before it was used for XML tags.

--
Shriramana Sharma

From karl-pentzlin at acssoft.de  Wed Sep 16 07:36:11 2020
From: karl-pentzlin at acssoft.de (Karl Pentzlin)
Date: Wed, 16 Sep 2020 14:36:11 +0200
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID: <1991410671.20200916143611@acssoft.de>

On Wednesday, 16 September 2020 at 12:43, Roger L Costello via Unicode wrote:

RLCvU> Hi Folks,
RLCvU> Just curious ...
RLCvU> I realize that mathematicians use the "<" symbol to denote the
RLCvU> less-than relation. Who decided that that symbol would denote the less-than relation?

According to the German Wikipedia:
https://de.wikipedia.org/wiki/Vergleichszeichen#Geschichte
the symbols were devised in 1631 by Thomas Harriot in his book "Artis analyticae praxis".
(So far, the English Wikipedia lacks this information.)

RLCvU> The "<" symbol looks like a "v" turned sideways. How does a
RLCvU> sideways "v" symbol connote less-than?

It is not derived from a "v", as this picture shows (a scan from the aforementioned book):
https://commons.wikimedia.org/wiki/File:Artis_analyticae_praxis_1631_%E2%80%93_Comparationis_signa_in_sequentibus_usurpanda.png

RLCvU> The "<" symbol is used in other places where it has nothing to
RLCvU> do with less-than. For example, it is used in HTML and XML such as this:
RLCvU> I suspect there are many places where it is used and has nothing to do with less-than.
RLCvU> So why is the "<" symbol named the "less-than sign"?

Presumably because the other uses are recent, motivated by the fact that new features had to be implemented with the ASCII characters then available, which already had their names.

- Karl

RLCvU> /Roger

From andrea.giammarchi at gmail.com  Wed Sep 16 07:37:27 2020
From: andrea.giammarchi at gmail.com (Andrea Giammarchi)
Date: Wed, 16 Sep 2020 14:37:27 +0200
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

easy to visualize too, same for me too ... it's actually semantic, if you don't think it's a "v" but an equal = that rotates axes to indicate 0 < 1, 1 = 1, 2 > 1 [emoji]

also, what's the meaning of winking? (kidding)

best regards

From richard.wordingham at ntlworld.com  Wed Sep 16 08:07:34 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 16 Sep 2020 14:07:34 +0100
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To: <1991410671.20200916143611@acssoft.de>
References: <1991410671.20200916143611@acssoft.de>
Message-ID: <20200916140734.1e3e3890@JRWUBU2>

On Wed, 16 Sep 2020 14:36:11 +0200
Karl Pentzlin via Unicode wrote:

> According to the German Wikipedia:
> https://de.wikipedia.org/wiki/Vergleichszeichen#Geschichte
> the symbols were devised 1631 by Thomas Harriot in his book "Artis
> analyticae praxis".
> (Until now, the English wikipedia lacks this information.)

I gave an English Wikipedia reference earlier in this thread.

Richard.

From samjnaa at gmail.com  Wed Sep 16 08:25:42 2020
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Wed, 16 Sep 2020 18:55:42 +0530
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

There's a less-than symbol in that [emoji] as well.

From andrea.giammarchi at gmail.com  Wed Sep 16 08:28:17 2020
From: andrea.giammarchi at gmail.com (Andrea Giammarchi)
Date: Wed, 16 Sep 2020 15:28:17 +0200
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

What your right eye sees can't compete with what the outer world actually is [emoji] (emoji with no less-than on purpose)

From john.w.kennedy at gmail.com  Wed Sep 16 09:10:37 2020
From: john.w.kennedy at gmail.com (John W Kennedy)
Date: Wed, 16 Sep 2020 10:10:37 -0400
Subject: Fwd: Why is the "<" symbol named the "less-than sign"?
References:
Message-ID:

I don't know what Harriot was thinking, but the most obvious reading (one taught, as nearly as I can recall after so many decades, in American schools) is that in a.

As to its use in SGML and XML, it was a replacement for : in IBM's original GML, where, for example, a paragraph was delimited thus:

:p.
...
:ep.

: was changed to <, e (for "end") was changed to /, and . was changed to >. The characters used, of course, had to be available in ASCII. I dare say early publications on SGML include a rationale.

--
John W. Kennedy
"Compact is becoming contract,
Man only earns and pays."
 -- Charles Williams.  "Bors to Elayne:  On the King's Coins"

From gwidion at gmail.com  Wed Sep 16 10:11:52 2020
From: gwidion at gmail.com (Joao S. O. Bueno)
Date: Wed, 16 Sep 2020 12:11:52 -0300
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

Nowadays, in a "pretty text" context, it makes sense that the tools automatically transform the symbols to the one that is semantically more correct - everyon is familiar with Microsoft Word and others changing single quotes to curly quotes, with open/close context; But when one talks of a file format that is meant to be possibly typed by a human and read by a computer, like xml, and others, the extra "translation layer", and the physical limitation of symbols in a keyboard makes it not worth to use what would be a "correct" character in every language. Thus, "less-than" doubles as "Left angle bracket" in all computer-scripts or languages that use it in that way. (I am quite sure the above paragraph will sound childish for people on this list, and I'd love to see a more formal rewrite of the same idea) But the unicode standard surely have the characters for every use (I produced the following output using my own tool, sorry if it is not the most readable, but I think it conveys enough information to understand the case). 
Character(code=0x3008, value='〈', name='LEFT ANGLE BRACKET', category='Ps', width='W'),
Character(code=0x300A, value='《', name='LEFT DOUBLE ANGLE BRACKET', category='Ps', width='W'),
Character(code=0x2991, value='⦑', name='LEFT ANGLE BRACKET WITH DOT', category='Ps', width='N'),
Character(code=0x27E8, value='⟨', name='MATHEMATICAL LEFT ANGLE BRACKET', category='Ps', width='Na'),
Character(code=0x27EA, value='⟪', name='MATHEMATICAL LEFT DOUBLE ANGLE BRACKET', category='Ps', width='Na'),
Character(code=0x276C, value='❬', name='MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT', category='Ps', width='N'),
Character(code=0x2770, value='❰', name='HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT', category='Ps', width='N'),
Character(code=0xFE3D, value='︽', name='PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET', category='Ps', width='W'),

And then:

Character(code=0xFE3F, value='︿', name='PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET', category='Ps', width='W')
Character(code=0x003C, value='<', name='LESS-THAN SIGN', category='Sm', width='Na'),
Character(code=0x2264, value='≤', name='LESS-THAN OR EQUAL TO', category='Sm', width='A'),
Character(code=0x2266, value='≦', name='LESS-THAN OVER EQUAL TO', category='Sm', width='A'),
Character(code=0x2268, value='≨', name='LESS-THAN BUT NOT EQUAL TO', category='Sm', width='N'),
Character(code=0x226A, value='≪', name='MUCH LESS-THAN', category='Sm', width='A'),
Character(code=0x226E, value='≮', name='NOT LESS-THAN', category='Sm', width='A'),
Character(code=0x2270, value='≰', name='NEITHER LESS-THAN NOR EQUAL TO', category='Sm', width='N'),
Character(code=0x2272, value='≲', name='LESS-THAN OR EQUIVALENT TO', category='Sm', width='N'),
Character(code=0x2274, value='≴', name='NEITHER LESS-THAN NOR EQUIVALENT TO', category='Sm', width='N'),
Character(code=0x2276, value='≶', name='LESS-THAN OR GREATER-THAN', category='Sm', width='N'),
Character(code=0x2277, value='≷', name='GREATER-THAN OR LESS-THAN', category='Sm', width='N'),
Character(code=0x2278, value='≸', name='NEITHER LESS-THAN NOR GREATER-THAN', category='Sm', width='N'),
Character(code=0x2279, value='≹', name='NEITHER GREATER-THAN NOR LESS-THAN', category='Sm', width='N'),
Character(code=0x22D6, value='⋖', name='LESS-THAN WITH DOT', category='Sm', width='N'),
Character(code=0x22D8, value='⋘', name='VERY MUCH LESS-THAN', category='Sm', width='N'),
Character(code=0x22DA, value='⋚', name='LESS-THAN EQUAL TO OR GREATER-THAN', category='Sm', width='N'),
Character(code=0x22DB, value='⋛', name='GREATER-THAN EQUAL TO OR LESS-THAN', category='Sm', width='N'),
Character(code=0x22DC, value='⋜', name='EQUAL TO OR LESS-THAN', category='Sm', width='N'),
Character(code=0x22E6, value='⋦', name='LESS-THAN BUT NOT EQUIVALENT TO', category='Sm', width='N'),
Character(code=0x2343, value='⍃', name='APL FUNCTIONAL SYMBOL QUAD LESS-THAN', category='So', width='N'),
Character(code=0x2976, value='⥶', name='LESS-THAN ABOVE LEFTWARDS ARROW', category='Sm', width='N'),
Character(code=0x2977, value='⥷', name='LEFTWARDS ARROW THROUGH LESS-THAN', category='Sm', width='N'),
Character(code=0x2993, value='⦓', name='LEFT ARC LESS-THAN BRACKET', category='Ps', width='N'),
Character(code=0x2996, value='⦖', name='DOUBLE RIGHT ARC LESS-THAN BRACKET', category='Pe', width='N'),
Character(code=0x29C0, value='⧀', name='CIRCLED LESS-THAN', category='Sm', width='N'),
Character(code=0x2A79, value='⩹', name='LESS-THAN WITH CIRCLE INSIDE', category='Sm', width='N'),
Character(code=0x2A7B, value='⩻', name='LESS-THAN WITH QUESTION MARK ABOVE', category='Sm', width='N'),
Character(code=0x2A7D, value='⩽', name='LESS-THAN OR SLANTED EQUAL TO', category='Sm', width='N'),
Character(code=0x2A7F, value='⩿', name='LESS-THAN OR SLANTED EQUAL TO WITH DOT INSIDE', category='Sm', width='N'),
Character(code=0x2A81, value='⪁', name='LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE', category='Sm', width='N'),
Character(code=0x2A83, value='⪃', name='LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE RIGHT', category='Sm', width='N'),
Character(code=0x2A85, value='⪅', name='LESS-THAN OR APPROXIMATE', category='Sm', width='N'),
Character(code=0x2A87, value='⪇', name='LESS-THAN AND SINGLE-LINE NOT EQUAL TO', category='Sm', width='N'),
Character(code=0x2A89, value='⪉', name='LESS-THAN AND NOT APPROXIMATE', category='Sm', width='N'),
Character(code=0x2A8B, value='⪋', name='LESS-THAN ABOVE DOUBLE-LINE EQUAL ABOVE GREATER-THAN', category='Sm', width='N'),
Character(code=0x2A8C, value='⪌', name='GREATER-THAN ABOVE DOUBLE-LINE EQUAL ABOVE LESS-THAN', category='Sm', width='N'),
Character(code=0x2A8D, value='⪍', name='LESS-THAN ABOVE SIMILAR OR EQUAL', category='Sm', width='N'),
Character(code=0x2A8F, value='⪏', name='LESS-THAN ABOVE SIMILAR ABOVE GREATER-THAN', category='Sm', width='N'),
Character(code=0x2A90, value='⪐', name='GREATER-THAN ABOVE SIMILAR ABOVE LESS-THAN', category='Sm', width='N'),
Character(code=0x2A91, value='⪑', name='LESS-THAN ABOVE GREATER-THAN ABOVE DOUBLE-LINE EQUAL', category='Sm', width='N'),
Character(code=0x2A92, value='⪒', name='GREATER-THAN ABOVE LESS-THAN ABOVE DOUBLE-LINE EQUAL', category='Sm', width='N'),
Character(code=0x2A93, value='⪓', name='LESS-THAN ABOVE SLANTED EQUAL ABOVE GREATER-THAN ABOVE SLANTED EQUAL', category='Sm', width='N'),
Character(code=0x2A94, value='⪔', name='GREATER-THAN ABOVE SLANTED EQUAL ABOVE LESS-THAN ABOVE SLANTED EQUAL', category='Sm', width='N'),
Character(code=0x2A95, value='⪕', name='SLANTED EQUAL TO OR LESS-THAN', category='Sm', width='N'),
Character(code=0x2A97, value='⪗', name='SLANTED EQUAL TO OR LESS-THAN WITH DOT INSIDE', category='Sm', width='N'),
Character(code=0x2A99, value='⪙', name='DOUBLE-LINE EQUAL TO OR LESS-THAN', category='Sm', width='N'),
Character(code=0x2A9B, value='⪛', name='DOUBLE-LINE SLANTED EQUAL TO OR LESS-THAN', category='Sm', width='N'),
Character(code=0x2A9D, value='⪝', name='SIMILAR OR LESS-THAN', category='Sm', width='N'),
Character(code=0x2A9F, value='⪟', name='SIMILAR
ABOVE LESS-THAN ABOVE EQUALS SIGN', category='Sm', width='N'), Character(code=0x2AA1, value='?', name='DOUBLE NESTED LESS-THAN', category='Sm', width='N'), Character(code=0x2AA3, value='?', name='DOUBLE NESTED LESS-THAN WITH UNDERBAR', category='Sm', width='N'), Character(code=0x2AA4, value='?', name='GREATER-THAN OVERLAPPING LESS-THAN', category='Sm', width='N'), Character(code=0x2AA5, value='?', name='GREATER-THAN BESIDE LESS-THAN', category='Sm', width='N'), Character(code=0x2AA6, value='?', name='LESS-THAN CLOSED BY CURVE', category='Sm', width='N'), Character(code=0x2AA8, value='?', name='LESS-THAN CLOSED BY CURVE ABOVE SLANTED EQUAL', category='Sm', width='N'), Character(code=0x2AF7, value='?', name='TRIPLE NESTED LESS-THAN', category='Sm', width='N'), Character(code=0x2AF9, value='?', name='DOUBLE-LINE SLANTED LESS-THAN OR EQUAL TO', category='Sm', width='N'), Character(code=0xFE64, value='?', name='SMALL LESS-THAN SIGN', category='Sm', width='W'), Character(code=0xFF1C, value='?', name='FULLWIDTH LESS-THAN SIGN', category='Sm', width='F'), Character(code=0xE003C, value='', name='TAG LESS-THAN SIGN', category='Cf', width='N') On Wed, 16 Sep 2020 at 10:34, Andrea Giammarchi via Unicode < unicode at unicode.org> wrote: > What your right eye sees can't compete with what the outer world actually > is ? > > (emoji with no less-than on purpose) > > On Wed, Sep 16, 2020 at 3:22 PM Shriramana Sharma > wrote: > >> There's a less than symbol in that ? as well. >> >> On Wed, 16 Sep, 2020, 18:07 Andrea Giammarchi, < >> andrea.giammarchi at gmail.com> wrote: >> >>> easy to visualize too, same for me too ... it's actually semantic, if >>> you don't think it's a "v" but an equal = that rotates axes to indicate 0 < >>> 1, 1 = 1, 2 > 1 ? >>> >>> also, what's the meaning of winking? 
(kidding)
>>>
>>> best regards
>>>
>>> On Wed, Sep 16, 2020 at 2:19 PM Shriramana Sharma via Unicode <
>>> unicode at unicode.org> wrote:
>>>
>>>> When I was a child and first taught < and > at school, I figured that
>>>> they were derived from the equals sign =, except that the bigger
>>>> number has the bigger separation between the lines and the smaller
>>>> number has the smaller separation becoming none. So I do see meaning
>>>> in it.
>>>>
>>>> And it was obviously named the less than sign long before it was used
>>>> for XML tags.
>>>>
>>>> --
>>>> Shriramana Sharma ???????????? ???????????? ????????????
>>>>
>>>>

From gwidion at gmail.com Wed Sep 16 10:18:11 2020
From: gwidion at gmail.com (Joao S. O. Bueno)
Date: Wed, 16 Sep 2020 12:18:11 -0300
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

A more concrete example of why human-writable, computer-readable files must keep using a reduced subset of similar-looking characters can be seen in the discussion below, which ran to ~50 e-mail messages on a mailing list for discussing improvements to the Python language.

https://mail.python.org/archives/list/python-ideas@python.org/thread/ILMNJ46EAL4ENYK7LLDLGIMYQKZAMMWU/#ILMNJ46EAL4ENYK7LLDLGIMYQKZAMMWU

On Wed, 16 Sep 2020 at 12:11, Joao S. O. Bueno wrote:

> Of course they are different things - but when you create characters to
> use in computers with a 7 bit restriction, back in the 60-70's, they had
> to choose to overlap things.
>
> Nowadays, in a "pretty text" context, it makes sense that the tools
> automatically transform the symbols to the one that is semantically more
> correct - everyone is familiar with Microsoft Word and others changing
> single quotes to curly quotes, with open/close context;
>
> But when one talks of a file format that is meant to be possibly typed by
> a human and read by a computer, like xml, and others, the extra
> "translation layer", and the physical limitation of symbols in a keyboard
> make it not worth using what would be a "correct" character in every
> language. Thus, "less-than" doubles as "Left angle bracket" in all
> computer-scripts or languages that use it in that way.
> (I am quite sure the above paragraph will sound childish for people on
> this list, and I'd love to see a more formal rewrite of the same idea)
>
> But the Unicode standard surely has the characters for every use
> (I produced the following output using my own tool, sorry if
> it is not the most readable, but I think it conveys enough
> information to understand the case).
>
>
> Character(code=0x3008, value='?', name='LEFT ANGLE BRACKET', category='Ps', width='W'),
> Character(code=0x300A, value='?', name='LEFT DOUBLE ANGLE BRACKET', category='Ps', width='W'),
> Character(code=0x2991, value='?', name='LEFT ANGLE BRACKET WITH DOT', category='Ps', width='N'),
> Character(code=0x27E8, value='?', name='MATHEMATICAL LEFT ANGLE BRACKET', category='Ps', width='Na'),
> Character(code=0x27EA, value='?', name='MATHEMATICAL LEFT DOUBLE ANGLE BRACKET', category='Ps', width='Na'),
> Character(code=0x276C, value='?', name='MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT', category='Ps', width='N'),
> Character(code=0x2770, value='?', name='HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT', category='Ps', width='N'),
> Character(code=0xFE3D, value='?', name='PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET', category='Ps', width='W'),
>
> And then:
> Character(code=0xFE3F, value='?', name='PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET', category='Ps', width='W')
> Character(code=0x003C, value='<', name='LESS-THAN SIGN', category='Sm', width='Na'),
> Character(code=0x2264, value='?', name='LESS-THAN OR EQUAL TO', category='Sm', width='A'),
> Character(code=0x2266, value='?', name='LESS-THAN OVER EQUAL TO', category='Sm', width='A'),
> Character(code=0x2268, value='?', name='LESS-THAN BUT NOT EQUAL TO', category='Sm', width='N'),
> Character(code=0x226A, value='?', name='MUCH LESS-THAN', category='Sm', width='A'),
> Character(code=0x226E, value='?', name='NOT LESS-THAN', category='Sm', width='A'),
> Character(code=0x2270, value='?', name='NEITHER LESS-THAN NOR EQUAL TO', category='Sm', width='N'),
> Character(code=0x2272, value='?', name='LESS-THAN OR EQUIVALENT TO', category='Sm', width='N'),
> Character(code=0x2274, value='?', name='NEITHER LESS-THAN NOR EQUIVALENT TO', category='Sm', width='N'),
> Character(code=0x2276, value='?', name='LESS-THAN OR GREATER-THAN', category='Sm', width='N'),
> Character(code=0x2277, value='?', name='GREATER-THAN OR LESS-THAN', category='Sm', width='N'),
> Character(code=0x2278, value='?', name='NEITHER LESS-THAN NOR GREATER-THAN', category='Sm', width='N'),
> Character(code=0x2279, value='?', name='NEITHER GREATER-THAN NOR LESS-THAN', category='Sm', width='N'),
> Character(code=0x22D6, value='?', name='LESS-THAN WITH DOT', category='Sm', width='N'),
> Character(code=0x22D8, value='?', name='VERY MUCH LESS-THAN', category='Sm', width='N'),
> Character(code=0x22DA, value='?', name='LESS-THAN EQUAL TO OR GREATER-THAN', category='Sm', width='N'),
> Character(code=0x22DB, value='?', name='GREATER-THAN EQUAL TO OR LESS-THAN', category='Sm', width='N'),
> Character(code=0x22DC, value='?', name='EQUAL TO OR LESS-THAN', category='Sm', width='N'),
> Character(code=0x22E6, value='?', name='LESS-THAN BUT NOT EQUIVALENT TO', category='Sm', width='N'),
> Character(code=0x2343, value='?', name='APL FUNCTIONAL SYMBOL QUAD LESS-THAN', category='So', width='N'),
> Character(code=0x2976, value='?', name='LESS-THAN ABOVE LEFTWARDS ARROW', category='Sm', width='N'),
> Character(code=0x2977, value='?', name='LEFTWARDS ARROW THROUGH LESS-THAN', category='Sm', width='N'),
> Character(code=0x2993, value='?', name='LEFT ARC LESS-THAN BRACKET', category='Ps', width='N'),
> Character(code=0x2996, value='?', name='DOUBLE RIGHT ARC LESS-THAN BRACKET', category='Pe', width='N'),
> Character(code=0x29C0, value='?', name='CIRCLED LESS-THAN', category='Sm', width='N'),
> Character(code=0x2A79, value='?', name='LESS-THAN WITH CIRCLE INSIDE', category='Sm', width='N'),
> Character(code=0x2A7B, value='?', name='LESS-THAN WITH QUESTION MARK ABOVE', category='Sm', width='N'),
> Character(code=0x2A7D, value='?', name='LESS-THAN OR SLANTED EQUAL TO', category='Sm', width='N'),
> Character(code=0x2A7F, value='?', name='LESS-THAN OR SLANTED EQUAL TO WITH DOT INSIDE', category='Sm', width='N'),
> Character(code=0x2A81, value='?', name='LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE', category='Sm', width='N'),
> Character(code=0x2A83, value='?', name='LESS-THAN OR SLANTED EQUAL TO WITH DOT ABOVE RIGHT', category='Sm', width='N'),
> Character(code=0x2A85, value='?', name='LESS-THAN OR APPROXIMATE', category='Sm', width='N'),
> Character(code=0x2A87, value='?', name='LESS-THAN AND SINGLE-LINE NOT EQUAL TO', category='Sm', width='N'),
> Character(code=0x2A89, value='?', name='LESS-THAN AND NOT APPROXIMATE', category='Sm', width='N'),
> Character(code=0x2A8B, value='?', name='LESS-THAN ABOVE DOUBLE-LINE EQUAL ABOVE GREATER-THAN', category='Sm', width='N'),
> Character(code=0x2A8C, value='?', name='GREATER-THAN ABOVE DOUBLE-LINE EQUAL ABOVE LESS-THAN', category='Sm', width='N'),
> Character(code=0x2A8D, value='?', name='LESS-THAN ABOVE SIMILAR OR EQUAL', category='Sm', width='N'),
> Character(code=0x2A8F, value='?', name='LESS-THAN ABOVE SIMILAR ABOVE GREATER-THAN', category='Sm', width='N'),
> Character(code=0x2A90, value='?', name='GREATER-THAN ABOVE SIMILAR ABOVE LESS-THAN', category='Sm', width='N'),
> Character(code=0x2A91, value='?', name='LESS-THAN ABOVE GREATER-THAN ABOVE DOUBLE-LINE EQUAL', category='Sm', width='N'),
> Character(code=0x2A92, value='?', name='GREATER-THAN ABOVE LESS-THAN ABOVE DOUBLE-LINE EQUAL', category='Sm', width='N'),
> Character(code=0x2A93, value='?', name='LESS-THAN ABOVE SLANTED EQUAL ABOVE GREATER-THAN ABOVE SLANTED EQUAL', category='Sm', width='N'),
> Character(code=0x2A94, value='?', name='GREATER-THAN ABOVE SLANTED EQUAL ABOVE LESS-THAN ABOVE SLANTED EQUAL', category='Sm', width='N'),
> Character(code=0x2A95, value='?', name='SLANTED EQUAL TO OR LESS-THAN', category='Sm', width='N'),
> Character(code=0x2A97, value='?', name='SLANTED EQUAL TO OR LESS-THAN WITH DOT INSIDE', category='Sm', width='N'),
> Character(code=0x2A99, value='?', name='DOUBLE-LINE EQUAL TO OR LESS-THAN', category='Sm', width='N'),
> Character(code=0x2A9B, value='?', name='DOUBLE-LINE SLANTED EQUAL TO OR LESS-THAN', category='Sm', width='N'),
> Character(code=0x2A9D, value='?', name='SIMILAR OR LESS-THAN', category='Sm', width='N'),
> Character(code=0x2A9F, value='?', name='SIMILAR ABOVE LESS-THAN ABOVE EQUALS SIGN', category='Sm', width='N'),
> Character(code=0x2AA1, value='?', name='DOUBLE NESTED LESS-THAN', category='Sm', width='N'),
> Character(code=0x2AA3, value='?', name='DOUBLE NESTED LESS-THAN WITH UNDERBAR', category='Sm', width='N'),
> Character(code=0x2AA4, value='?', name='GREATER-THAN OVERLAPPING LESS-THAN', category='Sm', width='N'),
> Character(code=0x2AA5, value='?', name='GREATER-THAN BESIDE LESS-THAN', category='Sm', width='N'),
> Character(code=0x2AA6, value='?', name='LESS-THAN CLOSED BY CURVE', category='Sm', width='N'),
> Character(code=0x2AA8, value='?', name='LESS-THAN CLOSED BY CURVE ABOVE SLANTED EQUAL', category='Sm', width='N'),
> Character(code=0x2AF7, value='?', name='TRIPLE NESTED LESS-THAN', category='Sm', width='N'),
> Character(code=0x2AF9, value='?', name='DOUBLE-LINE SLANTED LESS-THAN OR EQUAL TO', category='Sm', width='N'),
> Character(code=0xFE64, value='?', name='SMALL LESS-THAN SIGN', category='Sm', width='W'),
> Character(code=0xFF1C, value='?', name='FULLWIDTH LESS-THAN SIGN', category='Sm', width='F'),
> Character(code=0xE003C, value='', name='TAG LESS-THAN SIGN', category='Cf', width='N')
>
> On Wed, 16 Sep 2020 at 10:34, Andrea Giammarchi via Unicode <
> unicode at unicode.org> wrote:
>
>> What your right eye sees can't compete with what the outer world actually
>> is ?
>>
>> (emoji with no less-than on purpose)
>>
>> On Wed, Sep 16, 2020 at 3:22 PM Shriramana Sharma
>> wrote:
>>
>>> There's a less than symbol in that ? as well.
>>>
>>> On Wed, 16 Sep, 2020, 18:07 Andrea Giammarchi, <
>>> andrea.giammarchi at gmail.com> wrote:
>>>
>>>> easy to visualize too, same for me too ...
it's actually semantic, if
>>>> you don't think it's a "v" but an equal = that rotates axes to indicate 0 <
>>>> 1, 1 = 1, 2 > 1 ?
>>>>
>>>> also, what's the meaning of winking? (kidding)
>>>>
>>>> best regards
>>>>
>>>> On Wed, Sep 16, 2020 at 2:19 PM Shriramana Sharma via Unicode <
>>>> unicode at unicode.org> wrote:
>>>>
>>>>> When I was a child and first taught < and > at school, I figured that
>>>>> they were derived from the equals sign =, except that the bigger
>>>>> number has the bigger separation between the lines and the smaller
>>>>> number has the smaller separation becoming none. So I do see meaning
>>>>> in it.
>>>>>
>>>>> And it was obviously named the less than sign long before it was used
>>>>> for XML tags.
>>>>>
>>>>> --
>>>>> Shriramana Sharma ???????????? ???????????? ????????????
>>>>>
>>>>>

From jtauber at jtauber.com Wed Sep 16 12:58:23 2020
From: jtauber at jtauber.com (James Tauber)
Date: Thu, 17 Sep 2020 01:58:23 +0800
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID:

On Wed, Sep 16, 2020 at 10:17 PM John W Kennedy via Unicode <
unicode at unicode.org> wrote:

>
> : was changed to <, e (for "end") was changed to /, and . was changed to
> >. The characters used, of course, had to be available in ASCII. I dare say
> early publications on SGML include a rationale.
>

Had to be available in ASCII, less likely to occur in character content, and the < > pair visually suggests the open/close of markup (and alludes to the older written convention of circling markup to distinguish it from content).

James


From marius.spix at web.de Fri Sep 18 04:10:32 2020
From: marius.spix at web.de (Marius Spix)
Date: Fri, 18 Sep 2020 11:10:32 +0200
Subject: Is there a Greek version of the INTERROBANG?
Message-ID:

An HTML attachment was scrubbed...
URL:


From kenwhistler at sonic.net Fri Sep 18 09:24:24 2020
From: kenwhistler at sonic.net (Ken Whistler)
Date: Fri, 18 Sep 2020 07:24:24 -0700
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To:
References:
Message-ID:

????? ?? ?? ?????????????;!

--Ken

On 9/18/2020 2:10 AM, Marius Spix via Unicode wrote:
> I wonder if there is a Greek version of U+203D INTERROBANG. Greek uses
> U+037E GREEK QUESTION MARK (or more common U+003B SEMICOLON) instead
> of U+003F QUESTION MARK. So you could compose an exclamation mark with
> a comma instead of a dot with U+0049 LATIN CAPITAL LETTER I and U+0326
> COMBINING COMMA BELOW, but it still does not look right and I am not
> even sure, if there is any evidence for such a character.


From doug at ewellic.org Fri Sep 18 09:39:10 2020
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 18 Sep 2020 08:39:10 -0600
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To:
References:
Message-ID: <004201d68dc9$7903bf80$6b0b3e80$@ewellic.org>

Marius Spix wrote:

> So you could compose an exclamation mark with a comma instead of a dot
> with U+0049 LATIN CAPITAL LETTER I and U+0326 COMBINING COMMA BELOW,
> but it still does not look right

That's because it isn't right. The exclamation mark is not derived from a capital I.

> and I am not even sure, if there is any evidence for such a character.

https://en.wikipedia.org/wiki/Punctuation#%22Question_comma%22,_%22exclamation_comma%22

There are actually books about punctuation. One, which I regrettably do not yet own, is "Shady Characters: The Secret Life of Punctuation, Symbols, and Other Typographical Marks," by Keith Houston. Such works might be useful for answering this question and the earlier one about the origin of the less-than sign.
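The encoding points raised in this exchange can be checked with a minimal sketch, assuming Python 3 and its standard `unicodedata` module (the variable names are illustrative only and do not come from the thread). It suggests why U+003B is the form usually seen in Greek text (U+037E normalizes to it) and shows that the improvised I plus COMBINING COMMA BELOW sequence has no precomposed form to compose into.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"
IMPROVISED = "I\u0326"  # LATIN CAPITAL LETTER I + COMBINING COMMA BELOW

# U+037E carries a singleton canonical decomposition to U+003B SEMICOLON,
# so any normalization form replaces it with the plain semicolon.
nfc_greek = unicodedata.normalize("NFC", GREEK_QUESTION_MARK)

# No precomposed "I with comma below" is encoded, so NFC leaves the
# improvised sequence as two separate code points.
nfc_improvised = unicodedata.normalize("NFC", IMPROVISED)


def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print("U+037E after NFC:", describe(nfc_greek))
print("I + U+0326 after NFC:", describe(nfc_improvised))
print("U+203D:", describe("\u203d"))
```

Because the normalization is canonical, software that normalizes text cannot preserve a distinction between U+037E and U+003B in plain text.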
--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org


From haberg-1 at telia.com Fri Sep 18 11:35:43 2020
From: haberg-1 at telia.com (Hans Åberg)
Date: Fri, 18 Sep 2020 18:35:43 +0200
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To:
References:
Message-ID: <11E20A72-BA07-464C-A9AA-DA0FBCFB2E52@telia.com>

> On 18 Sep 2020, at 16:24, Ken Whistler via Unicode wrote:
>
> ????? ?? ?? ?????????????;!

In chess commenting, there is a difference between "?!", a questionable move, and "!?", an interesting move. The interrobang is neutral with respect to such a distinction (though not used in chess).

https://en.wikipedia.org/wiki/Chess_annotation_symbols
https://en.wikipedia.org/wiki/Interrobang


From wjgo_10009 at btinternet.com Fri Sep 18 13:10:59 2020
From: wjgo_10009 at btinternet.com (wjgo_10009 at btinternet.com)
Date: Fri, 18 Sep 2020 19:10:59 +0100 (BST)
Subject: L2/20-213 Hand with palm facing up and Hand with palm facing down for Unicode 14.0
Message-ID: <1ced1458.7e1.174a2698a5e.Webtop.48@btinternet.com>

I refer to the following document.

https://www.unicode.org/L2/L2020/20213-palms-up-down-emoji.pdf

Hand with palm facing up and Hand with palm facing down for Unicode 14.0

The meaning of the proposed emoji 'Hand with palm facing down' is fine. My own experience is that this is quite formal. For example, as in the following.

"Good evening ma'am, may I have the pleasure of this dance?" and offers his right hand, palm down, as if in a formal ballroom setting.

The meaning of the proposed emoji 'Hand with palm facing up' as "drop, go away, drop it, put down" does not correspond with my own personal experience, though the concept mentioned later in the document of "Palm up can indicate a lack of knowledge" does in the sense of "Who knows!", though I do not understand quite what "Palm up can indicate a lack of knowledge cross-linguistically" means.
But my lack of experience of the meanings stated in the document is no reason whatsoever not to encode the proposed meanings for that hand gesture.

However, I am thinking that the proposed 'Hand with palm facing up' could be renamed as 'Hand with palm facing up with fingers upward' and a third emoji 'Hand with palm facing up with fingers downward' added.

For me, 'Hand with palm facing up with fingers downward' is a common gesture, such as inviting a visitor to home or office (in antepandemicum times and hopefully in the future) to sit down and make himself or herself comfortable, or to indicate "after you" when two lanes of road traffic are merging at road works, or "please proceed" when letting a car from a side road into queued road traffic. The custom is that the other driver raises his or her hand in acknowledgement and thanks.

For another example, going into a restaurant (in antepandemicum times and hopefully in the future) in the early evening, when all of the trade at that time of day appears to be take-aways, and asking if one can have a sit-down meal at present (as one is going direct from work to an evening institute meeting), and the manager indicating 'yes certainly' in speech and by a palm up, fingers downward gesture towards the empty seated area of the restaurant.

So should there be three new emoji for these hand gestures rather than just the two in the proposal?

William Overington

Friday 18 September 2020


From abrahamgross at disroot.org Fri Sep 18 15:31:30 2020
From: abrahamgross at disroot.org (abrahamgross at disroot.org)
Date: Fri, 18 Sep 2020 20:31:30 +0000 (UTC)
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To: <004201d68dc9$7903bf80$6b0b3e80$@ewellic.org>
References: <004201d68dc9$7903bf80$6b0b3e80$@ewellic.org>
Message-ID: <307d090e-a88e-45f4-9ff9-7c2d1d1a983e@disroot.org>

Not to bash on "Shady Characters: The Secret Life of Punctuation, Symbols, and Other Typographical Marks", but it doesn't really have much information beyond what you can find in the Wikipedia articles on the punctuation marks it talks about. Also, a lot of the information isn't researched thoroughly enough for a satisfying history of the given punctuation mark.

For example, in the asterisk chapter the author says that ⁑ U+2051 TWO ASTERISKS ALIGNED VERTICALLY is buried in the depths of Unicode for some unknown reason, when in fact it was used in the past for specific purposes (which is why Unicode added it).

2020/09/18 ??10:40:44 Doug Ewell via Unicode :

> Marius Spix wrote:
>
>> So you could compose an exclamation mark with a comma instead of a dot
>> with U+0049 LATIN CAPITAL LETTER I and U+0326 COMBINING COMMA BELOW,
>> but it still does not look right
>
> That's because it isn't right. The exclamation mark is not derived from a capital I.
>
>> and I am not even sure, if there is any evidence for such a character.
>
> https://en.wikipedia.org/wiki/Punctuation#%22Question_comma%22,_%22exclamation_comma%22
>
> There are actually books about punctuation. One, which I regrettably do not yet own, is "Shady Characters: The Secret Life of Punctuation, Symbols, and Other Typographical Marks," by Keith Houston. Such works might be useful for answering this question and the earlier one about the origin of the less-than sign.
>
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org
>


From doug at ewellic.org Mon Sep 21 08:51:34 2020
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 21 Sep 2020 07:51:34 -0600
Subject: Is there a Greek version of the INTERROBANG?
Message-ID: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>

Hans Åberg wrote:

> In chess commenting, there is difference between "?!", a questionable
> move, and "!?", an interesting move. The interrobang is neutral with
> respect such a distinction (though not used in chess).

Those would be ⁈ and ⁉ (and ⁇ and ‼), not ‽, although I imagine chess publications don't use any of those either.

Interrobang must have one of the highest "talked about" versus "actually used" ratios of any encoded character.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org


From doug at ewellic.org Mon Sep 21 08:52:08 2020
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 21 Sep 2020 07:52:08 -0600
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To: <307d090e-a88e-45f4-9ff9-7c2d1d1a983e@disroot.org>
References: <004201d68dc9$7903bf80$6b0b3e80$@ewellic.org> <307d090e-a88e-45f4-9ff9-7c2d1d1a983e@disroot.org>
Message-ID: <004501d6901e$66888550$33998ff0$@ewellic.org>

abrahamgross at disroot.org wrote:

> Not to bash on "Shady Characters: The Secret Life of Punctuation,
> Symbols, and Other Typographical Marks", but it doesn't really have
> much information past the what you can find on the wikipedia articles
> of the punctuation marks it talks about. also a lot of the information
> isn't researched thoroughly enough for a satisfying history of the
> given punctuation mark.

As I said, I don't have the book. Maybe I'll take a pass on it.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org


From everson at evertype.com Mon Sep 21 09:14:39 2020
From: everson at evertype.com (Michael Everson)
Date: Mon, 21 Sep 2020 15:14:39 +0100
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>
References: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>
Message-ID: <145980B6-FBCA-48D9-A59E-5A36160A0885@evertype.com>

I'm still proud of the 2005 "Proposal to add INVERTED INTERROBANG to the UCS", https://www.unicode.org/L2/L2005/05086-n2935-interrobang.pdf

> On 21 Sep 2020, at 14:51, Doug Ewell via Unicode wrote:
>
> Hans Åberg wrote:
>
>> In chess commenting, there is difference between "?!", a questionable
>> move, and "!?", an interesting move. The interrobang is neutral with
>> respect such a distinction (though not used in chess).
>
> Those would be ⁈ and ⁉ (and ⁇ and ‼), not ‽, although I imagine chess publications don't use any of those either.
>
> Interrobang must have one of the highest "talked about" versus "actually used" ratios of any encoded character.
>
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org
>
>
>


From john.w.kennedy at gmail.com Mon Sep 21 10:18:43 2020
From: john.w.kennedy at gmail.com (John W Kennedy)
Date: Mon, 21 Sep 2020 11:18:43 -0400
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>
References: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>
Message-ID: <913E138C-6468-4DFC-B015-0D02EE0CBE08@gmail.com>

The interrobang is so fetch!

--
John W. Kennedy
"Compact is becoming contract,
Man only earns and pays."
-- Charles Williams. "Bors to Elayne: On the King's Coins"

> On Sep 21, 2020, at 9:53 AM, Doug Ewell via Unicode wrote:
>
> Hans Åberg wrote:
>
>> In chess commenting, there is difference between "?!", a questionable
>> move, and "!?", an interesting move. The interrobang is neutral with
>> respect such a distinction (though not used in chess).
>
> Those would be ⁈ and ⁉ (and ⁇ and ‼), not ‽, although I imagine chess publications don't use any of those either.
>
> Interrobang must have one of the highest "talked about" versus "actually used" ratios of any encoded character.
>
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org
>
>
>


From haberg-1 at telia.com Mon Sep 21 11:03:20 2020
From: haberg-1 at telia.com (Hans Åberg)
Date: Mon, 21 Sep 2020 18:03:20 +0200
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>
References: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>
Message-ID:

> On 21 Sep 2020, at 15:51, Doug Ewell via Unicode wrote:
>
> Hans Åberg wrote:
>
>> In chess commenting, there is difference between "?!", a questionable
>> move, and "!?", an interesting move. The interrobang is neutral with
>> respect such a distinction (though not used in chess).
>
> Those would be ⁈ and ⁉ (and ⁇ and ‼), not ‽, although I imagine chess publications don't use any of those either.

Those might be used for better kerning, and because the pairs are semantically single symbols, though they are probably not used.


From andrewcwest at gmail.com Mon Sep 21 11:14:41 2020
From: andrewcwest at gmail.com (Andrew West)
Date: Mon, 21 Sep 2020 17:14:41 +0100
Subject: Is there a Greek version of the INTERROBANG?
In-Reply-To:
References: <004401d6901e$51b4cb70$f51e6250$@ewellic.org>
Message-ID:

On Mon, 21 Sep 2020 at 17:05, Hans Åberg via Unicode wrote:
>
> > On 21 Sep 2020, at 15:51, Doug Ewell via Unicode wrote:
> >
> > Those would be ⁈ and ⁉ (and ⁇ and ‼), not ‽, although I imagine chess publications don't use any of those either.
>
> Those might be used for better kerning and as the pairs are semantically single symbols, though probably not used.

I believe that ⁈ and ⁉ were encoded for Mongolian, where they are used in vertical layout (i.e. ? and ! side-by-side in the vertical text stream, not one below the other, which is what you would get if you used "?!" [2 characters] instead of "⁈").
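The relationship between the single-character forms and the two-character sequences can be illustrated with a small sketch, assuming Python 3 and the standard `unicodedata` module (the `folded` dictionary is a name chosen here for illustration). The doubled marks carry compatibility decompositions, so NFKC folds them into ordinary "?" and "!" pairs, while the interrobang has no decomposition at all.

```python
import unicodedata

# Single code points under discussion: the double and mixed marks, plus U+203D.
MARKS = ["\u2047", "\u2048", "\u2049", "\u203c", "\u203d"]

# Map each character to its NFKC (compatibility-composed) form.
folded = {ch: unicodedata.normalize("NFKC", ch) for ch in MARKS}

for ch, result in folded.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {result!r}")
```

Canonical normalization (NFC or NFD) leaves all five characters untouched, so the side-by-side forms survive in text that is only canonically normalized; how they are laid out in vertical text is a rendering matter that this sketch does not cover.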
Andrew


From jknappen at web.de Mon Sep 21 15:09:38 2020
From: jknappen at web.de (Jörg Knappen)
Date: Mon, 21 Sep 2020 22:09:38 +0200
Subject: Two asterisks aligned vertically (war einmal: Aw: Re: Is there a Greek version of the INTERROBANG?)
In-Reply-To: <307d090e-a88e-45f4-9ff9-7c2d1d1a983e@disroot.org>
References: <004201d68dc9$7903bf80$6b0b3e80$@ewellic.org> <307d090e-a88e-45f4-9ff9-7c2d1d1a983e@disroot.org>
Message-ID:

An HTML attachment was scrubbed...
URL:


From gwidion at gmail.com Mon Sep 21 18:52:48 2020
From: gwidion at gmail.com (Joao S. O. Bueno)
Date: Mon, 21 Sep 2020 20:52:48 -0300
Subject: Two asterisks aligned vertically (war einmal: Aw: Re: Is there a Greek version of the INTERROBANG?)
In-Reply-To:
References: <004201d68dc9$7903bf80$6b0b3e80$@ewellic.org> <307d090e-a88e-45f4-9ff9-7c2d1d1a983e@disroot.org>
Message-ID:

On Mon, 21 Sep 2020 at 17:11, Jörg Knappen via Unicode wrote:

> The two vertically aligned asterisks occur in traditional sequences of
> footnote markers from the 19th century and earlier. Since all other
> footnote markers are single Unicode characters it is reasonable to treat
> this one on equal footing.
>
>

This one?
Character(code=0x2051, value='⁑', name='TWO ASTERISKS ALIGNED VERTICALLY', category='Po', width='N'),

--Jörg Knappen
>
>
> *Sent:* Friday, 18 September 2020 at 22:31
> *From:* "abrahamgross--- via Unicode"
> *To:* "Doug Ewell"
> *Cc:* unicode at unicode.org, "'Marius Spix'"
> *Subject:* Re: Is there a Greek version of the INTERROBANG?
> Not to bash on "Shady Characters: The Secret Life of Punctuation, Symbols,
> and Other Typographical Marks", but it doesn't really have much information
> past the what you can find on the wikipedia articles of the punctuation
> marks it talks about. also a lot of the information isn't researched
> thoroughly enough for a satisfying history of the given punctuation mark.
>
> Like in the asterisk chapter the author talks about how ⁑
U+2051 TWO ASTERISKS ALIGNED VERTICALLY is buried in the depths of
> unicode for some unknown reason, when it was used in the past for
> specific reasons (which is why unicode added it)
> 2020/09/18 ??10:40:44 Doug Ewell via Unicode :
>
> > Marius Spix wrote:
> >
> >> So you could compose an exclamation mark with a comma instead of a dot
> >> with U+0049 LATIN CAPITAL LETTER I and U+0326 COMBINING COMMA BELOW,
> >> but it still does not look right
> >
> > That's because it isn't right. The exclamation mark is not derived from
> > a capital I.
> >
> >> and I am not even sure, if there is any evidence for such a character.
> >
> > https://en.wikipedia.org/wiki/Punctuation#%22Question_comma%22,_%22exclamation_comma%22
> >
> > There are actually books about punctuation. One, which I regrettably do
> > not yet own, is "Shady Characters: The Secret Life of Punctuation, Symbols,
> > and Other Typographical Marks," by Keith Houston. Such works might be
> > useful for answering this question and the earlier one about the origin of
> > the less-than sign.
> >
> > --
> > Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org
> >
>
>


From philip_chastney at yahoo.com Tue Sep 22 03:52:35 2020
From: philip_chastney at yahoo.com (philip chastney)
Date: Tue, 22 Sep 2020 08:52:35 +0000 (UTC)
Subject: Why is the "<" symbol named the "less-than sign"?
In-Reply-To:
References:
Message-ID: <456789327.4689119.1600764755414@mail.yahoo.com>

there was a consultancy -- Accenture, I think -- who put a "greater-than" mark above one of the characters in their name

it struck me as odd that they should mark their name with a diminuendo

lack of ambition, perhaps?

. . . /phil

On Wednesday, 16 September 2020, 18:11:59 GMT, James Tauber via Unicode wrote:

On Wed, Sep 16, 2020 at 10:17 PM John W Kennedy via Unicode wrote:

: was changed to <, e (for "end") was changed to /, and . was changed to >.
The characters used, of course, had to be available in ASCII. I dare say early publications on SGML include a rationale.

Had to be available in ASCII, less likely to occur in character content, and the < > pair visually suggests the open/close of markup (and alludes to the older written convention of circling markup to distinguish it from content).

James

From benson_muite at emailplus.org Wed Sep 23 04:40:20 2020
From: benson_muite at emailplus.org (Benson Muite)
Date: Wed, 23 Sep 2020 12:40:20 +0300
Subject: Missing flag emojis?
Message-ID:

Hi,

Are there plans to add flag emojis for Sint Eustatius (https://en.wikipedia.org/wiki/Flag_of_Sint_Eustatius) and Saba (https://en.wikipedia.org/wiki/Flag_of_Saba)? They currently seem to be lumped together with Bonaire, which has the code U+1F1E7 U+1F1F6.

Regards,
Benson

From wjgo_10009 at btinternet.com Wed Sep 23 06:52:53 2020
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 23 Sep 2020 12:52:53 +0100 (BST)
Subject: L2/20-213 Hand with palm facing up and Hand with palm facing down for Unicode 14.0
In-Reply-To: <1ced1458.7e1.174a2698a5e.Webtop.48@btinternet.com>
References: <1ced1458.7e1.174a2698a5e.Webtop.48@btinternet.com>
Message-ID: <1a8eb4b1.55e.174bacf2aa6.Webtop.213@btinternet.com>

Yesterday I sent a formal comment on the proposal to the Unicode Technical Committee.

Except for the final sentence and the change of date, the formal comment is the same text as in my original post.

The final sentence of the formal comment is as follows.

So could there be three new emoji for these hand gestures rather than just the two in the proposal please?
William Overington

Wednesday 23 September 2020

------ Original Message ------
From: "wjgo_10009--- via Unicode"
To: unicode at unicode.org
Sent: Friday, 2020 Sep 18 At 19:10
Subject: L2/20-213 Hand with palm facing up and Hand with palm facing down for Unicode 14.0

I refer to the following document.

https://www.unicode.org/L2/L2020/20213-palms-up-down-emoji.pdf

Hand with palm facing up and Hand with palm facing down for Unicode 14.0

The meaning of the proposed emoji 'Hand with palm facing down' is fine. My own experience is that this is quite formal. For example, as in the following.

“Good evening ma’am, may I have the pleasure of this dance?” and offers his right hand, palm down, as if in a formal ballroom setting.

The meaning of the proposed emoji 'Hand with palm facing up' as "drop, go away, drop it, put down" does not correspond with my own personal experience, though the concept mentioned later in the document of "Palm up can indicate a lack of knowledge" does in the sense of "Who knows!", though I do not understand quite what "Palm up can indicate a lack of knowledge cross-linguistically" means.

But my lack of experience of the meanings stated in the document is no reason whatsoever not to encode the proposed meanings for that hand gesture.

However, I am thinking that the proposed 'Hand with palm facing up' could be renamed as 'Hand with palm facing up with fingers upward' and a third emoji 'Hand with palm facing up with fingers downward' added.

For me, 'Hand with palm facing up with fingers downward' is a common gesture, such as inviting a visitor to home or office (in antepandemicum times and hopefully in the future) to sit down and make himself or herself comfortable, or to indicate "after you" when two lanes of road traffic are merging at road works, or "please proceed" when letting a car from a side road into queued road traffic. The custom being that the other driver raises his or her hand in acknowledgement and thanks.
For another example, going into a restaurant (in antepandemicum times and hopefully in the future) early evening when all of the trade at that time of day appears to be take-aways, and asking if one can have a sit down meal at present (as one is going direct from work to an evening institute meeting) and the manager indicating 'yes certainly' in speech and by a palm up fingers downward gesture towards the empty seated area of the restaurant.

So should there be three new emoji for these hand gestures rather than just the two in the proposal?

William Overington

Friday 18 September 2020

From marius.spix at web.de Wed Sep 23 10:44:51 2020
From: marius.spix at web.de (Marius Spix)
Date: Wed, 23 Sep 2020 17:44:51 +0200
Subject: Fw: Aw: Missing flag emojis?
References:
Message-ID:

An HTML attachment was scrubbed...

From wjgo_10009 at btinternet.com Wed Sep 23 11:13:29 2020
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 23 Sep 2020 17:13:29 +0100 (BST)
Subject: Fw: Aw: Missing flag emojis?
In-Reply-To:
References:
Message-ID: <6d3f4db9.843.174bbbdc3ef.Webtop.49@btinternet.com>

> … similar to the flags of England, Scotland and Wales.

Frankly, that encoding of the flags of England, Scotland and Wales by Unicode Inc. annoys me.

I say encode all five atomically. And others too.

There seems space for lots of comic faces but none for our flags.

Americans don't dip their flag, Unicode Inc. should not dip national flags.

William Overington

Wednesday 23 September 2020

------ Original Message ------
From: "Marius Spix via Unicode"
To: unicode at unicode.org
Sent: Wednesday, 2020 Sep 23 At 16:44
Subject: Fw: Aw: Missing flag emojis?

Sent: Wednesday, 23 September 2020 at 17:44
From: "Marius Spix"
To: benson_muite at emailplus.org
Subject: Aw: Missing flag emojis?

The Netherlands Antilles (ISO 3166-2 code AN) were dissolved in 2010.
Aruba, Curaçao and Sint Maarten became constituent states. But Bonaire, Saba and Sint Eustatius are only special municipalities of the Netherlands, but they received their own ISO 3166-2 code BQ in addition to NL. The codes BQ-BO, BQ-SA and BQ-SE are synonyms of NL-BQ1, NL-BQ2 and NL-BQ3. So I would recommend to encode the subdivision flag sequences NLBQ1, NLBQ2 and NLBQ3 similar to the flags of England, Scotland and Wales.

Sent: Wednesday, 23 September 2020 at 11:40
From: "Benson Muite via Unicode"
To: unicode at unicode.org
Subject: Missing flag emojis?

Hi,

Are there plans to add Flag emojis for Sint Eustatius (https://en.wikipedia.org/wiki/Flag_of_Sint_Eustatius ) and Saba (https://en.wikipedia.org/wiki/Flag_of_Saba ). They currently seem to be lumped together with Bonaire which has the code U+1F1E7 U+1F1F6

Regards,
Benson

From silverpie2 at mac.com Wed Sep 23 11:27:43 2020
From: silverpie2 at mac.com (J Andrew Lipscomb)
Date: Wed, 23 Sep 2020 12:27:43 -0400
Subject: BES island flags
In-Reply-To:
References:
Message-ID:

Flags are based on the ISO 3166 standards, which consider those three islands as a single entity coded as BQ (which the codepoints you mention stand for). There is no one flag that represents the group in question specifically. The designers of most emoji fonts chose the Bonaire flag because it is the most populated of them (Facebook goes the other way and simply shows the Dutch tricolor).

Flags for second-level entities are encodable, but generally not implemented except for England, Scotland, and Wales.

Sent from my iPad

> Are there plans to add Flag emojis for Sint Eustatius and Saba .
> They currently seem to be lumped together with Bonaire which has the code U+1F1E7 U+1F1F6

From pkar at ieee.org Wed Sep 23 11:34:24 2020
From: pkar at ieee.org (Piotr Karocki)
Date: Wed, 23 Sep 2020 18:34:24 +0200
Subject: BES island flags
In-Reply-To:
References:
Message-ID: <12f6d6f861aa1a51379cfb67a5d849ff@mail.gmail.com>

By the way,
what is the procedure for changing Unicode flags when ISO 3166 is modified (changed)?

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of J Andrew Lipscomb via Unicode
Sent: Wednesday, 23 September 2020 18:28
To: unicode at unicode.org
Subject: Re: BES island flags

Flags are based on the ISO 3166 standards, which consider those three islands as a single entity coded as BQ (which the codepoints you mention stand for). There is no one flag that represents the group in question specifically. The designers of most emoji fonts chose the Bonaire flag because it is the most populated of them (Facebook goes the other way and simply shows the Dutch tricolor).

Flags for second-level entities are encodable, but generally not implemented except for England, Scotland, and Wales.

Sent from my iPad

> Are there plans to add Flag emojis for Sint Eustatius and Saba . They
> currently seem to be lumped together with Bonaire which has the code
> U+1F1E7 U+1F1F6

From doug at ewellic.org Wed Sep 23 11:34:33 2020
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 23 Sep 2020 10:34:33 -0600
Subject: Aw: Missing flag emojis?
In-Reply-To:
References:
Message-ID: <000201d691c7$6b772c60$42658520$@ewellic.org>

Marius Spix wrote:

> The Netherlands Antilles (ISO 3166-2 code AN) were dissolved in 2010.
> Aruba, Curaçao and Sint Maarten became constituent states. But
> Bonaire, Saba and Sint Eustatius are only special municipalities of
> the Netherlands, but they received their own ISO 3166-2 code BQ in
> addition to NL. The codes BQ-BO, BQ-SA and BQ-SE are synonyms of
> NL-BQ1, NL-BQ2 and NL-BQ3.
> So I would recommend to encode the
> subdivision flag sequences NLBQ1, NLBQ2
> and NLBQ3 similar to the flags of England, Scotland and
> Wales.

The tag letters should actually be lowercase:

🏴bqbo? for Bonaire
🏴bqsa? for Saba
🏴bqse? for Sint Eustatius

where 🏴 is U+1F3F4 WAVING BLACK FLAG (tag_base), ? represents U+E007F TAG CANCEL (tag_end), and the letters in between are Plane 14 tag characters, as described in UTS #51.

These sequences are not RGI (recommended for general interchange), which does not necessarily mean they are "not recommended," but does mean you might have trouble finding implementations that recognize them.

As "Netherlands Antilles" is no longer an entity, the flag of Caribbean Netherlands (Bonaire, Sint Eustatius and Saba) is actually the same as the flag of the Netherlands proper. That means the RIS characters "BQ" should actually display that flag, not the flag of Bonaire as Benson Muite stated. But there are no absolute guarantees as to how platforms will render flag emoji.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From markus.icu at gmail.com Wed Sep 23 11:48:31 2020
From: markus.icu at gmail.com (Markus Scherer)
Date: Wed, 23 Sep 2020 09:48:31 -0700
Subject: BES island flags
In-Reply-To: <12f6d6f861aa1a51379cfb67a5d849ff@mail.gmail.com>
References: <12f6d6f861aa1a51379cfb67a5d849ff@mail.gmail.com>
Message-ID:

On Wed, Sep 23, 2020 at 9:40 AM Piotr Karocki via Unicode <unicode at unicode.org> wrote:

> By the way,
> what is procedure of changing Unicode flags as ISO 3166 is modified
> (changed)?
>

As the ISO changes are integrated into CLDR, the validity for emoji sequences follows.

https://www.unicode.org/reports/tr51/#Flags
https://www.unicode.org/reports/tr51/#flag-emoji-tag-sequences

Unicode only defines valid and recommended sequences. It does not prescribe how exactly glyphs for those should look.

markus
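
As an illustration of the tag sequences described in Doug Ewell's message above, here is a minimal Python sketch. It assumes the UTS #51 layout he outlines (U+1F3F4 as the base, one Plane 14 tag character per lowercase letter or digit, U+E007F as the terminator). The function names `subdivision_flag` and `code_points` are hypothetical helpers made up for this note, not part of any library, and the sketch says nothing about whether a given platform will actually render the result.

```python
import string

TAG_BASE = "\U0001F3F4"   # U+1F3F4 WAVING BLACK FLAG (tag_base)
TAG_END = "\U000E007F"    # U+E007F, formally named CANCEL TAG (tag_end)
TAG_OFFSET = 0xE0000      # tag characters mirror ASCII at U+E0000 + code point
ALLOWED = set(string.ascii_lowercase + string.digits)


def subdivision_flag(code: str) -> str:
    """Build an emoji tag sequence from a code such as 'BQ-BO' or 'gbeng'.

    Hypothetical helper: the hyphen is dropped and letters are lowercased,
    as the message above says the tag letters should be lowercase.
    """
    letters = code.replace("-", "").lower()
    if not letters or any(ch not in ALLOWED for ch in letters):
        raise ValueError(f"not a usable subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in letters)
    return TAG_BASE + tags + TAG_END


def code_points(text: str) -> str:
    """List the code points of a string in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for code in ("BQ-BO", "BQ-SA", "BQ-SE"):
        print(code, code_points(subdivision_flag(code)))
```

For BQ-BO this prints U+1F3F4 U+E0062 U+E0071 U+E0062 U+E006F U+E007F. Whether that sequence shows a flag or a fallback glyph is up to the font and platform, since these sequences are not RGI.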
From marius.spix at web.de Wed Sep 23 11:51:36 2020
From: marius.spix at web.de (Marius Spix)
Date: Wed, 23 Sep 2020 18:51:36 +0200
Subject: Aw: RE: Missing flag emojis?
In-Reply-To: <000201d691c7$6b772c60$42658520$@ewellic.org>
References: <000201d691c7$6b772c60$42658520$@ewellic.org>
Message-ID:

An HTML attachment was scrubbed...

From doug at ewellic.org Wed Sep 23 12:10:42 2020
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 23 Sep 2020 11:10:42 -0600
Subject: Missing flag emojis?
In-Reply-To:
References: <000201d691c7$6b772c60$42658520$@ewellic.org>
Message-ID: <000001d691cc$7884e6e0$698eb4a0$@ewellic.org>

Marius Spix wrote:

> I would rather prefer
>
> 🏴nlbq1? for Bonaire
> 🏴nlbq2? for Saba
> 🏴nlbq3? for Sint Eustatius
>
> because the entity “Bonaire, Sint Eustatius and Saba” is no state.

"Heard Island and McDonald Islands" is not a state either (these remote, uninhabited islands are administered by Australia), but there is an ISO 3166-1 code element, which is the criterion for these sequences.

In principle, the "bq" and "nlbq" sequences should both render the appropriate flags.

> Also, according to IANA the TLD .bq is not used (they use .nl
> instead).

Country-code top-level domains are also not the criterion for these sequences. The Soviet Union has not been a state for almost 29 years, but there are still more than 100,000 domains under ".su" (Wikipedia). Meanwhile, ".uk" has completely swamped ".gb" as the preferred TLD for the United Kingdom, although "UK" has never been an assigned ISO 3166 code element.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From harjitmoe at outlook.com Wed Sep 23 12:21:35 2020
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Wed, 23 Sep 2020 17:21:35 +0000
Subject: Missing flag emojis?
In-Reply-To:
References: <000201d691c7$6b772c60$42658520$@ewellic.org>,
Message-ID:

From: Unicode on behalf of Marius Spix via Unicode
Cc: unicode at unicode.org
Subject: Aw: RE: Missing flag emojis?

[…]
Also, according to IANA the TLD .bq is not used (they use .nl instead).

What TLDs use is neither here nor there: the United Kingdom uses .uk while its ISO code is GB – which refers to the entire UK, not just Great Britain – with UK being exceptionally reserved for the United Kingdom but not the assigned code. (For reference, the code for Great Britain specifically (say, if an implementation wants to add support for the version of the Union Jack without the Patrick's Cross) is GB-GBN, i.e. tags ?gbgbn? in a regional flag sequence.)

Sent: Wednesday, 23 September 2020 at 18:34
From: "Doug Ewell"
To: unicode at unicode.org
Cc: "'Marius Spix'"
Subject: RE: Aw: Missing flag emojis?

[…]

As "Netherlands Antilles" is no longer an entity, the flag of Caribbean Netherlands (Bonaire, Sint Eustatius and Saba) is actually the same as the flag of the Netherlands proper. That means the RIS characters "BQ" should actually display that flag, not the flag of Bonaire as Benson Muite stated. But there are no absolute guarantees as to how platforms will render flag emoji.

I've never quite understood why the RIS sequence UM is broadly implemented, when there is not a flag specifically for the US Minor Islands (nor likely to be any push for one, given the lack of permanent residents), and the US flag already can be represented as US. It just strikes me as an unnecessary duplicate representation. But I presume this can be chalked up to the vendors trying to implement every country code (as opposed to the tiny selection of countries whose flags have JCarrier sources).

Although if a region without its own flag can display a flag of a country, state, province or autonomous region which it is part of, then that would provide a possible solution to those clamouring for a Kurdish flag (if vendors show it for ?iqar?).

–Har.
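
The RIS (regional indicator symbol) sequences mentioned in this part of the thread, such as BQ, US and UM, map each letter of a two-letter code to one code point. A minimal Python sketch of that mapping follows; the only assumption is that the regional indicators A to Z occupy U+1F1E6 to U+1F1FF in order, and `ris_flag` and `ris_code` are hypothetical names chosen for this note. The sketch does not check whether a code is actually assigned in ISO 3166-1.

```python
RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def ris_flag(alpha2: str) -> str:
    """Turn a two-letter code such as 'BQ' into a pair of regional indicators."""
    code = alpha2.upper()
    if len(code) != 2 or any(not ("A" <= ch <= "Z") for ch in code):
        raise ValueError(f"expected two ASCII letters, got {alpha2!r}")
    return "".join(chr(RIS_A + ord(ch) - ord("A")) for ch in code)


def ris_code(sequence: str) -> str:
    """Inverse mapping: recover the letters from a regional indicator pair."""
    if any(not (RIS_A <= ord(ch) <= RIS_A + 25) for ch in sequence):
        raise ValueError("not a regional indicator sequence")
    return "".join(chr(ord(ch) - RIS_A + ord("A")) for ch in sequence)


if __name__ == "__main__":
    for code in ("BQ", "US", "UM"):
        points = " ".join(f"U+{ord(ch):04X}" for ch in ris_flag(code))
        print(code, points)
```

For BQ this gives U+1F1E7 U+1F1F6, the pair quoted at the start of the thread. US and UM come out as distinct sequences even though, as noted above, vendors may draw the same flag for both.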
From christoph.paeper at crissov.de Thu Sep 24 05:56:04 2020
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Thu, 24 Sep 2020 12:56:04 +0200
Subject: BES island flags
In-Reply-To:
References:
Message-ID: <9861ACC9-5D28-4ADD-BD9A-3B0B23446EAC@crissov.de>

On 23.09.2020 at 18:50, Markus Scherer wrote:
>
> Unicode only defines valid and recommended sequences. It does not prescribe how exactly glyphs for those should look.

Still, the lack of a commonly agreed upon design was the reason for not RGIing the flag emoji for GB-NIR alongside GB-ENG, GB-WLS and GB-SCT.

From christoph.paeper at crissov.de Fri Sep 25 04:08:58 2020
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Fri, 25 Sep 2020 11:08:58 +0200
Subject: BES island flags
In-Reply-To: <4911E581-E22F-471D-B012-73B22D8B2863@gmail.com>
References: <4911E581-E22F-471D-B012-73B22D8B2863@gmail.com>
Message-ID: <68CE3C51-AA12-4117-B113-B9610F41298A@crissov.de>

Jonathan Coxhead :
>
> … Not the fact that GB-NIR is a contradiction?

“GB” is just an arbitrary code element in ISO 3166-1, which resembles a popular abbreviation purely for human convenience. It could have been “UK”, “EN” or “QX” just as well.

From richard.wordingham at ntlworld.com Fri Sep 25 06:02:32 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 25 Sep 2020 12:02:32 +0100
Subject: BES island flags
In-Reply-To: <68CE3C51-AA12-4117-B113-B9610F41298A@crissov.de>
References: <4911E581-E22F-471D-B012-73B22D8B2863@gmail.com> <68CE3C51-AA12-4117-B113-B9610F41298A@crissov.de>
Message-ID: <20200925120232.60e69389@JRWUBU2>

On Fri, 25 Sep 2020 11:08:58 +0200
Christoph Päper via Unicode wrote:

> Jonathan Coxhead :
> >
> > … Not the fact that GB-NIR is a contradiction?
>
> “GB” is just an arbitrary code element in ISO 3166-1, which resembles
> a popular abbreviation purely for human convenience. It could have
> been “UK”, “EN” or “QX” just as well.

'EN' would have been incendiary.
'GB' can at least be explained as an abbreviation of 'GBNI'. However, the common use of 'UK' instead is probably due to 'GB' being felt as wrong.

Richard.

From andrewcwest at gmail.com Fri Sep 25 06:46:40 2020
From: andrewcwest at gmail.com (Andrew West)
Date: Fri, 25 Sep 2020 12:46:40 +0100
Subject: BES island flags
In-Reply-To: <20200925120232.60e69389@JRWUBU2>
References: <4911E581-E22F-471D-B012-73B22D8B2863@gmail.com> <68CE3C51-AA12-4117-B113-B9610F41298A@crissov.de> <20200925120232.60e69389@JRWUBU2>
Message-ID:

On Fri, 25 Sep 2020 at 12:08, Richard Wordingham via Unicode wrote:
>
> 'EN' would have been incendiary. 'GB' can at least be explained as
> an abbreviation of 'GBNI'. However, the common use of 'UK' instead is
> probably due to 'GB' being felt as wrong.

'UK' and 'GB' will both be wrong when Scotland becomes independent, as seems increasingly likely.

Andrew

From christoph.paeper at crissov.de Fri Sep 25 08:51:29 2020
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Fri, 25 Sep 2020 15:51:29 +0200
Subject: BES island flags
In-Reply-To:
References:
Message-ID: <355EE1B2-ABD2-45F1-B669-9CE57CFC8CA4@crissov.de>

Andrew West:
>
> 'UK' and 'GB' will both be wrong when Scotland becomes independent, as
> seems increasingly likely.

Which is just another fact to prove that introducing emoji flags based upon ISO 3166-2 codes (i.e. thousands) was premature. Using user-defined ISO 3166-1 codes would have sufficed for the currently three (or four) subordinate entities that were actually needed.
From sdowney at gmail.com Fri Sep 25 12:52:51 2020
From: sdowney at gmail.com (Steve Downey)
Date: Fri, 25 Sep 2020 13:52:51 -0400
Subject: C++ is moving forward on adopting UAX 31 - UNICODE IDENTIFIER AND PATTERN SYNTAX
Message-ID:

Eliding many intra-committee process details, the C++ working group responsible for language evolution voted to forward, to the group responsible for standardese, a paper adopting UAX 31 rules for the syntax of identifiers, using XID_Start + LOW LINE and XID_Continue. This replaces the white list of allowed code points from last millennium, and fixes some mistakes like allowing RTL modifiers in identifiers.

Current draft of the paper is available at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1949r6.html

I wanted to thank the Unicode committee for UAX 31 so that we didn't have to reinvent the wheel.

From doug at ewellic.org Fri Sep 25 14:29:12 2020
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 25 Sep 2020 13:29:12 -0600
Subject: CLDR, emoji flags, and ISO 3166 (was: Re: BES island flags)
Message-ID: <000101d69372$265d1e80$73175b80$@ewellic.org>

Christoph Päper replied to Andrew West:

>> 'UK' and 'GB' will both be wrong when Scotland becomes independent,
>> as seems increasingly likely.
>
> Which is just another fact to prove that introducing emoji flags based
> upon ISO 3166-2 codes (i.e. thousands) was premature. Using user-
> defined ISO 3166-1 codes would have sufficed for the currently three
> (or four) subordinate entities that were actually needed.

I don't think I see the connection. Andrew's point was that, IF one considers the ISO 3166-1 code elements to be mnemonic abbreviations, neither "United Kingdom" nor "Great Britain" would furnish an appropriate mnemonic if and when Scotland leaves that entity.

Of course, ISO 3166-1 code elements are never guaranteed to be mnemonic abbreviations.
The argument about 'GB' not being inclusive of Northern Ireland is an old and tired one. As Harriet Riddle noted earlier, the ISO 3166-1 code element 'GB' represents the United Kingdom of Great Britain and Northern Ireland. If you like, the 'G' and 'B' can be seen as "standing for" two of the letters in that name, but the code element represents the entire country. That is all there is to it.

Using ISO 3166-2 code elements to represent flags of entities smaller than an ISO 3166-1 country (such as Colorado, Baden-Württemberg, or England) has nothing to do with this.

To the rest of Christoph's response: there are real costs to exposing user-defined code elements in a public standard (I wish CLDR had not put its foot in this doorway with 'XK'), and I question that the only use for subdivision coding in CLDR or in emoji flags was to distinguish England, Scotland, and Wales.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From everson at evertype.com Fri Sep 25 18:07:29 2020
From: everson at evertype.com (Michael Everson)
Date: Sat, 26 Sep 2020 00:07:29 +0100
Subject: BES island flags
In-Reply-To: <9861ACC9-5D28-4ADD-BD9A-3B0B23446EAC@crissov.de>
References: <9861ACC9-5D28-4ADD-BD9A-3B0B23446EAC@crissov.de>
Message-ID: <90A9A0A9-5665-44BD-A0B1-135DD132B984@evertype.com>

But it’s nonsense. There is no “official” flag for Northern Ireland. Unionists like to use the Union flag 🇬🇧. Nationalists like to use the Irish flag 🇮🇪. But the UK remains a State made up of four Nations, England 🏴󠁧󠁢󠁥󠁮󠁧󠁿, Scotland 🏴󠁧󠁢󠁳󠁣󠁴󠁿, Wales 🏴󠁧󠁢󠁷󠁬󠁳󠁿, and Northern Ireland OOPS.

And there is a flag used – by everyone – for Northern Ireland in terms of sport and other things. Nowadays the distinction is extremely important in maps and charts, for instance, of COVID-19 incidences and responses.

But Unicode failed to add the Northern Ireland flag, even though it exists, regardless of its “official” status. (They didn’t ask the Irish or UK representatives to SC2 for an opinion, either.
They just decided “No flag for GB-NIR”.)

The decision made was not practical, not respectful, and not correct, in my view.

Michael Everson

> On 24 Sep 2020, at 11:56, Christoph Päper via Unicode wrote:
>
> 23.09.2020 um 18:50 schrieb Markus Scherer
>>
>> Unicode only defines valid and recommended sequences. It does not prescribe how exactly glyphs for those should look.
>
> Still, the lack of a commonly agreed upon design was the reason for not RGIing the flag emoji for GB-NIR alongside GB-ENG, GB-WLS and GB-SCT.
>

From doug at ewellic.org Fri Sep 25 18:27:33 2020
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 25 Sep 2020 17:27:33 -0600
Subject: BES island flags
In-Reply-To: <90A9A0A9-5665-44BD-A0B1-135DD132B984@evertype.com>
References: <9861ACC9-5D28-4ADD-BD9A-3B0B23446EAC@crissov.de> <90A9A0A9-5665-44BD-A0B1-135DD132B984@evertype.com>
Message-ID: <000401d69393$729d5f00$57d81d00$@ewellic.org>

Michael Everson wrote:

> But Unicode failed to add the Northern Ireland flag, even though it
> exists, regardless of its “official” status. (They didn’t ask the
> Irish or UK representatives to SC2 for an opinion, either. They just
> decided “No flag for GB-NIR”.)

Well, sort of. They decided, “We’ll make the other three ‘Recommended for General Interchange,’ but not GB-NIR.”

So a significant number of font vendors think 🏴󠁧󠁢󠁮󠁩󠁲󠁿 is “not recommended,” and although you can insert 🏴󠁧󠁢󠁮󠁩󠁲󠁿 into your text all you want, and if you’re lucky you might even see a Northern Ireland flag of one sort or another (say, if you use Andrew West’s BabelStone Flags font), odds are you won’t.
--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From christoph.paeper at crissov.de Sat Sep 26 05:51:52 2020
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Sat, 26 Sep 2020 12:51:52 +0200
Subject: BES island flags
In-Reply-To: <000401d69393$729d5f00$57d81d00$@ewellic.org>
References: <000401d69393$729d5f00$57d81d00$@ewellic.org>
Message-ID:

Doug Ewell:
>
> They decided, “We’ll make the other three ‘Recommended for General Interchange,’ but not GB-NIR.”

Unicode first needed to define a completely new method involving otherwise unused Tag characters to encode ISO 3166-2 codes as emoji flags. This enabled _thousands_ of new valid sequences, not all of which have a definitive flag design associated. Of these, only _three_ were RGIed then. (A few years before, a dozen preexisting emoji flags led to the systematic encoding of about two hundred geographic codes, some of which have no unique flag associated with them.)

The only preexisting implementation (WhatsApp) used a different method (RIS “XE” etc.) which was absolutely sufficient for handling just a handful of flags.

UTC created a kind of Mexican standoff thereafter: they expect vendors to show implementation interest in certain flag emojis before formally recommending any further ones, while vendors are waiting for the consortium to recommend certain codes for implementation before taking action.

From harjitmoe at outlook.com Sat Sep 26 06:20:15 2020
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sat, 26 Sep 2020 11:20:15 +0000
Subject: BES island flags
In-Reply-To:
References: <000401d69393$729d5f00$57d81d00$@ewellic.org>,
Message-ID:

From: Unicode on behalf of Christoph Päper via Unicode
Sent: 26 September 2020 11:51
To: unicode at unicode.org
Subject: Re: BES island flags

Unicode first needed to define a completely new method involving otherwise unused Tag characters to encode ISO 3166-2 codes as emoji flags.
This enabled _thousands_ of new valid sequences, not all of which have a definitive flag design associated. Of these, only _three_ were RGIed then. (Few years before, a dozen of preexisting emoji flags lead to the systematic encoding of about two hundred geographic codes, some of which have no unique flag associated with them.)

The only preexisting implementation (WhatsApp) used a different method (RIS “XE” etc.) which was absolutely sufficient for handling just a handful of flags.

…and in fairness, we did still end up with, as mentioned already, XK for Kosovo, although that is in some respects a different situation (Kosovo's ISO 3166 code is RS-KM, which is an entire can of worms in itself, though to the best of my knowledge the flag in question isn't used/recognised by the Republic of Serbia, and the code XK apparently gets used for Kosovo in several non-emoji contexts as well, which is not to the best of my knowledge true for XE and England).

For comparison, Taiwan gets both CN-TW and TW in ISO 3166 (although the ISO finish up listing its name as "Taiwan (Province of China)"), and (for further comparison) Hong Kong gets both CN-HK and HK.

–Har.

From christoph.paeper at crissov.de Sat Sep 26 06:32:57 2020
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Sat, 26 Sep 2020 13:32:57 +0200
Subject: CLDR, emoji flags, and ISO 3166 (was: Re: BES island flags)
In-Reply-To: <000101d69372$265d1e80$73175b80$@ewellic.org>
References: <000101d69372$265d1e80$73175b80$@ewellic.org>
Message-ID: <1E1F7296-78F1-457E-B560-D917F714675D@crissov.de>

Doug Ewell:
> there are real costs to exposing user-defined code elements in a public standard (…),

If one public standard references another public standard, it’s perfectly fine for the former to make use of the extension mechanisms provided by the latter.
The original intent of ISO 3166 certainly was not flags, but there’s also no better standard available.

> I question that the only use for subdivision coding in CLDR or in emoji flags was to distinguish England, Scotland, and Wales.

Such as …?

From andrewcwest at gmail.com Sat Sep 26 06:36:56 2020
From: andrewcwest at gmail.com (Andrew West)
Date: Sat, 26 Sep 2020 12:36:56 +0100
Subject: BES island flags
In-Reply-To:
References: <000401d69393$729d5f00$57d81d00$@ewellic.org>
Message-ID:

On Sat, 26 Sep 2020 at 12:26, Harriet Riddle via Unicode wrote:
>
> For comparison, Taiwan gets both CN-TW and TW in ISO 3166 (although the ISO finish up listing its name as "Taiwan (Province of China)"), and (for further comparison) Hong Kong gets both CN-HK and HK.

TW and CN-TW are not equivalent. TW corresponds to the flag of the Republic of China, whereas CN-TW should correspond to a hypothetical flag for the Province of Taiwan.

Andrew

From andrewcwest at gmail.com Sat Sep 26 06:54:06 2020
From: andrewcwest at gmail.com (Andrew West)
Date: Sat, 26 Sep 2020 12:54:06 +0100
Subject: BES island flags
In-Reply-To:
References: <000401d69393$729d5f00$57d81d00$@ewellic.org>
Message-ID:

On Sat, 26 Sep 2020 at 11:57, Christoph Päper via Unicode wrote:
>
> Unicode first needed to define a completely new method involving otherwise unused Tag characters to encode ISO 3166-2 codes as emoji flags. This enabled _thousands_ of new valid sequences, not all of which have a definitive flag design associated.

I believe that most ISO 3166-2 entities are not associated with a definitive flag. And conversely, some geographical entities with established flags do not have an ISO 3166-2 code. For example, some historic counties of England, Wales and Scotland which have a definitive flag do not correspond to a single administrative district with an ISO 3166-2:GB code (e.g. Middlesex and Sussex in England; Caithness and Kirkcudbrightshire in Scotland; and Caernarfonshire and Merionethshire in Wales).
So if you wanted to create a font with the flags for all the historic counties in the UK, it would not be possible to do so using the defined flag tag sequences.

Andrew

From doug at ewellic.org Sat Sep 26 14:59:41 2020
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 26 Sep 2020 13:59:41 -0600
Subject: BES island flags
In-Reply-To:
References: <000401d69393$729d5f00$57d81d00$@ewellic.org>
Message-ID: <000f01d6943f$9315fab0$b941f010$@ewellic.org>

Christoph Päper wrote:

> Unicode first needed to define a completely new method involving
> otherwise unused Tag characters to encode ISO 3166-2 codes as emoji
> flags. This enabled _thousands_ of new valid sequences, not all of
> which have a definitive flag design associated. Of these, only _three_
> were RGIed then.

I don't consider it as a problem that not every country subdivision has its own flag. If someone tries to encode, say, the flag of one of the provinces (wilayas) of Algeria, which AFAICT do not have their own flags, they will get a fallback representation, which is pretty much what they asked for. Most UIs do not offer a non-existent flag as an option anyway.

> (Few years before, a dozen of preexisting emoji flags lead to the
> systematic encoding of about two hundred geographic codes, some of
> which have no unique flag associated with them.)

Again, I don't see this as a breaking problem. Most of the alternative solutions have their own problems; for example, if a system excludes the flag of Svalbard and Jan Mayen because it is the same as the flag of Norway, then the system (not just the font) has to be revised if a separate flag is introduced later.

I would have considered it very much a problem, and said so at the time, if Unicode had chosen to encode only the ten flags (CN DE ES FR GB IT JP KR RU US) that Japanese cell phone vendors had felt it sufficient to implement.

> The only preexisting implementation (WhatsApp) used a different method (RIS
> “XE” etc.)
which was absolutely sufficient for handling just a handful of > flags. But doesn't scale beyond just a handful. What if I want to represent the highly distinctive and popular flag of Bavaria (DE-BY)? What about Québec or Tokyo? > UTC created a kind of Mexican standoff thereafter: they expect vendors > to show implementation interest in certain flag emojis before formally > recommending any further ones, while vendors are waiting for the > consortium to recommend certain codes for implementation before taking > action. I do agree with this concern about RGI. The usual response is that the member companies of Unicode who determine what should be recommended are the ones who would be implementing them. But of course, not all major vendors are members of Unicode, and in any case there seems to be reluctance to touch this list regardless of evidence of usage. >> I question that the only use for subdivision coding in CLDR or in emoji flags was to distinguish England, Scotland, and Wales. > > Such as ?? CLDR provides an extension (U) to identify language or other locale information by, among other criteria, country subdivision. For example, Canadian French (fr-CA) can be sub-identified as either Québec French (fr-CA-u-sd-caqc) or Acadian French (fr-CA-u-sd-canb). As for emoji flags, as stated above, other country subdivisions besides England, Scotland, and Wales are of local or global interest. People unfamiliar with CLDR extension U complain frequently on social media platforms that they need an emoji flag for their state, province, region, department, or oblast. 
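[Editorial illustration, not part of the original message.] The two flag mechanisms discussed above can be sketched in Python. This is only a rough sketch under stated assumptions: the function names country_flag and subdivision_flag are hypothetical, and the subdivision layout assumed here is a black flag base, then lowercase tag characters spelling the ISO 3166-2 code without its hyphen, then a cancel tag. As noted in the thread, only three subdivision sequences are RGI, so most results will show a fallback glyph.

```python
# Sketch only: helper names are hypothetical, not from any library.
BLACK_FLAG = "\U0001F3F4"   # U+1F3F4 WAVING BLACK FLAG, base of a tag sequence
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG, terminates the sequence
TAG_OFFSET = 0xE0000        # tag characters mirror ASCII at this offset
RIS_A = 0x1F1E6             # REGIONAL INDICATOR SYMBOL LETTER A


def country_flag(code: str) -> str:
    """Return the regional indicator pair for a two-letter code such as 'NO'."""
    code = code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected a two-letter code, got {code!r}")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code)


def subdivision_flag(code: str) -> str:
    """Return the emoji tag sequence for an ISO 3166-2 code such as 'DE-BY'."""
    key = code.replace("-", "").lower()
    if len(key) < 3 or not key.isascii() or not key.isalnum():
        raise ValueError(f"expected an ISO 3166-2 style code, got {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in key)
    return BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    for code in ("GB-SCT", "DE-BY"):
        seq = subdivision_flag(code)
        print(code, " ".join(f"U+{ord(ch):04X}" for ch in seq))
```

Whether a given sequence displays as a flag depends entirely on the font and platform; the sketch only builds the code point sequence.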
-- Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org From kittens at wobble.ninja Wed Sep 30 01:35:32 2020 From: kittens at wobble.ninja (Ellie) Date: Wed, 30 Sep 2020 08:35:32 +0200 Subject: Please fix the trademark policy in regards to code Message-ID: <7842c80a-0b8f-5c77-f37d-a475ace078b9@wobble.ninja> Hi everyone, if I am reading the trademark policy correctly I might be required to rename my "unicode.c" source code file to "Unicode® implementation.c" or some similar ugliness (in my humble opinion) to satisfy the "Trademark Usage Policy", because it seems any sort of exception for source code was left out. Not only does this not fit well with how I see many people name their code files, but also special symbols can cause issues in archives/tarballs when sharing the code. Furthermore, it seems like I would need to add the ® into my variable names as well, even if the language/compiler in question doesn't even support unicode characters, and uppercase the U even if that doesn't fit with any of the coding style. This seems counter-productive to me. Do libicu and such even do this? Or any other programming project implementing unicode, really? Therefore, I kindly ask that this is fixed some time soon, in the document found here: https://www.unicode.org/policies/logo_policy.html The exception should include naming of any source code files of any program that may want to deal with unicode specifications in any way, as well as any reference to unicode inside any such source code file. (Unless you want me to make up silly joke names to refer to unicode in my code instead, I guess I might consider that.) I apologize if this is already covered somewhere in the policy, but I really couldn't see it anywhere. Regards, Ellie