From richard.wordingham at ntlworld.com Tue May 5 08:23:46 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 5 May 2020 14:23:46 +0100
Subject: Is Devanagari ल्लाँ ambiguous?
Message-ID: <20200505142346.2c414ceb@JRWUBU2>

Is this Devanagari akshara ambiguous between "l̐lā" (with a nasalised
first consonant, as in Sanskrit) and "llā̃" (with a nasalised vowel, as
in Hindi)? If I understand correctly, ISO 15919 transliterates the
first reading as "m̐llā", or "m̐l lā" if one is splitting words
combined by sandhi.

Richard.

From richard.wordingham at ntlworld.com Tue May 5 15:59:20 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 5 May 2020 21:59:20 +0100
Subject: Is Devanagari ल्लाँ ambiguous?
Message-ID: <20200505215920.6221496b@JRWUBU2>

(Sorry if this is a duplicate - I'm experimenting with domain names as
the Unicode list via unicode at unicode.org is taking over 5 hours to
respond and may in fact be out of action.)

Is this Devanagari akshara ambiguous between "l̐lā" (with a nasalised
first consonant, as in Sanskrit) and "llā̃" (with a nasalised vowel, as
in Hindi)? If I understand correctly, ISO 15919 transliterates the
first reading as "m̐llā", or "m̐l lā" if one is splitting words
combined by sandhi.

Richard.

From richard.wordingham at ntlworld.com Tue May 5 19:19:27 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 6 May 2020 01:19:27 +0100
Subject: [EXTERNAL] Is Devanagari ल्लाँ ambiguous?
In-Reply-To:
References: <20200505142346.2c414ceb@JRWUBU2>
Message-ID: <20200506011927.1dbf0453@JRWUBU2>

On Tue, 5 May 2020 22:57:14 +0000
Andrew Glass wrote:

> Here is an excerpt from Whitney's Sanskrit Grammar page 69:
>
>
>
> [A close up of a newspaper Description automatically generated]
>
> Whitney, William Dwight. 1889.
A Sanskrit grammar, including both the
> classical language, and the older dialects, of Veda and Brahmana.
> Bibliothek indogermanischer Grammatiken, Band II. Leipzig: Breitkopf
> and Härtel.

And MacDonnell's 1886 revision of Max Mueller's 'A Sanskrit Grammar for
Beginners' gives on p21 an example of y?y? with the candrabindu on the
far left. (The conjunct uses a half form.) Unfortunately, they don't
actually answer the question of whether the placement of candrabindu is
significant, though they support my feeling that it is.

> The version with explicit virama is nice because it shows how the
> ambiguity can be avoided and gives us a clue to the better encoding.
> So I would encode these as follows:
> As a single cluster with candrabindu applied to the first l:
>
> 0932 094D 0901 0932 094B
>
> ल्ँलो
>
> This cluster is supported in Nirmala UI: [A drawing of a face
> Description automatically generated]

According to
https://docs.microsoft.com/en-us/typography/script-development/devanagari ,
that's two syllables, and that's how HarfBuzz is currently rendering
it. It seems I'll have to raise a bug report against HarfBuzz - unless
it's changed fairly recently. If I treat the candrabindu as a consonant
modifier (i.e. as a type of nukta), which is what the grammarians say
it is, and encode it before the virama, I get a dotted circle out of
HarfBuzz.

> With explicit virama and candrabindu applied to the first l:
>
> 0932 094D 0901 0020 0932 094B
>
> ल्ँ लो
>
> Which leaves the vowel marked form as you have given it:
>
> 0932 094D 0932 093E 0901
>
> ल्लाँ

And this last one is the only encoding allowed by TUS 13 Section 12.1
R10.

My Unicode feedback and HarfBuzz bug report should make reference to
the thread 'Sanskrit nasalised L' including
https://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0144.html .

Thank you for your help.
One thing I have established is that there are rendering systems that
support candrabindu within the consonant stack - but not where I
expected it! (This relates to the issue of where U+0D81 SINHALA SIGN
CANDRABINDU appears in the encoding of a word, which I've raised
elsewhere.)

Richard.

From richard.wordingham at ntlworld.com Wed May 6 04:46:29 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 6 May 2020 10:46:29 +0100
Subject: Is Devanagari ल्लाँ ambiguous?
In-Reply-To:
References: <20200505142346.2c414ceb@JRWUBU2>
Message-ID: <20200506104629.06c8ee3d@JRWUBU2>

On Tue, 5 May 2020 22:57:14 +0000
Andrew Glass wrote:

> As a single cluster with candrabindu applied to the first l:
>
> 0932 094D 0901 0932 094B
>
> ल्ँलो
>
> This cluster is supported in Nirmala UI: [A drawing of a face
> Description automatically generated]

Just to check, is the formation of the conjunct done within the cluster
shaping or after the dissolution of the cluster boundaries?

This font seems to have been carefully designed so that a candrabindu
within the consonant stack forces half-forms, an approach which
prevents the vertical stacking seen in the example from Whitney, but
prevents candrabindu on a consonant being rendered the same as
candrabindu on a vowel. (I tested the behaviour with the consonant
stacks t.ra and t.ta, where Sanskrit doesn't permit candrabindu on the
first character.)

Richard.

From cibucj at gmail.com Wed May 6 04:53:48 2020
From: cibucj at gmail.com (Cibu)
Date: Wed, 6 May 2020 10:53:48 +0100
Subject: Re: Is Devanagari ल्लाँ ambiguous?
In-Reply-To: <20200505142346.2c414ceb@JRWUBU2>
References: <20200505142346.2c414ceb@JRWUBU2>
Message-ID:

I thought one would transliterate this as 'llām̐'. That is, the
candrabindu occurring as the last.

On Wed, May 6, 2020 at 9:19 AM Richard Wordingham via Unicode
<unicode at unicode.org> wrote:

> Is this Devanagari akshara ambiguous between "l̐lā" (with a nasalised
> first consonant, as in Sanskrit) and "llā̃" (with a nasalised vowel, as
> in Hindi)? If I understand correctly, ISO 15919 transliterates the
> first reading as "m̐llā", or "m̐l lā" if one is splitting words
> combined by sandhi.
>
> Richard.
>

From richard.wordingham at ntlworld.com Wed May 6 06:58:21 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 6 May 2020 12:58:21 +0100
Subject: Is Devanagari ल्लाँ ambiguous?
In-Reply-To:
References: <20200505142346.2c414ceb@JRWUBU2>
Message-ID: <20200506125821.4dc05785@JRWUBU2>

On Wed, 6 May 2020 10:53:48 +0100
Cibu wrote:

> I thought one would transliterate this as 'llām̐'. That is, the
> candrabindu occurring as the last.

My question is whether the visual placement of the candrabindu affects
the meaning. The Unicode standard says (Section 12.1 R10) that it
should be the last character in an akshara with these components. Your
answer confirms that TUS is wrong when the candrabindu is modifying a
consonant; the position matters. Thank you for the information.

Microsoft still implements a solution* for Devanagari that uses
assigned characters with their correct individual significations.

Richard.

*Nirmala UI has a typographical problem with this solution for 'l̐li'
(ल्ँलि); possibly this is the price of disambiguating it from 'llim̐'
(ल्लिँ).

From everson at evertype.com Wed May 6 08:01:26 2020
From: everson at evertype.com (Michael Everson)
Date: Wed, 6 May 2020 14:01:26 +0100
Subject: Re: Is Devanagari ल्लाँ ambiguous?
In-Reply-To: <20200505215920.6221496b@JRWUBU2>
References: <20200505215920.6221496b@JRWUBU2>
Message-ID:

It is not ambiguous in encoding.
Whether one interprets it as l̐lā or llā̃ is a reading rule. But the
encoding is LA + VIRAMA + LA + -AA + CANDRABINDU either way.

> Is this Devanagari akshara ambiguous between "l̐lā" (with a nasalised
> first consonant, as in Sanskrit) and "llā̃" (with a nasalised vowel, as
> in Hindi)? If I understand correctly, ISO 15919 transliterates the
> first reading as "m̐llā", or "m̐l lā" if one is splitting words
> combined by sandhi.
>
> Richard.
>

From samjnaa at gmail.com Thu May 7 10:31:31 2020
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Thu, 7 May 2020 21:01:31 +0530
Subject: Re: [EXTERNAL] Is Devanagari ल्लाँ ambiguous?
In-Reply-To: <20200506011927.1dbf0453@JRWUBU2>
References: <20200505142346.2c414ceb@JRWUBU2> <20200506011927.1dbf0453@JRWUBU2>
Message-ID:

The only linguistically valid Sanskrit sequences are nasal-Y/V/L
followed by the same in non-nasal form. This may then be followed by a
vowel or another consonant.

In N Indic scripts the first nasal consonant is written as a half form
carrying a chandrabindu. The rest is as usual.

In S Indic scripts the stack of the consonants carries a chandrabindu
on top.

See L2/09-372 p 41.

I would expect to encode this linguistic sequence in either type of
script as:

Y/V/L + VIRAMA + CANDRABINDU + ?

This is what I have said in my Grantha proposal but that probably got
lost among so many other issues. I had been meaning to submit a
separate doc on this but haven't been able to get around to it sadly.

From Andrew.Glass at microsoft.com Thu May 7 17:38:22 2020
From: Andrew.Glass at microsoft.com (Andrew Glass)
Date: Thu, 7 May 2020 22:38:22 +0000
Subject: RE: [EXTERNAL] Re: Is Devanagari ल्लाँ ambiguous?
In-Reply-To: <20200506104629.06c8ee3d@JRWUBU2>
References: <20200505142346.2c414ceb@JRWUBU2> <20200506104629.06c8ee3d@JRWUBU2>
Message-ID:

Good question. We support Devanagari with our Indic engine, a
peculiarity of this engine is that it doesn't have a per-run feature
application stage and all features are applied at the cluster stage.
Addressing this is an outstanding issue. Therefore, the example cluster
is a permitted single cluster in our Indic engine.

It would certainly be possible to have a ligature more like Whitney's
example, but that wasn't included in the plan for the Nirmala font.

Cheers,

Andrew

-----Original Message-----
From: Richard Wordingham
Sent: 06 May 2020 02:46
To: Andrew Glass; unicode at unicode.org
Subject: [EXTERNAL] Re: Is Devanagari ल्लाँ ambiguous?

On Tue, 5 May 2020 22:57:14 +0000
Andrew Glass wrote:

> As a single cluster with candrabindu applied to the first l:
>
> 0932 094D 0901 0932 094B
>
> ल्ँलो
>
> This cluster is supported in Nirmala UI:

Just to check, is the formation of the conjunct done within the cluster
shaping or after the dissolution of the cluster boundaries?

This font seems to have been carefully designed so that a candrabindu
within the consonant stack forces half-forms, an approach which
prevents the vertical stacking seen in the example from Whitney, but
prevents candrabindu on a consonant being rendered the same as
candrabindu on a vowel. (I tested the behaviour with the consonant
stacks t.ra and t.ta, where Sanskrit doesn't permit candrabindu on the
first character.)

Richard.
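
The three code point sequences quoted in this thread can be compared
side by side with a short script. This is only a sketch for inspecting
the encodings: the labels and the helper name decode_hex are made up
for illustration, and the script says nothing about how any shaping
engine clusters or renders the text.

```python
import unicodedata

# Code point sequences as quoted in the thread; the labels are informal.
SEQUENCES = {
    "candrabindu on first LA, single cluster": "0932 094D 0901 0932 094B",
    "explicit virama, then a space": "0932 094D 0901 0020 0932 094B",
    "candrabindu after the vowel sign": "0932 094D 0932 093E 0901",
}


def decode_hex(hex_points: str) -> str:
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(h, 16)) for h in hex_points.split())


for label, hex_points in SEQUENCES.items():
    text = decode_hex(hex_points)
    print(f"{label}: {text}")
    for ch in text:
        # Print each code point with its Unicode character name.
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The per-character listing makes the difference visible without relying
on a font: in the first two sequences U+0901 follows the virama, while
in the third it is the last character of the akshara.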

From richard.wordingham at ntlworld.com Fri May 8 04:43:36 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 8 May 2020 10:43:36 +0100
Subject: Is Devanagari ल्लाँ ambiguous?
In-Reply-To:
References: <20200505142346.2c414ceb@JRWUBU2> <20200506104629.06c8ee3d@JRWUBU2>
Message-ID: <20200508104336.4b1d34b3@JRWUBU2>

On Thu, 7 May 2020 22:38:22 +0000
Andrew Glass wrote:

> Good question. We support Devanagari with our Indic engine, a
> peculiarity of this engine is that it doesn't have a per-run feature
> application stage and all features are applied at the cluster stage.
> Addressing this is an outstanding issue. Therefore, the example
> cluster is a permitted single cluster in our Indic engine.

Thanks for the information. I have now raised the HarfBuzz bug as
Issue 2392 (https://github.com/harfbuzz/harfbuzz/issues/2392) and have
made comment number 416 for the Microsoft typography Devanagari
specification, currently at
https://docs.microsoft.com/en-us/typography/script-development/devanagari .

Richard.

From Andrew.Glass at microsoft.com Fri May 8 15:18:38 2020
From: Andrew.Glass at microsoft.com (Andrew Glass)
Date: Fri, 8 May 2020 20:18:38 +0000
Subject: Re: [EXTERNAL] Re: Is Devanagari ल्लाँ ambiguous?
In-Reply-To: <20200508104336.4b1d34b3@JRWUBU2>
References: <20200505142346.2c414ceb@JRWUBU2> <20200506104629.06c8ee3d@JRWUBU2> , <20200508104336.4b1d34b3@JRWUBU2>
Message-ID:

Thank you Richard,

I'll update our Indic engine documentation using the issue you created.

Andrew

Sent from Outlook
________________________________
From: Richard Wordingham
Sent: Friday, May 8, 2020 2:43 AM
To: Andrew Glass
Cc: unicode at unicode.org
Subject: [EXTERNAL] Re: Is Devanagari ल्लाँ ambiguous?

On Thu, 7 May 2020 22:38:22 +0000
Andrew Glass wrote:

> Good question.
We support Devanagari with our Indic engine, a
> peculiarity of this engine is that it doesn't have a per-run feature
> application stage and all features are applied at the cluster stage.
> Addressing this is an outstanding issue. Therefore, the example
> cluster is a permitted single cluster in our Indic engine.

Thanks for the information. I have now raised the HarfBuzz bug as
Issue 2392 (https://github.com/harfbuzz/harfbuzz/issues/2392) and have
made comment number 416 for the Microsoft typography Devanagari
specification, currently at
https://docs.microsoft.com/en-us/typography/script-development/devanagari .

Richard.

From wjgo_10009 at btinternet.com Mon May 11 10:44:05 2020
From: wjgo_10009 at btinternet.com (wjgo_10009 at btinternet.com)
Date: Mon, 11 May 2020 16:44:05 +0100 (BST)
Subject: Symbols for Disasters
Message-ID: <329d62ec.9e5.1720468535a.Webtop.49@btinternet.com>

Symbols for Disasters

Hi

I saw some time ago the following.

https://www.unicode.org/L2/L2020/20078-n4710-liaison-stmt.pdf

More recently I saw the following.

https://www.unicode.org/L2/L2020/20136-sc2-response.pdf

I have been trying to design some symbols and I have today produced and
published an experimental font as a suggestion.

https://forum.high-logic.com/viewtopic.php?f=10&t=8406

William Overington

Monday 11 May 2020

From wjgo_10009 at btinternet.com Sat May 16 08:06:53 2020
From: wjgo_10009 at btinternet.com (wjgo_10009 at btinternet.com)
Date: Sat, 16 May 2020 14:06:53 +0100 (BST)
Subject: Abstract emoji
Message-ID: <86907e.f2e.1721d9833df.Webtop.216@btinternet.com>

Abstract emoji

I notice that Public Review 408 QID Emoji has been reopened with a new
closing date of 9 July 2020.

I wonder if a good mailing list discussion of whether abstract emoji
should become implemented, either as part of QID emoji or however the
QID emoji idea becomes adapted, or direct into regular Unicode, can
take place.

http://www.users.globalnet.co.uk/~ngo/abstract_emoji.htm

If abstract emoji are allowed then it would greatly increase the
expressive capability of emoji, including for communication through the
language barrier; however, the meaning of each of at least some
abstract emoji would need to be learned by end users. So some quantity
of abstract emoji might be good, too many could be confusing.

So what would be best please?

William Overington

Saturday 16 May 2020

From marius.spix at web.de Sat May 16 18:43:17 2020
From: marius.spix at web.de (Marius Spix)
Date: Sun, 17 May 2020 01:43:17 +0200
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
Message-ID: <20200517014230.329b11b5@spixxi>

Today I received an interesting phishing mail which had a URL
containing mathematical bold numbers. Interestingly the address
𝟎𝟓𝟔𝟕𝟏𝟑𝟔𝟎𝟑𝟎𝟐 was interpreted as an octal number 05671360302, which is
another spelling for 46.229.224.194. This worked for both Firefox and
Chrome. I don't know why such an address is accepted in the authority
part of a HTTPS URI of current browsers. Section 7.4 in RFC 3986 states
that additional IP address formats can become a security concern, but
it also says that literals should be converted to numeric form.

I wonder if this case should be added to UTR #36.

Regards

Marius

From bortzmeyer at nic.fr Sun May 17 01:24:09 2020
From: bortzmeyer at nic.fr (Stephane Bortzmeyer)
Date: Sun, 17 May 2020 08:24:09 +0200
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: <20200517014230.329b11b5@spixxi>
References: <20200517014230.329b11b5@spixxi>
Message-ID: <20200517062409.GA10656@nic.fr>

On Sun, May 17, 2020 at 01:43:17AM +0200,
Marius Spix via Unicode wrote
a message of 15 lines which said:

> This worked for both Firefox and Chrome.

Also in a terminal with ping, so I suspect this is handled by lower
name resolution libraries.

From ratmice at gmail.com Sun May 17 03:04:52 2020
From: ratmice at gmail.com (Matt Rice)
Date: Sun, 17 May 2020 08:04:52 +0000
Subject: characters for edge crossing/edge casing
Message-ID:

I had looked, but couldn't find any characters suitable for edge
crossing/casing such as tunnels, bridges in the following paper,
suitable for orthogonal graph layouts.
e.g. vertical/horizontal crossing, somewhat similar to the characters
⤫, ⤬ U+292B-U+292C, but at 90 degrees.

https://arxiv.org/pdf/0705.0413.pdf

Or another style of crossing which I forget the name of,
https://www.yworks.com/assets/images/features/bridges.ff977b3c.svg

Have characters for this purpose been proposed before?

From doug at ewellic.org Sun May 17 14:23:06 2020
From: doug at ewellic.org (Doug Ewell)
Date: Sun, 17 May 2020 13:23:06 -0600
Subject: characters for edge crossing/edge casing
Message-ID: <001901d62c80$97d60530$c7820f90$@ewellic.org>

Matt Rice wrote:

> I had looked, but couldn't find any characters suitable for edge
> crossing/casing such as tunnels, bridges in the following paper,
> suitable for orthogonal graph layouts.
> e.g. vertical/horizontal crossing, somewhat similar to the characters
> ⤫, ⤬ U+292B-U+292C, but at 90 degrees.
>
> https://arxiv.org/pdf/0705.0413.pdf
>
> Or another style of crossing which I forget the name of,
> https://www.yworks.com/assets/images/features/bridges.ff977b3c.svg
>
> Have characters for this purpose been proposed before?

Shapecatcher couldn't find them either, so I suppose they don't exist
and could be reasonably proposed.

Keep in mind that even at 90 degrees, it should be possible to show
examples of them in plain text, not just in diagrams, and that
arbitrary angles such as those shown in the paper should be
inadmissible.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From duerst at it.aoyama.ac.jp Sun May 17 18:42:58 2020
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Mon, 18 May 2020 08:42:58 +0900
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: <20200517014230.329b11b5@spixxi>
References: <20200517014230.329b11b5@spixxi>
Message-ID:

Hello Marius, others,

On 17/05/2020 08:43, Marius Spix via Unicode wrote:
> Today I received an interesting phishing mail which had an URL
> containing mathematical bold numbers. Interestingly the address
> 𝟎𝟓𝟔𝟕𝟏𝟑𝟔𝟎𝟑𝟎𝟐 was interpreted as an octal number 05671360302, which is
> another spelling for 46.229.224.194. This worked for both Firefox and
> Chrome. I don't know why such an address is accepted in the authority
> part of a HTTPS URI of current browsers. Section 7.4 in RFC 3986 states
> that additional IP address formats can become a security concern, but
> it also says that literals should be converted to numeric form.

I'm somehow wondering what the *Unicode* phishing story is here. The
user saw 𝟎𝟓𝟔𝟕𝟏𝟑𝟔𝟎𝟑𝟎𝟐, which was interpreted as 05671360302, which
shouldn't be too surprising unless somebody is familiar with
mathematical bold numbers. The average user wouldn't know what
05671360302 is (unless it's e.g. a familiar telephone number). That
should lead the user to reject this URL, and the phishing to fail.
A similar reaction might be expected for 46.229.224.194. Of course, the
URL could be designed so as to make these numbers appear natural. And
the user may click anyway.

There's a Unicode issue if we assume that
a) phishing checkers check for cases such as 05671360302, or
b) browsers, etc. don't resolve 05671360302 if it's in ASCII,
but 𝟎𝟓𝟔𝟕𝟏𝟑𝟔𝟎𝟑𝟎𝟐 gets through. Otherwise, there may be a security
issue, but it's not a Unicode one.

> I wonder if this case should be added to UTR #36.

Security considerations are always additive, so I'd guess yes.

Regards, Martin.

> Regards
>
> Marius
>

From magnus at bodin.org Sun May 17 23:39:37 2020
From: magnus at bodin.org (Magnus Bodin ☀)
Date: Mon, 18 May 2020 06:39:37 +0200
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: <20200517062409.GA10656@nic.fr>
References: <20200517014230.329b11b5@spixxi> <20200517062409.GA10656@nic.fr>
Message-ID:

On Sun, May 17, 2020 at 12:06 PM Stephane Bortzmeyer via Unicode wrote:
>
> On Sun, May 17, 2020 at 01:43:17AM +0200,
> Marius Spix via Unicode wrote
> a message of 15 lines which said:
>
> > This worked for both Firefox and Chrome.
>
> Also in a terminal with ping, so I suspect this is handled by lower
> name resolution libraries.

Yes, it is an old legacy from BSD libraries that has been inherited.

Actually, it accepts various formats (previously even more than 32
bits). I illustrated this with a website here in 1998:

https://x42.com/active/ip32.mpl?host=archive.org

Nowadays the 40, 48 and 56-bit ones are blocked at least in Chrome.
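
The arithmetic behind the address in this thread can be checked with a
few lines of Python. This is a minimal sketch, not the code of any
browser or resolver library: the helper name legacy_number_to_ipv4 is
made up, it handles only the single-number form (hexadecimal with a
0x prefix, octal with a leading zero, otherwise decimal), and it uses
NFKC normalization as a stand-in for whatever mapping turns the styled
digits into ASCII before the host is parsed.

```python
import ipaddress
import unicodedata


def legacy_number_to_ipv4(text: str) -> ipaddress.IPv4Address:
    """Read a single-number host roughly the way inet_aton-style parsers do.

    Hypothetical helper for illustration only.
    """
    # Compatibility normalization maps styled digits to ASCII digits.
    digits = unicodedata.normalize("NFKC", text)
    if digits.lower().startswith("0x"):
        value = int(digits, 16)   # hexadecimal form
    elif digits.startswith("0") and len(digits) > 1:
        value = int(digits, 8)    # a leading zero means octal
    else:
        value = int(digits, 10)   # plain decimal
    return ipaddress.IPv4Address(value)


# Build the mathematical bold digits (U+1D7CE is MATHEMATICAL BOLD DIGIT ZERO).
bold = "".join(chr(0x1D7CE + int(d)) for d in "05671360302")
address = legacy_number_to_ipv4(bold)
print(address)  # 46.229.224.194
```

The octal value 05671360302 is 786817218 in decimal, which is
46*2^24 + 229*2^16 + 224*2^8 + 194, so the dotted form matches the one
Marius reported.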

-- magnus

From c933103 at gmail.com Mon May 18 09:04:14 2020
From: c933103 at gmail.com (Phake Nick)
Date: Mon, 18 May 2020 22:04:14 +0800
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: <20200517014230.329b11b5@spixxi>
References: <20200517014230.329b11b5@spixxi>
Message-ID:

Somewhat relevant, I have previously observed that, if you type/produce
a link of http://www.abc.def/ghi?jk=lm , and then replace symbol
characters in the link with some other confusable symbols, like full
width punctuation and such, that link will still take you to the
intended address. Different browsers accept different characters.
Sometimes when such a link format is being posted onto internet
communities that restrict link sharing, such alternative unicode
characters formed links can bypass link restrictions in those
communities and potentially take unsuspecting netizens to harmful
websites.
I don't understand why browsers would normalize links being
clicked/typed in such way which would expose users to such risk.

On Sun, 17 May 2020 at 13:56, Marius Spix via Unicode wrote:

> Today I received an interesting phishing mail which had an URL
> containing mathematical bold numbers. Interestingly the address
> 𝟎𝟓𝟔𝟕𝟏𝟑𝟔𝟎𝟑𝟎𝟐 was interpreted as an octal number 05671360302,
> which is
> another spelling for 46.229.224.194. This worked for both Firefox and
> Chrome. I don't know why such an address is accepted in the authority
> part of a HTTPS URI of current browsers. Section 7.4 in RFC 3986 states
> that additional IP address formats can become a security concern, but
> it also says that literals should be converted to numeric form.
>
> I wonder if this case should be added to UTR #36.
>
> Regards
>
> Marius
>
>
>

From cloos at jhcloos.com Tue May 19 01:29:41 2020
From: cloos at jhcloos.com (James Cloos)
Date: Tue, 19 May 2020 02:29:41 -0400
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: ("Martin J. Dürst via Unicode"'s message of "Mon, 18 May 2020 08:42:58 +0900")
References: <20200517014230.329b11b5@spixxi>
Message-ID:

> I don't know why such an address is accepted in the authority
> part of a HTTPS URI of current browsers.

simple.

it isn't.

at least not here.

for me, pasting 𝟎𝟓𝟔𝟕𝟏𝟑𝟔𝟎𝟑𝟎𝟐 into a browser's address bar leads to an
http GET, not to an https one.

(curiosity won.)

(which comic was that where the kids were intentionally using integers
instead of dns to access web sites to confuse anyone watching them?)

-JimC
--
James Cloos OpenPGP: 0x997A9F17ED7DAEA6

From marius.spix at web.de Tue May 19 13:56:06 2020
From: marius.spix at web.de (Marius Spix)
Date: Tue, 19 May 2020 20:56:06 +0200
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To:
References: <20200517014230.329b11b5@spixxi>
Message-ID: <20200519205602.224b225d@spixxi>

> simple.
>
> it isn't.
>
> at least not here.
>
> for me, pasting 𝟎𝟓𝟔𝟕𝟏𝟑𝟔𝟎𝟑𝟎𝟐 into a browser's address bar leads to an
> http GET, not to an https one.
>
> (curiosity won.)
>
> (which comic was that where the kids were intentionally using integers
> instead of dns to access web sites to confuse anyone watching them?)
>
> -JimC

I deliberately did not send the complete URI. But I see a problem with
spam filters, because they have to recognize a lot more variants of IP
addresses.

Marius

From markus.icu at gmail.com Tue May 19 14:33:58 2020
From: markus.icu at gmail.com (Markus Scherer)
Date: Tue, 19 May 2020 12:33:58 -0700
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To:
References: <20200517014230.329b11b5@spixxi>
Message-ID:

On Tue, May 19, 2020 at 12:24 PM Phake Nick via Unicode wrote:

> Somewhat relevant, I have previously observed that, if you type/produce a
> link of http://www.abc.def/ghi?jk=lm , and then replace symbol characters
> in the link with some other confusable symbols, like full width punctuation
> and such, that link will still take you to the intended address. Different
> browsers accept different characters. Sometimes when such a link format is
> being posted onto internet communities that restrict link sharing, such
> alternative unicode characters formed links can bypass link restrictions in
> those communities and potentially take unsuspecting netizens to harmful
> websites.
> I don't understand why browsers would normalize links being clicked/typed
> in such way which would expose users to such risk.
>

IDNA implementations process domain names using a "mapping" step which
is like a variant of NFKC_Casefold.
That's why you can use uppercase as well as other canonical and
compatibility equivalents, and out-of-order combining marks.

markus

From lokedhs at gmail.com Tue May 19 20:13:42 2020
From: lokedhs at gmail.com (Elias Mårtenson)
Date: Wed, 20 May 2020 09:13:42 +0800
Subject: characters for edge crossing/edge casing
In-Reply-To: <001901d62c80$97d60530$c7820f90$@ewellic.org>
References: <001901d62c80$97d60530$c7820f90$@ewellic.org>
Message-ID:

On Mon, 18 May 2020, 07:12 Doug Ewell via Unicode wrote:

> Matt Rice wrote:
>
> > I had looked, but couldn't find any characters suitable for edge
> > crossing/casing such as tunnels, bridges in the following paper,
> > suitable for orthogonal graph layouts.
> > e.g. vertical/horizontal crossing, somewhat similar to the characters
> > ⤫, ⤬ U+292B-U+292C, but at 90 degrees.
> >
> > https://arxiv.org/pdf/0705.0413.pdf
> >
> > Or another style of crossing which I forget the name of,
> > https://www.yworks.com/assets/images/features/bridges.ff977b3c.svg
> >
> > Have characters for this purpose been proposed before?
>
> Shapecatcher couldn't find them either, so I suppose they don't exist and
> could be reasonably proposed.
>
> Keep in mind that even at 90 degrees, it should be possible to show
> examples of them in plain text, not just in diagrams, and that arbitrary
> angles such as those shown in the paper should be inadmissible.
>

What about U+2573 BOX DRAWINGS LIGHT DIAGONAL CROSS?

From richard.wordingham at ntlworld.com Wed May 20 03:03:12 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 20 May 2020 09:03:12 +0100
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To:
References: <20200517014230.329b11b5@spixxi>
Message-ID: <20200520090312.47483edc@JRWUBU2>

On Mon, 18 May 2020 22:04:14 +0800
Phake Nick via Unicode wrote:

> Somewhat relevant, I have previously observed that, if you
> type/produce a link of http://www.abc.def/ghi?jk=lm , and then
> replace symbol characters in the link with some other confusable
> symbols, like full width punctuation and such, that link will still
> take you to the intended address. Different browsers accept different
> characters. Sometimes when such a link format is being posted onto
> internet communities that restrict link sharing, such alternative
> unicode characters formed links can bypass link restrictions in those
> communities and potentially take unsuspecting netizens to harmful
> websites. I don't understand why browsers would normalize links being
> clicked/typed in such way which would expose users to such risk.

Possibly because it hasn't occurred to them to ban users of CJK
scripts? Seriously, forcing users to explicitly type narrow
punctuation may be one hurdle too far for usability by some. Not all
user input of URLs is mere copy and paste. Sometimes one has to
manually convert '%2F' to '/'.

Calling these characters confusables misses the point that they are
variants of ASCII characters.

Richard.
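
The point that fullwidth punctuation marks are compatibility variants
of ASCII characters can be seen directly from their normalization. The
sketch below uses NFKC from Python's unicodedata module; treating NFKC
as the relevant mapping is an assumption made for illustration, since
what a given browser actually applies to typed URLs is not stated in
this thread.

```python
import unicodedata

# Fullwidth forms of ':', '/', '.', '?' and '=' (U+FF1A, U+FF0F, U+FF0E,
# U+FF1F, U+FF1D), written as escapes so they survive any mail gateway.
FULLWIDTH = "\uff1a\uff0f\uff0e\uff1f\uff1d"

# Each fullwidth form has a compatibility decomposition to one ASCII character.
folded = unicodedata.normalize("NFKC", FULLWIDTH)

for src, dst in zip(FULLWIDTH, folded):
    print(f"U+{ord(src):04X} {unicodedata.name(src)} -> {dst!r}")
```

Each of these characters maps to exactly one ASCII character, which is
a different situation from look-alike characters that stay distinct
after normalization.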

From cloos at jhcloos.com Wed May 20 05:53:54 2020
From: cloos at jhcloos.com (James Cloos)
Date: Wed, 20 May 2020 06:53:54 -0400
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: <20200519205602.224b225d@spixxi> (Marius Spix's message of "Tue, 19 May 2020 20:56:06 +0200")
References: <20200517014230.329b11b5@spixxi> <20200519205602.224b225d@spixxi>
Message-ID:

>>>>> "MS" == Marius Spix writes:

MS> I deliberately did not send the complete URI. But I see a problem with
MS> spam filters, because they have to recognize a lot more variants of IP
MS> addresses.

ah. of course.

looking at the https url with that string as the host and no local
part, my browsers do choose to block it.

seamonkey says the cert is self signed and also that it is not for
46.229.224.194.

so at least some platforms/browsers do the right thing.

-JimC
--
James Cloos OpenPGP: 0x997A9F17ED7DAEA6

From c933103 at gmail.com Wed May 20 10:14:21 2020
From: c933103 at gmail.com (Phake Nick)
Date: Wed, 20 May 2020 23:14:21 +0800
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: <20200520090312.47483edc@JRWUBU2>
References: <20200517014230.329b11b5@spixxi> <20200520090312.47483edc@JRWUBU2>
Message-ID:

On Wed, 20 May 2020 at 22:30, Richard Wordingham via Unicode wrote:

> On Mon, 18 May 2020 22:04:14 +0800
> Phake Nick via Unicode wrote:
>
> > Somewhat relevant, I have previously observed that, if you
> > type/produce a link of http://www.abc.def/ghi?jk=lm , and then
> > replace symbol characters in the link with some other confusable
> > symbols, like full width punctuation and such, that link will still
> > take you to the intended address. Different browsers accept different
> > characters.
Sometimes when such a link format is being posted onto
> > internet communities that restrict link sharing, such alternative
> > unicode characters formed links can bypass link restrictions in those
> > communities and potentially take unsuspecting netizens to harmful
> > websites. I don't understand why browsers would normalize links being
> > clicked/typed in such way which would expose users to such risk.
>
> Possibly because it hasn't occurred to them to ban users of CJK
> scripts? Seriously, forcing users to explicitly type narrow
> punctuation may be one hurdle too far for usability by some. Not all
> user input of URLs is mere copy and paste. Sometimes one has to
> manually convert '%2F' to '/'.
>
> Calling these characters confusables misses the point that they are
> variants of ASCII characters.
>
> Richard.

As a native Chinese speaker I have never seen anyone typing URL
punctuation in full width, other than a.) to confuse URL filtering
systems, or b.) on a few archaic printed documents that are not
intended to be circulated in digital format.
Also, sometimes browsers accept not just the exact fullwidth version of
the character but also other similar characters, for example a URL like
https://? ?????????/ would also work in Chrome and take you to imgur's
site.
UTR #36 describes these characters as "confusable", and I followed that
usage of the term in my email.

From Shawn.Steele at microsoft.com Wed May 20 11:27:32 2020
From: Shawn.Steele at microsoft.com (Shawn Steele)
Date: Wed, 20 May 2020 16:27:32 +0000
Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail
In-Reply-To: <20200520090312.47483edc@JRWUBU2>
References: <20200517014230.329b11b5@spixxi> <20200520090312.47483edc@JRWUBU2>
Message-ID:

Anyone validating links as supposed below should make sure that IDN
style normalization happens first...
It's kind of a "common" security problem that folks try to check for "security" of data prior to that data undergoing a transformation of some kind, at which point the previous security check may no longer be valid. Note that "Full width" isn't exactly "confusable" in the way IDN thinks of it, since they're mapped directly to their corresponding character. Normally "confusable" is used to refer to characters that may appear similar yet end up resolving to something different. -Shawn -----Original Message----- From: Unicode On Behalf Of Richard Wordingham via Unicode Sent: Wednesday, May 20, 2020 1:03 AM To: unicode at unicode.org Subject: Re: Security consideration: math symbols in an exotic IP address format in a phishing mail On Mon, 18 May 2020 22:04:14 +0800 Phake Nick via Unicode wrote: > Somewhat relevant, I have previously observed that, if you > type/produce a link of http://www.abc.def/ghi?jk=lm , and then replace > symbol characters in the link with some other confusable symbols, like > full width punctuation and such, that link will still take you to the > intended address. Different browsers accept different characters. > Sometimes when such a link format is being posted onto internet > communities that restrict link sharing, such alternative unicode > characters formed links can bypass link restrictions in those > communities and potentially take unsuspecting netizens to harmful > websites. I don't understand why browsers would normalize links being > clicked/typed in such way which would expose users to such risk. Possible because it hasn't occurred to them to ban users of CJK scripts? Seriously, forcing users to explicitly type narrow punctuation may be one hurdle too far for usability by some. Not all user input of URLs is mere copy and paste. Sometimes one has to manually convert '%2F' to '/'. Calling these characters confusables misses the point that they are variants of ASCII characters. Richard. 
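
The ordering problem Shawn describes (checking data before it is transformed) can be sketched in a few lines. This is only an illustration under stated assumptions: NFKC plus case folding is used as a rough stand-in for the IDNA mapping step, and the blocklist, the host name and the helper names are all made up for the example.

```python
import unicodedata

# Hypothetical blocklist; a real filter would use its own data.
BLOCKED_HOSTS = {"www.abc.def"}


def fold_host(host: str) -> str:
    """Approximate the mapping step: NFKC, then case folding.

    Assumption for illustration only; this is not the exact IDNA mapping.
    """
    return unicodedata.normalize("NFKC", host).casefold()


def is_blocked_naive(host: str) -> bool:
    # Checks the raw string, before any transformation has happened.
    return host in BLOCKED_HOSTS


def is_blocked_after_folding(host: str) -> bool:
    # Checks the string that would actually be acted on after mapping.
    return fold_host(host) in BLOCKED_HOSTS


# "www.abc.def" written with fullwidth letters and U+FF0E FULLWIDTH FULL STOP.
disguised = "\uff57\uff57\uff57\uff0e\uff41\uff42\uff43\uff0e\uff44\uff45\uff46"

print(is_blocked_naive(disguised))
print(is_blocked_after_folding(disguised))
```

The naive check compares a string that no resolver will ever see, which is why it misses the disguised host; the second check runs on the folded form.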
From doug at ewellic.org Wed May 20 16:19:33 2020
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 20 May 2020 15:19:33 -0600
Subject: characters for edge crossing/edge casing
In-Reply-To:
References: <001901d62c80$97d60530$c7820f90$@ewellic.org>
Message-ID: <001401d62eec$5bbde720$1339b560$@ewellic.org>

I'm not sure what this question is asking. U+2573 doesn't depict the bottom line being visually broken to show the top line crossing over it, as U+292B and U+292C do.

Matt appears to be looking for characters like U+292B and U+292C, but tilted 45 degrees so that the lines point north-south and east-west.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From: Elias Mårtenson
Sent: Tuesday, May 19, 2020 19:14
To: Doug Ewell
Cc: unicode
Subject: Re: characters for edge crossing/edge casing

On Mon, 18 May 2020, 07:12 Doug Ewell via Unicode wrote:

Matt Rice wrote:

> I had looked, but couldn't find any characters suitable for edge
> crossing/casing such as tunnels, bridges in the following paper,
> suitable for orthogonal graph layouts.
> e.g. vertical/horizontal crossing, somewhat similar to the characters
> ⤫, ⤬ U+292B-U+292C, but at 90 degrees.
>
> https://arxiv.org/pdf/0705.0413.pdf
>
> Or another style of crossing which I forget the name of,
> https://www.yworks.com/assets/images/features/bridges.ff977b3c.svg
>
> Have characters for this purpose been proposed before?

Shapecatcher couldn't find them either, so I suppose they don't exist and could be reasonably proposed.

Keep in mind that even at 90 degrees, it should be possible to show examples of them in plain text, not just in diagrams, and that arbitrary angles such as those shown in the paper should be inadmissible.

What about U+2573 "BOX DRAWINGS LIGHT DIAGONAL CROSS"?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From duerst at it.aoyama.ac.jp Wed May 20 19:59:13 2020 From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J=2e_D=c3=bcrst?=) Date: Thu, 21 May 2020 09:59:13 +0900 Subject: Security consideration: math symbols in an exotic IP address format in a phishing mail In-Reply-To: References: <20200517014230.329b11b5@spixxi> Message-ID: Hello Markus, others, On 20/05/2020 04:33, Markus Scherer via Unicode wrote: > On Tue, May 19, 2020 at 12:24 PM Phake Nick via Unicode > wrote: > >> Somewhat relevant, I have previously observed that, if you type/produce a >> link of http://www.abc.def/ghi?jk=lm , and then replace symbol characters >> in the link with some other confusable symbols, like full width punctuation >> and such, that link will still take you to the intended address. Different >> browsers accept different characters. Sometimes when such a link format is >> being posted onto internet communities that restrict link sharing, such >> alternative unicode characters formed links can bypass link restrictions in >> those communities and potentially take unsuspecting netizens to harmful >> websites. >> I don't understand why browsers would normalize links being clicked/typed >> in such way which would expose users to such risk. >> > > IDNA implementations process domain names using a "mapping" step which is > like a variant of NFKC_Casefold. That in itself isn't a problem, but it depends on the details. > That's why you can use uppercase Good. > as well > as other canonical Good. > and compatibility equivalents, Good up to a point. As discussed already, mapping full-width characters to their half-width equivalents can make a lot of sense for users in China, Japan,... But mapping other compatibility equivalents doesn't make sense at all. Definitely not for Math Bold like in the example at hand, and definitely not for circled characters and the like. > and out-of-order > combining marks. That's just part of canonical equivalence, isn't it? 
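
To make the distinction drawn here concrete: NFKC folds fullwidth forms, Math Bold digits and circled digits alike, whereas a narrower mapping could touch only the fullwidth block. The following is a rough sketch; `fold_fullwidth_only` is a hypothetical helper written for this illustration and is not part of IDNA or of any library.

```python
import unicodedata


def fold_fullwidth_only(text: str) -> str:
    """Map U+FF01..U+FF5E to ASCII U+0021..U+007E; leave everything else alone."""
    return "".join(
        chr(ord(ch) - 0xFEE0) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )


samples = {
    "fullwidth digit one": "\uff11",
    "mathematical bold digit one": "\U0001d7cf",
    "circled digit one": "\u2460",
}

for name, ch in samples.items():
    nfkc = unicodedata.normalize("NFKC", ch)
    narrow = fold_fullwidth_only(ch)
    # NFKC turns all three into an ASCII "1"; the narrow fold only changes the first.
    print(f"{name}: NFKC -> {nfkc!r}, fullwidth-only -> {narrow!r}")
```

With a mapping like the narrow one, Math Bold and circled digits survive unchanged and can then be rejected as invalid, instead of silently becoming ASCII digits.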
The other very important point is of course that IP addresses are not domain names, and therefore are not covered by IDNA, and shouldn't be mapped in any way. But what happens inside browsers is probably the following: (1) Check if the authority part is an IP address or a domain name. (2) It doesn't look like an (ASCII) IP address, so it's handled as a domain name. (3) Apply IDNA mapping (see above). Produces ASCII numbers. (4) Apply IDNA toASCII conversion (no-op in the case at hand) (5) Feed this to a generic resolver, which includes octal->regular IP address conversion. Narrowing the IDNA mapping as discussed above would fix this case, because the toASCII operation would reject Math Bold as invalid characters. For security checks, rejecting Math Bold (and the like) would also work. But that would have to be restricted to the authority part of a Web address, because such numbers can of course occur in other parts. Regards, Martin. > markus > From markus at gyger.org Thu May 21 05:21:51 2020 From: markus at gyger.org (Markus Gyger) Date: Thu, 21 May 2020 12:21:51 +0200 Subject: Wireless Connection Symbol In-Reply-To: References: Message-ID: Is there a recommended code point (sequence) for the *S01863 Wireless Connection* symbol of IEC 60617 ? The suggested character *U+1F4F6 ? ANTENNA WITH BARS (cellular reception)* seems to have a less general meaning and looks quite different. Some older IEC mappings are in N2032 . Markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From rick at unicode.org Fri May 22 15:49:11 2020 From: rick at unicode.org (Rick McGowan) Date: Fri, 22 May 2020 13:49:11 -0700 Subject: Unicode server, planned maintenance Message-ID: <5EC83AC7.9020507@unicode.org> Hello. This is to let everyone know there is an upcoming maintenance downtime for the Unicode.org servers scheduled in a window from 12am - 6am Pacific time, on May 23, 2020. 
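
Returning to the "Security consideration" thread: the five browser steps Martin Dürst lists above can be walked through in code. This is a sketch under loud assumptions. The host string is a hypothetical reconstruction (the real URI was deliberately not posted), NFKC plus case folding stands in for the IDNA mapping, and `parse_legacy_ipv4` and `resolve_like_a_browser` are illustrative helpers, not any browser's or resolver's actual code.

```python
import re
import unicodedata

# Four dot-separated runs of ASCII digits.
ASCII_IPV4 = re.compile(r"[0-9]+(\.[0-9]+){3}")


def parse_legacy_ipv4(host: str) -> str:
    """Parse dotted IPv4 where a leading 0 marks an octal field."""
    fields = []
    for field in host.split("."):
        base = 8 if len(field) > 1 and field.startswith("0") else 10
        value = int(field, base)
        if not 0 <= value <= 255:
            raise ValueError(f"field out of range: {field}")
        fields.append(str(value))
    return ".".join(fields)


def resolve_like_a_browser(host: str) -> str:
    # (1) Is the authority an ASCII IP address?
    if ASCII_IPV4.fullmatch(host):
        return parse_legacy_ipv4(host)
    # (2) It is not, so it is handled as a domain name.
    # (3) Mapping step (assumed here to behave like NFKC + case folding).
    mapped = unicodedata.normalize("NFKC", host).casefold()
    # (4) toASCII would be a no-op for an all-ASCII result; omitted.
    # (5) A generic resolver then accepts the octal dotted form.
    if ASCII_IPV4.fullmatch(mapped):
        return parse_legacy_ipv4(mapped)
    return mapped


# Hypothetical disguised host: octal fields written in Math Bold digits.
to_math_bold = {ord(str(d)): 0x1D7CE + d for d in range(10)}
disguised = "056.0345.0340.0302".translate(to_math_bold)

print(resolve_like_a_browser(disguised))
```

A filter that only runs the equivalent of step (1) on the raw string never sees an IP address at all, which is the gap described in the message.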
From doug at ewellic.org Mon May 25 13:34:36 2020
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 25 May 2020 12:34:36 -0600
Subject: Wireless Connection Symbol
Message-ID: <000001d632c3$249f75d0$6dde6170$@ewellic.org>

Markus Gyger wrote:

> Is there a recommended code point (sequence) for the *S01863 Wireless
> Connection* symbol of IEC 60617?

U+1F50A SPEAKER WITH THREE SOUND WAVES 🔊 might be as close as you'll get.

Not all symbols for use in diagrams are necessarily candidates for plain-text encoding, although some certainly are and the bar is moving.

--
Doug Ewell | Thornton, CO, US | ewellic.org

From everson at evertype.com Tue May 26 21:10:29 2020
From: everson at evertype.com (Michael Everson)
Date: Wed, 27 May 2020 03:10:29 +0100
Subject: Wireless Connection Symbol
In-Reply-To: <000001d632c3$249f75d0$6dde6170$@ewellic.org>
References: <000001d632c3$249f75d0$6dde6170$@ewellic.org>
Message-ID: <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com>

On 25 May 2020, at 19:34, Doug Ewell via Unicode wrote:

> Not all symbols for use in diagrams are necessarily candidates for plain-text encoding, although some certainly are and the bar is moving.

No, and despite the utility of some symbols for scholarship or tech use, we get to have chipmunk-squirrel hybrids and an incomplete set of dinosaurs.
Michael Everson From jameskasskrv at gmail.com Tue May 26 22:57:11 2020 From: jameskasskrv at gmail.com (James Kass) Date: Wed, 27 May 2020 03:57:11 +0000 Subject: Wireless Connection Symbol In-Reply-To: <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> Message-ID: <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> On 2020-05-27 2:10 AM, Michael Everson via Unicode wrote: > On 25 May 2020, at 19:34, Doug Ewell via Unicode wrote: > >> Not all symbols for use in diagrams are necessarily candidates for plain-text encoding, although some certainly are and the bar is moving. > No, and despite the utility of some symbols for scholarship or tech use, we get to have chipmunk-squirrel hybrids and an incomplete set of dinosaurs. > > Michael Everson A dilemma which QID Emoji Tag Sequences would resolve. https://www.unicode.org/review/pri408/ From asmusf at ix.netcom.com Wed May 27 00:25:47 2020 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Tue, 26 May 2020 22:25:47 -0700 Subject: Wireless Connection Symbol In-Reply-To: <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> Message-ID: <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> An HTML attachment was scrubbed... URL: From markus at gyger.org Wed May 27 02:09:21 2020 From: markus at gyger.org (Markus Gyger) Date: Wed, 27 May 2020 09:09:21 +0200 Subject: Wireless Connection Symbol In-Reply-To: <000001d632c3$249f75d0$6dde6170$@ewellic.org> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> Message-ID: On Wed, May 27, 2020 at 1:06 AM Doug Ewell via Unicode wrote: > U+1F50A SPEAKER WITH THREE SOUND WAVES ? might be as close as you'll get. > Thanks, looks visually close. U+1F4F6 ? is probably still closer to some WLAN (or Wi-Fi) emoji though... 
Markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From prosfilaes at gmail.com Wed May 27 02:32:08 2020 From: prosfilaes at gmail.com (David Starner) Date: Wed, 27 May 2020 00:32:08 -0700 Subject: Wireless Connection Symbol In-Reply-To: <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> Message-ID: On Tue, May 26, 2020 at 11:20 PM James Kass via Unicode wrote: > > On 2020-05-27 2:10 AM, Michael Everson via Unicode wrote: > > On 25 May 2020, at 19:34, Doug Ewell via Unicode wrote: > > > >> Not all symbols for use in diagrams are necessarily candidates for plain-text encoding, although some certainly are and the bar is moving. > > No, and despite the utility of some symbols for scholarship or tech use, we get to have chipmunk-squirrel hybrids and an incomplete set of dinosaurs. > > > > Michael Everson > A dilemma which QID Emoji Tag Sequences would resolve. > https://www.unicode.org/review/pri408/ In theory, but in practice, there are 700-some dinosaur species, and encoding them alongside a hundred thousand other emoji is just going to mean that nobody supports any of them. -- The standard is written in English . If you have trouble understanding a particular section, read it again and again and again . . . Sit up straight. Eat your vegetables. Do not mumble. 
-- _Pascal_, ISO 7185 (1991)

From jameskasskrv at gmail.com Wed May 27 04:21:39 2020
From: jameskasskrv at gmail.com (James Kass)
Date: Wed, 27 May 2020 09:21:39 +0000
Subject: Wireless Connection Symbol
In-Reply-To: <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com>
References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com>
Message-ID: <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com>

On 2020-05-27 5:25 AM, Asmus Freytag via Unicode wrote:

> I've already said this on the previous PRI, but it bears repeating: QID
> sequences are fundamentally unworkable because they destroy the concept of
> character identity. I firmly believe the UTC is considerably underestimating
> the implications of providing a mechanism that can encode exactly the same
> information in several, mutually incompatible ways. ...

As opposed to the current mechanism in which users cannot encode their desired information at all?

Unicode already provides a method for encoding the same information incompatibly, the PUA. The QID emoji proposal seeks to standardize the plain-text interchange of any desired unencoded image, which would avoid the PUA issues.

There's more than one way to encode LATIN CAPITAL LETTER E WITH ACUTE compatibly. If there were more than one way to encode an image of an eohippus in Unicode, they could be considered compatible.

If the concern wrt compatibility is that "image of an eohippus" might some day become an atomic Unicode character, somehow conflicting with the QID Emoji encoding, then it's suggested that with an existing plain-text interchange mechanism nobody would need to propose "image of an eohippus" as an atomic character.

It's also suggested that concerns about character identity needn't apply to "image of an eohippus" and the like because they haven't any.
"Image of an eohippus" is exactly that, nothing more and nothing less.? Interpreting the image as a meaningful symbol is up to the organic intelligence reading the text.? Meanwhile, the intention of the author is discoverable by any artificial process, namely that the author intended to send an image of an eohippus. From jameskasskrv at gmail.com Wed May 27 05:30:38 2020 From: jameskasskrv at gmail.com (James Kass) Date: Wed, 27 May 2020 10:30:38 +0000 Subject: QID Emoji (was Re: Wireless Connection Symbol) In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> Message-ID: <0f60b7aa-7818-2599-55dc-48541c65c89c@gmail.com> On 2020-05-27 7:32 AM, David Starner wrote: > On Tue, May 26, 2020 at 11:20 PM James Kass via Unicode > wrote: >> On 2020-05-27 2:10 AM, Michael Everson via Unicode wrote: >>> On 25 May 2020, at 19:34, Doug Ewell via Unicode wrote: >>> >>>> Not all symbols for use in diagrams are necessarily candidates for plain-text encoding, although some certainly are and the bar is moving. >>> No, and despite the utility of some symbols for scholarship or tech use, we get to have chipmunk-squirrel hybrids and an incomplete set of dinosaurs. >>> >>> Michael Everson >> A dilemma which QID Emoji Tag Sequences would resolve. >> https://www.unicode.org/review/pri408/ > In theory, but in practice, there are 700-some dinosaur species, and > encoding them alongside a hundred thousand other emoji is just going > to mean that nobody supports any of them. > In the short term, yes.? In the long term support will be driven by demand. If enough users demand the ability to exchange an image of a basenji-doberman hybrid in plain-text, display support will be forthcoming.? If the demand isn't enough to stimulate the large corporate players, then third-party support will step in.? If there's no demand, it's moot.? 
But it's discoverable under the QID Emoji proposal regardless of demand level. If approved, it's already supported as far as Unicode is concerned. Because it uses strings of already encoded characters which are already interchangeable.? Display issues and input methods have traditionally been considered outside the scope of The Standard. From abrahamgross at disroot.org Wed May 27 03:13:59 2020 From: abrahamgross at disroot.org (abrahamgross at disroot.org) Date: Wed, 27 May 2020 08:13:59 +0000 (UTC) Subject: Wireless Connection Symbol In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> Message-ID: <9036c9f9-0fa9-4895-a8c3-ccc86c174e44@disroot.org> Maybe try a writing a combination of characters like ??) So that it looks like the symbol (but make it look good though) 2020/05/27 ??3:10:21 Markus Gyger via Unicode : > On Wed, May 27, 2020 at 1:06 AM Doug Ewell via Unicode wrote: >> U+1F50A SPEAKER WITH THREE SOUND WAVES ? might be as close as you'll get. >> Thanks, looks visually close. U+1F4F6 ? is probably still closer to some WLAN (or Wi-Fi) emoji though... > > Markus > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wjgo_10009 at btinternet.com Wed May 27 08:40:22 2020 From: wjgo_10009 at btinternet.com (wjgo_10009 at btinternet.com) Date: Wed, 27 May 2020 14:40:22 +0100 (BST) Subject: QID Emoji Message-ID: <410f7ff6.4c3.172565cd027.Webtop.53@btinternet.com> QID Emoji can be discussed in this mailing list. One of my ideas is banned from even being discussed in this mailing list and deemed to be out of scope! Yet the research continues. But I am not going to use QID Emoji to encode the items. Either encoding of the concept gets done excellently with its own structured tagspace or it can go round again and again until it is encoded excellently in Unicode. The documents are all deposited for conservation with the British Library. 
Yet if QID Emoji are encoded, then maybe my idea will become encoded too on a "sauce for pasta is sauce for rice" basis as my idea is at least as rigorous as the QID Emoji proposal. William Overington Wednesday 27 May 2020 From sosipiuk at gmail.com Wed May 27 11:18:01 2020 From: sosipiuk at gmail.com (=?UTF-8?Q?S=C5=82awomir_Osipiuk?=) Date: Wed, 27 May 2020 12:18:01 -0400 Subject: QID Emoij (was: Re: Wireless Connection Symbol) Message-ID: <001e01d63442$659619b0$30c24d10$@gmail.com> Users can encode their desired information using the PUA. That is precisely what it is for. What QID seems to be proposing is a way to assign an interchange-suitable, (semi-)stable ID to a meaningful symbol. I don't see how that is anything other than what Unicode itself is meant to provide: A standardized ID number for a character. The QID process seems to be merely a way to "assign emojis faster" because the current process isn't responsive enough for vendors' liking. It's another layer of Unicode sitting atop Unicode, motivated by a desire for less oversight from the Unicode Consortium. I agree with the "nay" comments. Besides all the practical downsides, I think QID emojis invite a free-for-all that goes against the very spirit of Unicode as being a standardized database of character/emoji IDs. The issue to be resolved here lies in the process for adding emojis. The current process is too onerous and slow. I can imagine a new process, that isn't bound to a regular schedule, and that allows eminently useful and needed emojis to be fast-tracked to approval in days, not months. Perhaps an entire plane could be reserved for such emojis - 65K should be enough for anyone, right? ;) Perhaps there could be a provisional or probationary approval granted to certain emojis, or at least a "reservation" system for code points. A vendor could reserve spaces with emojis they plan to add (with reasonable limits, of course). 
There could be a public voting system to add or approve emojis in near-real-time based on thresholds for approval. It's 2020; we have the technology. Provisional emojis or code points reservations that don't see use/support after some amount of time are rejected and code points are allowed to be reused. Those that see use or public support are given final approval and become bound by stability requirements. The Unicode Consortium is still involved, but less so, relying more on automated metrics than meetings, though they would still have veto power if there is some valid subjective factor to consider. The details are something to be worked out. The main point is that there is a desire for a quicker, more responsive way to add emojis. That can be done without essentially reconstructing Unicode on top of itself. S?awomir Osipiuk From markus at gyger.org Wed May 27 14:51:45 2020 From: markus at gyger.org (Markus Gyger) Date: Wed, 27 May 2020 21:51:45 +0200 Subject: Wireless Connection Symbol In-Reply-To: <9036c9f9-0fa9-4895-a8c3-ccc86c174e44@disroot.org> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <9036c9f9-0fa9-4895-a8c3-ccc86c174e44@disroot.org> Message-ID: On Wed, May 27, 2020 at 4:26 PM abrahamgross--- via Unicode < unicode at unicode.org> wrote: > Maybe try a writing a combination of characters like > ??) > So that it looks like the symbol (but make it look good though) > Great idea, thanks! ??) looks even good in e.g. Source Sans Pro . I'll probably just encode it as a "ligature" of two characters: ?) Markus -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kent.b.karlsson at bahnhof.se Wed May 27 17:50:10 2020 From: kent.b.karlsson at bahnhof.se (Kent Karlsson) Date: Thu, 28 May 2020 00:50:10 +0200 Subject: Wireless Connection Symbol In-Reply-To: <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> Message-ID: <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> Embedding images (whichever one you like and ?you' have access to?) in text can already be done. In HTML markup it is called ?img? (e.g. ??). And there is no real question about which images will will be ?supported?. Granted, it is not plain text. But emoji are already pushing ?out of? plain text as we knew it. And? I recall an argument (years ago) saying essentially ?these will be the only emoji encoded, the recommendation for expansion is to use images instead?. That seems to have been forgotten? /Kent Karlsson PS I agree with Asmus that the ?QID emoji? is a really bad idea. > 27 maj 2020 kl. 11:21 skrev James Kass via Unicode : > > > On 2020-05-27 5:25 AM, Asmus Freytag via Unicode wrote: >> I?ve already said this on the previous PRI, but it bears repeating: QID >> sequences are fundamentally unworkable because they destroy the concept of >> character identity. I firmly believe the UTC is considerably underestimating >> the implications of providing a mechanism that can encode exactly the same >> information in several, mutually incompatible ways. ... > > As opposed to the current mechanism in which users cannot encode their desired information at all? > > Unicode already provides a method for encoding the same information incompatibly, the PUA. The QID emoji proposal seeks to standardize the plain-text interchange of any desired unencoded image, which would avoid the PUA issues. 
> > There's more than one way to encode LATIN CAPITAL LETTER E WITH ACUTE compatibly. If there were more than one way to encode an image of an eohippus in Unicode, they could be considered compatible. > > If the concern wrt compatibility is that "image of an eohippus" might some day become an atomic Unicode character, somehow conflicting with the QID Emoji encoding, then it's suggested that with an existing plain-text interchange mechanism nobody would need to propose "image of an eohippus" as an atomic character. > > It's also suggested that concerns about character identity needn't apply to "image of an eohippus" and the like because they haven't any. "Image of an eohippus" is exactly that, nothing more and nothing less. Interpreting the image as a meaningful symbol is up to the organic intelligence reading the text. Meanwhile, the intention of the author is discoverable by any artificial process, namely that the author intended to send an image of an eohippus. > > From markus.icu at gmail.com Wed May 27 21:53:04 2020 From: markus.icu at gmail.com (Markus Scherer) Date: Wed, 27 May 2020 19:53:04 -0700 Subject: Wireless Connection Symbol In-Reply-To: <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> Message-ID: On Wed, May 27, 2020 at 6:20 PM Kent Karlsson via Unicode < unicode at unicode.org> wrote: > Granted, it is not plain text. But emoji are already pushing ?out of? > plain text as we knew it. And? I recall an argument (years ago) saying > essentially > ?these will be the only emoji encoded, the recommendation for expansion is > to use images instead?. That seems to have been forgotten? > Not entirely forgotten... 
http://www.unicode.org/reports/tr51/#Longer_Term

markus

From nospam-abuse at ilyaz.org Thu May 28 03:42:25 2020
From: nospam-abuse at ilyaz.org (Ilya Zakharevich)
Date: Thu, 28 May 2020 01:42:25 -0700
Subject: ISO 14651/14652 vs Unicode sorting
Message-ID: <20200528084225.mglngkxuhyc77vnf@math.berkeley.edu>

I have been informed that according to the tables distributed with ISO 14651/14652, the following strings should be sorted in this order:

> foobar
> foo baz

Moreover, this is how glibc (and, as a corollary, all utilities) do this in European locales on contemporary Linuxes.

I checked COBUILD, American Heritage, and Le Petit Robert II, and it seems that they do indeed use this (brain damaged?) order. (Although not, apparently, Le Petit Robert I, which SEEMS TO HAVE compound words tackled at the end of the main record.)

However, this definitely contradicts what https://icu4c-demos-7hxm2n5zgq-uc.a.run.app/icu-bin/collation.html does with the default locale, and with `en'.

So what is the intended behavior: of ICU, or of ISO?!
Thanks, Ilya From kenwhistler at sonic.net Thu May 28 09:52:03 2020 From: kenwhistler at sonic.net (Ken Whistler) Date: Thu, 28 May 2020 07:52:03 -0700 Subject: ISO 14651/14652 vs Unicode sorting In-Reply-To: <20200528084225.mglngkxuhyc77vnf@math.berkeley.edu> References: <20200528084225.mglngkxuhyc77vnf@math.berkeley.edu> Message-ID: <3c23b16c-27cd-3780-f63e-4dcfd81f33f6@sonic.net> Ilya, On this topic, see the extended discussion of variable weighting in UTS #10: https://www.unicode.org/reports/tr10/#Variable_Weighting_Examples On 5/28/2020 1:42 AM, Ilya Zakharevich via Unicode wrote: > I have been informed that according to the tables distributed with ISO > 14651/14652, the following strings should be sorted in this order: > >> foobar >> foo baz > Moreover, this is how glibc (and, as a corollary, all utilities) do > this in European locales on contemporary Linuxes. ISO 14651 recommends the "Shifted" handling of variables. In this particular case, your concern is with the handling of U+0020 SPACE, but that choice also affects all punctuation and symbols, unless otherwise tailored. > > I checked COBUILT, American Heritage, and Le Petit Robert II???and it > seems that they do indeed use this (brain damaged?) order. (Although > not, apparently, Le Petit Robert I???which SEEMS TO HAVE compound > words tackled at the end of the main record.) Precisely what happens in various dictionaries is a bit beside the point, because they often follow somewhat special rules that may not always directly match the results of just taking all the headwords and sorting the strings according to a particular collation setting. They may require special tailoring. > > However, this definitely contradicts what > https://icu4c-demos-7hxm2n5zgq-uc.a.run.app/icu-bin/collation.html > does with the default locale, and with `en?. In that demo, the collation *defaults* to "Non-ignorable". Again, see the discussion of variable weighting cited above. 
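
The two variable-weighting options named here, "Non-ignorable" and "Shifted", can be illustrated with a deliberately simplified toy model. It is not the Unicode Collation Algorithm: there are no real collation weights or levels, only the assumption that a space either takes part in the first comparison and sorts before letters, or is skipped at first and used only as a late tie-breaker. The key-function names are invented for this sketch.

```python
words = ["foobar", "foo baz"]


def non_ignorable_key(s: str) -> str:
    # The space takes part in the first comparison and sorts before any letter.
    return s


def shifted_key(s: str) -> tuple:
    # The space is ignored at first; the original string only breaks ties.
    return (s.replace(" ", ""), s)


print(sorted(words, key=non_ignorable_key))  # ['foo baz', 'foobar']
print(sorted(words, key=shifted_key))        # ['foobar', 'foo baz']
```

The first order corresponds to what the ICU demo shows by default, the second to the glibc result quoted in the question.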
In a "Non-ignorable" collation, the primary weights of the variables (space included) *are* used at the primary level of sortkey construction, instead of being shifted to only make a difference following any tertiary weight differences. So you get the results in the demo you see where the space character "makes a difference" -- namely, that it is weighted as significantly as other full letters.

However, if you switch options in that demo to "Shifted" -- see the seventh line of the radio buttons, labeled "alternate" -- then you get the Shifted weighting, which will then mirror the results you see for glibc.

>
> So what is the intended behavior: of ICU, or of ISO?!

There is no "right answer" here. The Unicode Collation Algorithm comes with built-in alternative parametric settings, and, of course, the option to tailor the collation rules indefinitely, to meet the requirements of particular languages and/or particular dictionary orderings or other special purposes. ISO 14651 also allows different settings (although not as completely spelled out as in UCA) and tailorings.

What glibc has done is pick the default, out-of-the-box shifted handling of variables implied by ISO 14651, but that is simply an implementation choice.

--Ken

>
> Thanks,
> Ilya
>

From wjgo_10009 at btinternet.com Thu May 28 13:10:57 2020
From: wjgo_10009 at btinternet.com (wjgo_10009 at btinternet.com)
Date: Thu, 28 May 2020 19:10:57 +0100 (BST)
Subject: QID Emoji (from Re: Wireless Connection Symbol)
Message-ID: <4bddc04c.14fe.1725c7ae4cd.Webtop.41@btinternet.com>

QID Emoji (from Re: Wireless Connection Symbol)

Kent Karlsson wrote as follows.

> I agree with Asmus that the "QID emoji" is a really bad idea.

I opine that when considering a new idea it is important to be prepared to suspend disbelief and consider if any parts of the idea are good, rather than just the total idea.

I find the QID Emoji proposal has some very good aspects but is somewhat unstable as a whole.
So, if those in favour of the proposal and those against are each willing to be like the strongest trees and sway in the breeze then the good parts of the proposal could become available in a stable manner. For example, maybe registration in a Unicode Inc. database, with the option of a cross-reference link to QID, would mean that only those QID where someone wants an emoji for that QID would be in the Unicode Inc. database, and a gentle moderation policy could be used to stop ambiguity and duplication. So maybe shorter codes. What if U+FFF0 is defined, mutatis mutandis, as effectively what would be a ligature of the ID emoji and tag Q in the original proposal, U+FFF8 is defined as the corresponding CANCEL and circled digits are used. All part of the basic plane, so fewer bytes for each such character and a graceful indicative fallback facility built in. I realize that the original proposal can be implemented with existing technology, and that the changes I suggest would require changes to The Unicode Standard and software packages, but that could be done in time if there is the will to do so, yet whatever solution is implemented is there for a very long time. Would those two changes both go a long way towards making a solution that is acceptable to everybody? I may not have solved every objection and what I suggest does change the original. Yet this is research for the future. So, if people agree, please say so, if not then please say what I have missed or got wrong and what needs fixing and then, as a group effort, maybe we can iterate in a constructive way and achieve a good solution acceptable to everybody. 
William Overington Thursday 28 May 2020 From richard.wordingham at ntlworld.com Thu May 28 16:34:06 2020 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 28 May 2020 22:34:06 +0100 Subject: ISO 14651/14652 vs Unicode sorting In-Reply-To: <20200528084225.mglngkxuhyc77vnf@math.berkeley.edu> References: <20200528084225.mglngkxuhyc77vnf@math.berkeley.edu> Message-ID: <20200528223406.1efd0d36@JRWUBU2> On Thu, 28 May 2020 01:42:25 -0700 Ilya Zakharevich via Unicode wrote: > So what is the intended behavior: of ICU, or of ISO?! Does ICU now support the ISO 14651 default for NFD strings of assigned characters? I thought supporting DUCET was abandoned several years ago. Richard. From kent.b.karlsson at bahnhof.se Thu May 28 17:19:28 2020 From: kent.b.karlsson at bahnhof.se (Kent Karlsson) Date: Fri, 29 May 2020 00:19:28 +0200 Subject: Wireless Connection Symbol In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> Message-ID: <3A410044-B4F7-4489-8CC4-3562D4E84ADE@bahnhof.se> > 28 maj 2020 kl. 04:53 skrev Markus Scherer via Unicode : > > On Wed, May 27, 2020 at 6:20 PM Kent Karlsson via Unicode > wrote: > Granted, it is not plain text. But emoji are already pushing ?out of? plain text as we knew it. And? I recall an argument (years ago) saying essentially > ?these will be the only emoji encoded, the recommendation for expansion is to use images instead?. That seems to have been forgotten? > > Not entirely forgotten... > > http://www.unicode.org/reports/tr51/#Longer_Term > > markus Ok. Thanks for pointing that out. Glad it is not entirely forgotten. One little nit: ?Other features required to make embedded graphics work well include the ability of images to scale with font size? 
That sounds a little bit like one was requiring a small revolution in image rendering. But of course it is not (ok, HTML again): ?? (1em being the typical height and width of emoji glyphs.) /Kent K -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus.icu at gmail.com Thu May 28 21:14:05 2020 From: markus.icu at gmail.com (Markus Scherer) Date: Thu, 28 May 2020 19:14:05 -0700 Subject: ISO 14651/14652 vs Unicode sorting In-Reply-To: <20200528223406.1efd0d36@JRWUBU2> References: <20200528084225.mglngkxuhyc77vnf@math.berkeley.edu> <20200528223406.1efd0d36@JRWUBU2> Message-ID: On Thu, May 28, 2020 at 5:05 PM Richard Wordingham via Unicode < unicode at unicode.org> wrote: > Does ICU now support the ISO 14651 default for NFD strings of > assigned characters? I thought supporting DUCET was abandoned several > years ago. > ICU uses the CLDR sort order which is a mild tailoring of the DUCET. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From abrahamgross at disroot.org Fri May 29 01:19:52 2020 From: abrahamgross at disroot.org (abrahamgross at disroot.org) Date: Fri, 29 May 2020 06:19:52 +0000 (UTC) Subject: QID Emoji (from Re: Wireless Connection Symbol) In-Reply-To: <4bddc04c.14fe.1725c7ae4cd.Webtop.41@btinternet.com> References: <4bddc04c.14fe.1725c7ae4cd.Webtop.41@btinternet.com> Message-ID: <4da8599e-8e5d-49d7-9701-be8d227bfbfd@disroot.org> What if instead of using a QID which on its own is just meaningless numbers, we do tag_start, then write out a word in the tag letters, then close it off with a tag_end. this way, if your device doesn't have the font, it can fall back by rendering the tags through regular character (with an indicator that an emoji is supposed to be there) Example: Suppose I type: ?I love ?triceratops?? (with ? being tag_start; ? 
being tag_end; and everything between being tag ascii characters E0020-E007E), My phone, that has the correct font installed, will render it as [cid:eu.faircode.email.987] While for someone else without the font, it would render as something like [cid:eu.faircode.email.988] This way the intent of the original message will carry across to the reader correctly whether they have the correct font or not. This makes it better than PUA due to the information not being completely lost, and as a another bonus, screen readers would be able to read this just fine (i.e. ?I love ?triceratops??, where ? represents a tone.) I understand that this uses more space than QID sequences, but I think the payoff of not having to worry about ?/?/?/? everywhere is worth it. 2020/05/28 ??2:32:28 wjgo_10009--- via Unicode : > QID Emoji (from Re: Wireless Connection Symbol) > > Kent Karlsson wrote as follows. > >> I agree with Asmus that the ?QID emoji? is a really bad idea. >> > I opine that when considering a new idea it is important to be prepared to suspend disbelief and consider if any parts of the idea are good, rather than just the total idea. > > I find the QID Emoji proposal has some very good aspects but is somewhat unstable as a whole. > > So, if those in favour of the proposal and those against are each willing to be like the strongest trees and sway in the breeze then the good parts of the proposal could become available in a stable manner. > > For example, maybe registration in a Unicode Inc. database, with the option of a cross-reference link to QID, would mean that only those QID where someone wants an emoji for that QID would be in the Unicode Inc. database, and a gentle moderation policy could be used to stop ambiguity and duplication. So maybe shorter codes. > > What if U+FFF0 is defined, mutatis mutandis, as effectively what would be a ligature of the ID emoji and tag Q in the original proposal, U+FFF8 is defined as the corresponding CANCEL and circled digits are used. 
All part of the basic plane, so fewer bytes for each such character and a graceful indicative fallback facility built in. > > I realize that the original proposal can be implemented with existing technology, and that the changes I suggest would require changes to The Unicode Standard and software packages, but that could be done in time if there is the will to do so, yet whatever solution is implemented is there for a very long time. > > Would those two changes both go a long way towards making a solution that is acceptable to everybody? > > I may not have solved every objection and what I suggest does change the original. Yet this is research for the future. So, if people agree, please say so, if not then please say what I have missed or got wrong and what needs fixing and then, as a group effort, maybe we can iterate in a constructive way and achieve a good solution acceptable to everybody. > > William Overington > > Thursday 28 May 2020 > -------------- next part -------------- A non-text attachment was scrubbed... Name: image:432838 Type: image/png Size: 48480 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Render.png
Type: image/png
Size: 24595 bytes
Desc: not available
URL:

From jameskasskrv at gmail.com Fri May 29 02:14:15 2020
From: jameskasskrv at gmail.com (James Kass)
Date: Fri, 29 May 2020 07:14:15 +0000
Subject: QID Emoij (was: Re: Wireless Connection Symbol)
In-Reply-To: <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se>
References: <000001d632c3$249f75d0$6dde6170$@ewellic.org>
 <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com>
 <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com>
 <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com>
 <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com>
 <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se>
Message-ID: <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com>

On 2020-05-27 10:50 PM, Kent Karlsson via Unicode wrote:
> Embedding images (whichever one you like and "you" have access to...) in text can already be done.
>
> In HTML markup it is called "img" (e.g. "[HTML markup scrubbed]"). And there is no real question about which images will be "supported".
>

The point Kent Karlsson makes here is just as valid today as it was when
it was used as an argument against encoding the first set of emoji in
Unicode.

It's true that there are proper ways of sticking random images in
running text. It's too bad that the emoji user community doesn't seem
interested in that type of solution.

An underlying question is whether it's better for profit driven
corporate interests to determine the emoji repertoire, or to let the set
evolve naturally based on user community desires.

The QID Emoji proposal enables the latter, which is one of the reasons
I'm in favor of it.

Naysayers will point out obstacles in any such approach. In the English
language, anyone who considers any and all obstacles insurmountable is
referred to as a quitter.

From prosfilaes at gmail.com Fri May 29 02:40:40 2020 From: prosfilaes at gmail.com (David Starner) Date: Fri, 29 May 2020 00:40:40 -0700 Subject: QID Emoij (was: Re: Wireless Connection Symbol) In-Reply-To: <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> Message-ID: On Fri, May 29, 2020 at 12:19 AM James Kass via Unicode wrote: > An underlying question is whether it's better for profit driven > corporate interests to determine the emoji repertoire, or to let the set > evolve naturally based on user community desires. > > The QID Emoji proposal enables the latter, which is one of the reasons > I'm in favor of it. I don't see it. Profit driven corporate interests may or may not support QID Emoji; if they don't, it's practically dead in the water. If they do, Google is going to make a list of corporately supported emoji, just like what started this, and that's going to be the list of supported QID emoji. Outside that corporate line, there's going to be about zero chance anyone tries to use QID emoji, because at most one in a million QID emoji are going to be supported, so even if you do want to use an emoji from the Palmer's Chipmunk, what's the right QID for that? It seems to have three, one for each scientific name, plus maybe using the genus, Tamias, will be better supported, or Marmotini, or Sciuridae, but if you're getting that vague, why not use the existing emoji? So six alternatives for QID emoji, and none of them will probably work, so why should the user community bother? 
I guess instead of bothering Unicode, they could bother Google to add it to their list and provide fonts, but there's that "profit driven corporate interests" again. > Naysayers will point out obstacles in any such approach. In the English > language, anyone who considers any and all obstacles insurmountable is > referred to as a quitter. People who use the term quitter in such a sense often get a ride from a police car or ambulance later that night. It's a bizarre word to use when you've decided Unicode should be a quitter and stop even trying to manage emoji. -- The standard is written in English . If you have trouble understanding a particular section, read it again and again and again . . . Sit up straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185 (1991) From abrahamgross at disroot.org Fri May 29 02:48:47 2020 From: abrahamgross at disroot.org (abrahamgross at disroot.org) Date: Fri, 29 May 2020 07:48:47 +0000 (UTC) Subject: QID Emoij (was: Re: Wireless Connection Symbol) In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> Message-ID: What if instead of using a QID which on its own is just meaningless numbers, we do tag_start, then write out a word in the tag letters, then close it off with a tag_end. this way, if your device doesn't have the font, it can fall back by rendering the tags through regular character (with an indicator that an emoji is supposed to be there) Example: Suppose I type: ?I love ?triceratops?? (with ? being tag_start; ? 
being tag_end; and everything between being tag ascii characters E0020-E007E), My phone, that has the correct font installed, will render it as [cid:eu.faircode.email.996] While for someone else without the font, it would render as something like ?I love ????????????? [cid:eu.faircode.email.998] This way the intent of the original message will carry across to the reader correctly whether they have the correct font or not. This makes it better than PUA due to the information not being completely lost, and as a another bonus, screen readers would be able to read this just fine (i.e. ?I love ?triceratops??, where ? represents a tone.) I understand that this uses more space than QID sequences, but I think the payoff of not having to worry about ?/?/?/? everywhere is worth it. -------------- next part -------------- A non-text attachment was scrubbed... Name: image:432838 Type: image/png Size: 48480 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Render.png Type: image/png Size: 24595 bytes Desc: not available URL: From marius.spix at web.de Fri May 29 03:34:45 2020 From: marius.spix at web.de (Marius Spix) Date: Fri, 29 May 2020 10:34:45 +0200 Subject: Aw: Re: Wireless Connection Symbol In-Reply-To: <3A410044-B4F7-4489-8CC4-3562D4E84ADE@bahnhof.se> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <3A410044-B4F7-4489-8CC4-3562D4E84ADE@bahnhof.se> Message-ID: An HTML attachment was scrubbed... 
URL: From richard.wordingham at ntlworld.com Fri May 29 03:38:43 2020 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 29 May 2020 09:38:43 +0100 Subject: ISO 14651/14652 vs Unicode sorting In-Reply-To: References: <20200528084225.mglngkxuhyc77vnf@math.berkeley.edu> <20200528223406.1efd0d36@JRWUBU2> Message-ID: <20200529093843.36ccfa3a@JRWUBU2> On Thu, 28 May 2020 19:14:05 -0700 Markus Scherer via Unicode wrote: > On Thu, May 28, 2020 at 5:05 PM Richard Wordingham via Unicode < > unicode at unicode.org> wrote: > > > Does ICU now support the ISO 14651 default for NFD strings of > > assigned characters? I thought supporting DUCET was abandoned > > several years ago. > ICU uses the CLDR sort order which is a mild tailoring of the DUCET. Depending on what you mean by tailoring. By definition, it's not a tailoring in the CLDR sense! I'll take that answer as 'no'. ICU used to claim to support DUCET, and I suspect that there is a tailoring that will get through the UCA compliance testing if restricted to assigned characters. However, it seems that the ICU/CLDR teams decided that that goal wasn't worth the effort. Richard. From jameskasskrv at gmail.com Fri May 29 04:14:59 2020 From: jameskasskrv at gmail.com (James Kass) Date: Fri, 29 May 2020 09:14:59 +0000 Subject: QID Emoij In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> Message-ID: <28b102fc-8b7f-9f36-fd9a-942e3eb868f5@gmail.com> On 2020-05-29 7:40 AM, David Starner wrote: > I don't see it. Profit driven corporate interests may or may not > support QID Emoji; if they don't, it's practically dead in the water. 

This discounts the probability that third-partiers would step up to the
plate.

> If they do, Google is going to make a list of corporately supported
> emoji, just like what started this, and that's going to be the list of
> supported QID emoji. Outside that corporate line, there's going to be
> about zero chance anyone tries to use QID emoji, because at most one
> in a million QID emoji are going to be supported, so even if you do
> want to use an emoji from the Palmer's Chipmunk, what's the right QID
> for that? It seems to have three, one for each scientific name, plus
> maybe using the genus, Tamias, will be better supported, or Marmotini,
> or Sciuridae, but if you're getting that vague, ...

Aren't there far more than three ways to express the concept "Hello"
using valid Unicode strings? If *that* had been deemed an
insurmountable obstacle, we'd still be limited to ASCII-English.

>> Naysayers will point out obstacles in any such approach. In the English
>> language, anyone who considers any and all obstacles insurmountable is
>> referred to as a quitter.
> People who use the term quitter in such a sense often get a ride from
> a police car or ambulance later that night. It's a bizarre word to use
> when you've decided Unicode should be a quitter and stop even trying
> to manage emoji.
>

I've decided that Unicode has no business limiting an evolving set of
symbols.

From prosfilaes at gmail.com Fri May 29 05:32:31 2020 From: prosfilaes at gmail.com (David Starner) Date: Fri, 29 May 2020 03:32:31 -0700 Subject: QID Emoij In-Reply-To: <28b102fc-8b7f-9f36-fd9a-942e3eb868f5@gmail.com> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> <28b102fc-8b7f-9f36-fd9a-942e3eb868f5@gmail.com> Message-ID: On Fri, May 29, 2020 at 2:15 AM James Kass wrote: > On 2020-05-29 7:40 AM, David Starner wrote: > > I don't see it. Profit driven corporate interests may or may not > > support QID Emoji; if they don't, it's practically dead in the water. > > This discounts the probability that third-partiers would step up to the > plate. How? If you can't send it in email to arbitrary systems or in text messages in arbitrary systems and it show up right, who is going to use it? > > If they do, Google is going to make a list of corporately supported > > emoji, just like what started this, and that's going to be the list of > > supported QID emoji. Outside that corporate line, there's going to be > > about zero chance anyone tries to use QID emoji, because at most one > > in a million QID emoji are going to be supported, so even if you do > > want to use an emoji from the Palmer's Chipmunk, what's the right QID > > for that? It seems to have three, one for each scientific name, plus > > maybe using the genus, Tamias, will be better supported, or Marmotini, > > or Sciuridae, but if you're getting that vague, ... > > Aren't there far more than three ways to express the concept "Hello" > using valid Unicode strings? If *that* had been deemed an > insurmountable obstacle, we'd still be limited to ASCII-English. That's not exactly comparable. 
I'm looking for a way to pass an image of a Palmer's Chipmunk, and am willing to accept fallbacks. With QID emoji, there's no way for me to know what will work, nor any way for a implementer to know which one I will use. On the contrary, there is one correct way to express "Hello" in Unicode, as a series of five codepoints encoded in the Basic Latin block. Redundant encoding, where it's ambiguous which character to use, is frowned upon in Unicode, and most codepoints in Unicode are generally supported unless they're for a poorly supported script, in which case the problems can be anticipated. > >> Naysayers will point out obstacles in any such approach. In the English > >> language, anyone who considers any and all obstacles insurmountable is > >> referred to as a quitter. > > People who use the term quitter in such a sense often get a ride from > > a police car or ambulance later that night. It's a bizarre word to use > > when you've decided Unicode should be a quitter and stop even trying > > to manage emoji. > > > I've decided that Unicode has no business limiting an evolving set of > symbols. Why don't you do this yourself? You could have QID emoji codepoints in the PUA, and everyone would flock to supporting them. Any obstacles you point out in that just show that you're a quitter. -- The standard is written in English . If you have trouble understanding a particular section, read it again and again and again . . . Sit up straight. Eat your vegetables. Do not mumble. 
-- _Pascal_, ISO 7185 (1991) From jameskasskrv at gmail.com Fri May 29 06:56:07 2020 From: jameskasskrv at gmail.com (James Kass) Date: Fri, 29 May 2020 11:56:07 +0000 Subject: QID Emoij In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> <28b102fc-8b7f-9f36-fd9a-942e3eb868f5@gmail.com> Message-ID: <13fd69f3-20ec-f36b-0806-30dad8f782f5@gmail.com> On 2020-05-29 10:32 AM, David Starner wrote: > On Fri, May 29, 2020 at 2:15 AM James Kass wrote: >> On 2020-05-29 7:40 AM, David Starner wrote: >>> I don't see it. Profit driven corporate interests may or may not >>> support QID Emoji; if they don't, it's practically dead in the water. >> This discounts the probability that third-partiers would step up to the >> plate. > How? If you can't send it in email to arbitrary systems or in text > messages in arbitrary systems and it show up right, who is going to > use it? > The same kind of people who used Unicode Indic when it wasn't supported; trailblazers, pioneers, and enthusiasts. Third-party folks offered scripts to convert to-and-from Indic Unicode and various pre-Unicode Indic font mappings.? Freely downloadable ones, at that.? An example: http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML014/0753.html >> Aren't there far more than three ways to express the concept "Hello" >> using valid Unicode strings? If *that* had been deemed an >> insurmountable obstacle, we'd still be limited to ASCII-English. > That's not exactly comparable. I'm looking for a way to pass an image > of a Palmer's Chipmunk, and am willing to accept fallbacks. Always glad to help a brother out.? It can be done in-line, , if desired.? 
Here's the instructions I found on-line:

Insert an Image Inline in an Email with Mozilla Thunderbird

1. Create a new message in Mozilla Thunderbird.
2. Put the cursor where you want the image to appear in the body of the email.
3. Select Insert > Image from the menu.
4. Use the Choose File... ...
5. Type a short textual description of the image under Alternate text: ...
6. Click OK.

> With QID
> emoji, there's no way for me to know what will work, nor any way for a
> implementer to know which one I will use. On the contrary, there is
> one correct way to express "Hello" in Unicode, as a series of five
> codepoints encoded in the Basic Latin block.

I was going for the concept of "hello" rather than the spelling of the
English word for it. But even in Basic Latin English, there's more than
one way for the word.
hello... Hello... HELLO!
Even in Basic Latin English there's more than that for the concept.
Hello, good day, good morning/evening/afternoon, howdy...
Beyond English there's a myriad of ways.
Bonjour, guten tag, ????????????, &c.

>> I've decided that Unicode has no business limiting an evolving set of
>> symbols.
> Why don't you do this yourself?

Because I don't have any business limiting an evolving set of symbols,
either. I'd rather go for a ride in a paddy-wagon.

From richard.wordingham at ntlworld.com Fri May 29 09:39:17 2020 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 29 May 2020 15:39:17 +0100 Subject: QID Emoij In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> <28b102fc-8b7f-9f36-fd9a-942e3eb868f5@gmail.com> Message-ID: <20200529153917.6e7a94fb@JRWUBU2> On Fri, 29 May 2020 03:32:31 -0700 David Starner via Unicode wrote: > Why don't you do this yourself? You could have QID emoji codepoints in > the PUA, and everyone would flock to supporting them. Any obstacles > you point out in that just show that you're a quitter. But there are only a couple of planes in the PUA! There's one database (the Barcode of Life Data Systems) with over 190,000 species. Richard. From markus.icu at gmail.com Fri May 29 10:45:15 2020 From: markus.icu at gmail.com (Markus Scherer) Date: Fri, 29 May 2020 08:45:15 -0700 Subject: QID Emoji (from Re: Wireless Connection Symbol) In-Reply-To: <4da8599e-8e5d-49d7-9701-be8d227bfbfd@disroot.org> References: <4bddc04c.14fe.1725c7ae4cd.Webtop.41@btinternet.com> <4da8599e-8e5d-49d7-9701-be8d227bfbfd@disroot.org> Message-ID: On Fri, May 29, 2020 at 1:50 AM abrahamgross--- via Unicode < unicode at unicode.org> wrote: > What if instead of using a QID which on its own is just meaningless > numbers, we do tag_start, then write out a word in the tag letters, then > close it off with a tag_end. this way, if your device doesn't have the > font, it can fall back by rendering the tags through regular character > (with an indicator that an emoji is supposed to be there) > > Example: > Suppose I type: ?I love ?triceratops?? (with ? being tag_start; ? 
being > tag_end; and everything between being tag ascii characters E0020-E007E), > I think many people would not like the limitation to the ASCII repertoire which requires that the word is in English or a few other languages where all or some words can be reasonably spelled that way. Part of the attraction of symbols is also a lesser dependence on language. markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From wjgo_10009 at btinternet.com Fri May 29 11:32:08 2020 From: wjgo_10009 at btinternet.com (wjgo_10009 at btinternet.com) Date: Fri, 29 May 2020 17:32:08 +0100 (BST) Subject: QID Emoji (from Re: Wireless Connection Symbol) Message-ID: <4a8a55.7fb.1726146cb6d.Webtop.229@btinternet.com> Re: QID Emoji (from Re: Wireless Connection Symbol) !123 Markus Scherer wrote as follows. > I think many people would not like the limitation to the ASCII > repertoire which requires that the word is in English or a few other > languages where all or some words can be reasonably spelled that way. > Part of the attraction of symbols is also a lesser dependence on > language. http://www.users.globalnet.co.uk/~ngo/locse027.pdf In another thread, (Re: QID Emoij), James Kass wrote as follows. > Even in Basic Latin English there's more than that for the concept. > Hello, good day, good morning/evening/afternoon, howdy... > Beyond English there's a myriad of ways. > Bonjour, guten tag, ????????????, &c. There is also, in my research project, !123 http://www.users.globalnet.co.uk/~ngo/A_List_of_Code_Numbers_and_English_Localizations_for_use_in_Research_on_Communication_through_the_Language_Barrier_using_encoded_Localizable_Sentences.pdf There are also other research documents and two novels, one novel completed in 2019, and a sequel, being published as chapters as each chapter is completed. They are available online, free to read, no registration needed, and the webspace is hosted on a PlusNet server, not on my computer. I upload over the internet. 
All of the documents and the novels are deposited for Legal Deposit with The British Library, and Legal Deposit receipted by The British Library. http://www.users.globalnet.co.uk/~ngo/ !987 William Overington Friday 29 May 2020 From asmusf at ix.netcom.com Fri May 29 12:52:46 2020 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Fri, 29 May 2020 10:52:46 -0700 Subject: QID Emoij In-Reply-To: <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> Message-ID: <2ff76225-1311-4133-f293-5c06807e28cb@ix.netcom.com> An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Fri May 29 12:53:46 2020 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Fri, 29 May 2020 10:53:46 -0700 Subject: QID Emoij In-Reply-To: References: <000001d632c3$249f75d0$6dde6170$@ewellic.org> <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com> <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com> <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com> <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com> <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se> <5c735b52-3979-5d53-48a3-62f7db29cc17@gmail.com> Message-ID: <810a5e1a-6bcb-7b4f-4d6a-1e7e452a8e0e@ix.netcom.com> An HTML attachment was scrubbed... 
URL:

From kent.b.karlsson at bahnhof.se Sat May 30 15:30:46 2020
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sat, 30 May 2020 22:30:46 +0200
Subject: Wireless Connection Symbol
In-Reply-To:
References: <000001d632c3$249f75d0$6dde6170$@ewellic.org>
 <0B8B641E-A5A1-467B-BF19-C6FDF3FD5F0F@evertype.com>
 <29d231d7-6d5f-f840-e3d9-6b178e871feb@gmail.com>
 <74c6facd-1703-337d-2105-f959af00cd34@ix.netcom.com>
 <3fccb199-a5da-83ef-de23-62406f5cec60@gmail.com>
 <0840AF94-25F2-4682-B66C-803B741625B7@bahnhof.se>
 <3A410044-B4F7-4489-8CC4-3562D4E84ADE@bahnhof.se>
Message-ID: <2C36D52C-77BA-488F-86A1-18C57088D12D@bahnhof.se>

> On 29 May 2020 at 10:34, Marius Spix via Unicode wrote:
>
> What about using the icon URI scheme to represent arbitrary emoji?
>
> https://tools.ietf.org/id/draft-lafayette-icon-uri-scheme-00.html
>
> This would allow stuff like Archaeopteryx?/>

Thanks for the reference. A quick look suggests that it is a fair idea. But that proposal was for file type icons only (mostly based on file suffix). So "icon:animals:dinosaurs:archaeopteryx" is not covered by that proposal (though "icon:.pdf" is). It also has a number of quirks. And the proposal was "dead in the water" according to the (presumed) author. But let's assume a similar "emoji icon" URI type. One would then need to have some reasonable, and agreed upon, "universal" way of referring to "emoji icons", to tell which one is referenced.

The displayed size (not the pixel sizes, of which there are at least two, the origin and the final display, the latter depending on various resize operations, including zoom) of the "emoji icon" must be overridable by a style=

Regards,
>
> Marius
>
> Sent: Friday, 29 May 2020 at 00:19
> From: "Kent Karlsson via Unicode"
> To: "Markus Scherer"
> Cc: "Unicode"
> Subject: Re: Wireless Connection Symbol
>
> > On 28 May 2020 at 04:53, Markus Scherer via Unicode wrote:
>
> On Wed, May 27, 2020 at 6:20 PM Kent Karlsson via Unicode wrote:
> Granted, it is not plain text. But emoji are already pushing "out of" plain text as we knew it. And... I recall an argument (years ago) saying essentially
> "these will be the only emoji encoded, the recommendation for expansion is to use images instead". That seems to have been forgotten?
>
> Not entirely forgotten...
>
> http://www.unicode.org/reports/tr51/#Longer_Term
>
> markus
>
> Ok. Thanks for pointing that out. Glad it is not entirely forgotten.
>
> One little nit:
> "Other features required to make embedded graphics work well include the ability of images to scale with font size"
>
> That sounds a little bit like one was requiring a small revolution in image rendering. But of course it is not (ok, HTML again):
>
> "[HTML markup scrubbed]"
>
> (1em being the typical height and width of emoji glyphs.)
>
> /Kent K
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From richard.wordingham at ntlworld.com Sun May 31 20:52:21 2020
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 1 Jun 2020 02:52:21 +0100
Subject: Bengali script repha
Message-ID: <20200601025221.50a434ce@JRWUBU2>

Which consonants in the Bengali script may form repha? In particular, is the encoding just , or can it legitimately also be encoded as .
The problem we are having is that some people are encoding Pali 'v' as U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL, rather than U+09F1 BENGALI LETTER RA WITH LOWER DIAGONAL, and in some fonts (renderers?) the 'vy' cluster is being rendered with a repha as though it were . Should using the joining sequence prevent repha formation when the preceding character is U+09B0? TUS appears to define the behaviour only between RA and YA, not between RA and other letters.

Richard.
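
For anyone who wants to check existing text for the encoding described above, a minimal Python sketch follows. It relies only on the standard unicodedata module. The helper names find_ra_middle_diagonal_conjuncts and describe are hypothetical, and treating every U+09F0 + U+09CD pair as suspect is an assumption made purely for illustration; nothing in TUS is claimed to say so.

```python
import unicodedata

RA = "\u09B0"                  # BENGALI LETTER RA
RA_MIDDLE_DIAGONAL = "\u09F0"  # BENGALI LETTER RA WITH MIDDLE DIAGONAL
RA_LOWER_DIAGONAL = "\u09F1"   # BENGALI LETTER RA WITH LOWER DIAGONAL
VIRAMA = "\u09CD"              # BENGALI SIGN VIRAMA
YA = "\u09AF"                  # BENGALI LETTER YA


def find_ra_middle_diagonal_conjuncts(text):
    """Return the indices where U+09F0 is immediately followed by the virama.

    Hypothetical helper: such a pair begins a conjunct that some fonts or
    renderers may shape with a repha, which is unwanted if Pali 'v' was meant.
    """
    return [
        i
        for i in range(len(text) - 1)
        if text[i] == RA_MIDDLE_DIAGONAL and text[i + 1] == VIRAMA
    ]


def describe(text):
    """List each code point of text as 'U+XXXX NAME' for inspection."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    suspect = RA_MIDDLE_DIAGONAL + VIRAMA + YA   # 'vy' typed with U+09F0
    expected = RA_LOWER_DIAGONAL + VIRAMA + YA   # 'vy' typed with U+09F1
    for label, s in (("suspect", suspect), ("expected", expected)):
        print(label, describe(s), find_ra_middle_diagonal_conjuncts(s))
```

The sketch only locates candidate sequences; whether a given renderer actually forms a repha for them still has to be checked in the font and shaping engine concerned.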