From sosipiuk at gmail.com Mon Nov 13 17:12:44 2023
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Mon, 13 Nov 2023 23:12:44 +0000
Subject: What does 'horizontal extension' mean in Unicode proposal docs?
Message-ID: <1699916865426.240198284.3324546151@gmail.com>

The phrase 'horizontal extension' appears in many proposals related to
CJK characters, but isn't defined, and trying to intuit a definition
from context is a bit risky.

What does this phrase mean formally, and what would the obvious
contrast 'vertical extension' mean?

From kenwhistler at sonic.net Mon Nov 13 17:47:27 2023
From: kenwhistler at sonic.net (Ken Whistler)
Date: Mon, 13 Nov 2023 15:47:27 -0800
Subject: What does 'horizontal extension' mean in Unicode proposal docs?
In-Reply-To: <1699916865426.240198284.3324546151@gmail.com>
References: <1699916865426.240198284.3324546151@gmail.com>
Message-ID:

Take a look at the CJK unified code charts on the site:

https://www.unicode.org/charts/

The CJK unified ideographs are presented in a multi-column format,
where each column represents a different source for the ideograph (and
often shows a slightly different glyph appropriate, e.g., to a Chinese
font versus a Japanese font, etc.).

A *horizontal* extension for CJK represents the addition of a new
*source* (and corresponding glyph) where one was not present before.

If you look, for example, at the recently encoded CJK Extension I, all
of the characters have a single (Chinese) source, because this
repertoire was all added for recent additions in an important Chinese
standard. However, in the future, if any of those characters also
happens to be added to some Japanese, Taiwan, Korean, or other
standard, a second source may be added for that character, to indicate
its presence in something other than the original Chinese standard.
That would be a horizontal extension.
It wouldn't add a new CJK unified ideograph with a new code point --
instead, it would just add another source for an existing character.

A *vertical* extension just means the addition of more CJK unified
ideographs. Thus the encoding of CJK Extension I in Unicode 15.1 was a
vertical extension, adding 622 more CJK unified ideographs to the
standard.

--Ken

On 11/13/2023 3:12 PM, Sławomir Osipiuk via Unicode wrote:
> The phrase 'horizontal extension' appears in many proposals related to
> CJK characters, but isn't defined, and trying to intuit a definition
> from context is a bit risky.
>
> What does this phrase mean formally, and what would the obvious
> contrast 'vertical extension' mean?

From harjitmoe at outlook.com Mon Nov 13 17:47:50 2023
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Mon, 13 Nov 2023 23:47:50 +0000
Subject: What does 'horizontal extension' mean in Unicode proposal docs?
In-Reply-To: <1699916865426.240198284.3324546151@gmail.com>
Message-ID:

An HTML attachment was scrubbed...

From jk at koremail.com Mon Nov 13 23:43:19 2023
From: jk at koremail.com (jk at koremail.com)
Date: Tue, 14 Nov 2023 13:43:19 +0800
Subject: What does 'horizontal extension' mean in Unicode proposal docs?
In-Reply-To:
References:
Message-ID:

CJK Unified Ideographs are dealt with primarily by the IRG (ISO/IEC
JTC1/SC2/WG2/IRG Ideographic Research Group), so their website is a
good place to find out more:
https://appsrv.cse.cuhk.edu.hk/~irg/index.htm

On 2023-11-14 07:47, Harriet Riddle via Unicode wrote:
> CJK Unified Ideograph blocks show more than one reference glyph for a
> lot of the characters.
>
> Horizontal extension is adding an additional reference glyph to an
> existing already-allocated codepoint.
>
> Vertical extension is allocating new codepoints.
>
> The references to directions make more sense when you think about the
> distinctive layout of the code charts for the CJK Unified Ideograph
> blocks.
>
> --Har.
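
The two terms described in the replies above can be pictured with a
small data-structure sketch. This is only an illustration under stated
assumptions, not anything the IRG or the Unicode data files define: the
name `chart`, the source labels "G" and "J", the code points, and the
reference strings are all hypothetical placeholders.

```python
# Hypothetical model of a CJK code chart: each code point (a row) maps to
# its sources (the columns). All values below are placeholders, not real
# chart data.
chart = {0x2EBF0: {"G": "G-placeholder-ref"}}


def horizontal_extension(chart, code_point, source, reference):
    """Add a new source column to a code point that is already encoded."""
    if code_point not in chart:
        raise KeyError("code point is not encoded; that needs a vertical extension")
    if source in chart[code_point]:
        raise ValueError("this source is already recorded for the code point")
    chart[code_point][source] = reference


def vertical_extension(chart, code_point, source, reference):
    """Encode a new code point (a new row) with its first source."""
    if code_point in chart:
        raise ValueError("code point is already encoded")
    chart[code_point] = {source: reference}


rows_before = len(chart)
horizontal_extension(chart, 0x2EBF0, "J", "J-placeholder-ref")
print(len(chart) == rows_before)   # same number of code points, one more source
vertical_extension(chart, 0x2EBF1, "G", "G-placeholder-ref-2")
print(len(chart) == rows_before + 1)   # one more code point
```

In this picture a horizontal extension never changes the number of
rows, while a vertical extension always does.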

From andrew.richter at defence.gov.au Mon Nov 27 20:57:37 2023
From: andrew.richter at defence.gov.au (Richter, Andrew MR)
Date: Tue, 28 Nov 2023 02:57:37 +0000
Subject: Why is (vv) -> (w) not amongst the Confusables? [SEC=OFFICIAL]
Message-ID: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au>

OFFICIAL

Hi Unicode ML,

I'm trying to determine why "vv" (two of the letter "v") as a
confusable for "w" (a single letter "w") is not included in the latest
Confusables list whereas "rn" ("r" followed by "n") is included as a
confusable for "m" (a single letter "m")? It looks like it was up to
version 9.0 but was removed from version 10.0 onwards.

IMPORTANT: This email remains the property of the Department of
Defence. Unauthorised communication and dealing with the information in
the email may be a serious criminal offence. If you have received this
email in error, you are requested to contact the sender and delete the
email immediately.

From piotrunio-2004 at wp.pl Tue Nov 28 12:38:07 2023
From: piotrunio-2004 at wp.pl (piotrunio-2004 at wp.pl)
Date: Tue, 28 Nov 2023 19:38:07 +0100
Subject: Odp: Why is (vv) -> (w) not amongst the Confusables?
In-Reply-To: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au>
References: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au>
Message-ID: <7b55639edf284be7bdce56c3d8c37395@grupawp.pl>

The confusables property is rather subjective and not very commonly
used, and due to the very large size of the Unicode character set it is
highly likely to be incomplete at all times and should not be relied on
exclusively. For instance, in my opinion U+23AE and U+2502 are
identical glyphs in virtually all typographically meaningful uses
(especially due to the CP437/WGL4 heritage of the U+2320 and U+2321
characters), and yet they're not linked in there. Since there are no
objective criteria for what qualifies as a 'confusable', it doesn't
seem appropriate to rely on that. I myself wouldn't like to rely on
official normalization and case folding rules, let alone confusables.

On 28 November 2023 at 18:04, Richter, Andrew MR via Unicode
<unicode at corp.unicode.org> wrote:

> OFFICIAL
>
> Hi Unicode ML,
>
> I'm trying to determine why "vv" (two of the letter "v") as a
> confusable for "w" (a single letter "w") is not included in the
> latest Confusables list whereas "rn" ("r" followed by "n") is
> included as a confusable for "m" (a single letter "m")? It looks like
> it was up to version 9.0 but was removed from version 10.0 onwards.
>
> IMPORTANT: This email remains the property of the Department of
> Defence. Unauthorised communication and dealing with the information
> in the email may be a serious criminal offence. If you have received
> this email in error, you are requested to contact the sender and
> delete the email immediately.

That is highly likely a fatal messaging error because this is a public
mailing list.
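
For readers who want to check the pairs mentioned above, here is a
minimal sketch using only Python's standard `unicodedata` module. It
shows that compatibility normalization (NFKC) plus case folding does
not unify any of these look-alike pairs, which is why visual similarity
needs separate data. The helper name `fold` is a hypothetical choice
for this illustration.

```python
import unicodedata


def fold(text):
    """Apply NFKC normalization followed by case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


# U+23AE INTEGRAL EXTENSION vs. U+2502 BOX DRAWINGS LIGHT VERTICAL,
# plus the two letter sequences discussed in this thread.
pairs = [("\u23ae", "\u2502"), ("rn", "m"), ("vv", "w")]

for a, b in pairs:
    # Each comparison is False: normalization and case folding deal with
    # encoding-level equivalence, not with how glyphs look.
    print(ascii(a), ascii(b), fold(a) == fold(b))

# By contrast, a compatibility variant such as U+FF37 FULLWIDTH LATIN
# CAPITAL LETTER W does fold to a plain "w".
print(fold("\uff37") == "w")
```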

From richard.wordingham at ntlworld.com Tue Nov 28 17:01:23 2023
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 28 Nov 2023 23:01:23 +0000
Subject: Why is (vv) -> (w) not amongst the Confusables? [SEC=OFFICIAL]
In-Reply-To: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au>
References: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au>
Message-ID: <20231128230123.73b07a01@JRWUBU2>

On Tue, 28 Nov 2023 02:57:37 +0000
"Richter, Andrew MR via Unicode" wrote:

> IMPORTANT: This email remains the property of
> the Department of Defence. Unauthorised communication and dealing
> with the information in the email may be a serious criminal offence.
> If you have received this email in error, you are requested to
> contact the sender and delete the email immediately.

You should really prefix this warning with a statement that the
communication is strictly limited to all entities within 100 light
years of Earth.

Richard.

From doug at ewellic.org Tue Nov 28 17:31:06 2023
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 28 Nov 2023 23:31:06 +0000
Subject: Why is (vv) -> (w) not amongst the Confusables? [SEC=OFFICIAL]
In-Reply-To: <20231128230123.73b07a01@JRWUBU2>
References: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au> <20231128230123.73b07a01@JRWUBU2>
Message-ID:

Richard Wordingham wrote:

> On Tue, 28 Nov 2023 02:57:37 +0000
> "Richter, Andrew MR via Unicode" wrote:
>
>> IMPORTANT: This email remains the property of the Department of
>> Defence. Unauthorised communication and dealing with the information
>> in the email may be a serious criminal offence.
>> If you have received this email in error, you are requested to
>> contact the sender and delete the email immediately.
>
> You should really prefix this warning with a statement that the
> communication is strictly limited to all entities within 100 light
> years of Earth.

Indeed, I had already deleted the original message, and would not have
responded to the underlying question about Unicode even if I had the
answer. Broad legal threats from military agencies are not to be
dismissed lightly.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From cate at cateee.net Wed Nov 29 01:40:08 2023
From: cate at cateee.net (Giacomo Catenazzi)
Date: Wed, 29 Nov 2023 08:40:08 +0100
Subject: Odp: Why is (vv) -> (w) not amongst the Confusables?
In-Reply-To: <7b55639edf284be7bdce56c3d8c37395@grupawp.pl>
References: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au> <7b55639edf284be7bdce56c3d8c37395@grupawp.pl>
Message-ID: <4436e859-f205-4cac-91dc-1962a1f4f088@cateee.net>

Confusable is a very important property for security (phishing,
scamming, etc.), and it is important also because it is used by non
"Unicode" people. Other properties of Unicode may require some
knowledge of the Unicode standard/book, and so of script and language
differences, but confusable should be usable by people without deep
knowledge of e.g. Cyrillic or Indic scripts.

In any case, I think we should be careful about adding confusables for
characters within the same "script": it is OK where the glyph is or was
the same (zero and O, one and el on typewriters, possibly I and el), or
where language interacts with script (in Turkish, the I without a dot
is not the uppercase of i). Otherwise the font/script designer should
make things readable. If we can confuse w and vv, I think we must
change the font.

Note: cursive writings are exceptions: in cursive many characters are
confusable.

PS: was W3C maintaining the confusable list?

cate

On 28 Nov 2023 19:38, piotrunio-2004 at wp.pl via Unicode wrote:
> The confusables property is rather subjective and not very commonly
> used, and due to the very large size of the Unicode character set it
> is highly likely to be incomplete at all times and should not be
> relied on exclusively.
> For instance, in my opinion U+23AE and U+2502 are identical glyphs in
> virtually all typographically meaningful uses (especially due to the
> CP437/WGL4 heritage of the U+2320 and U+2321 characters), and yet
> they're not linked in there. Since there are no objective criteria
> for what qualifies as a 'confusable', it doesn't seem appropriate to
> rely on that. I myself wouldn't like to rely on official
> normalization and case folding rules, let alone confusables.
>
> On 28 November 2023 at 18:04, Richter, Andrew MR via Unicode wrote:
>
>> OFFICIAL
>>
>> Hi Unicode ML,
>>
>> I'm trying to determine why "vv" (two of the letter "v") as a
>> confusable for "w" (a single letter "w") is not included in the
>> latest Confusables list whereas "rn" ("r" followed by "n") is
>> included as a confusable for "m" (a single letter "m")? It looks
>> like it was up to version 9.0 but was removed from version 10.0
>> onwards.
>>
>> IMPORTANT: This email remains the property of the Department of
>> Defence. Unauthorised communication and dealing with the information
>> in the email may be a serious criminal offence. If you have received
>> this email in error, you are requested to contact the sender and
>> delete the email immediately.
>
> That is highly likely a fatal messaging error because this is a public
> mailing list.

From beckiergb at gmail.com Wed Nov 29 02:30:06 2023
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Wed, 29 Nov 2023 00:30:06 -0800
Subject: Odp: Why is (vv) -> (w) not amongst the Confusables?
In-Reply-To: <4436e859-f205-4cac-91dc-1962a1f4f088@cateee.net>
References: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au> <7b55639edf284be7bdce56c3d8c37395@grupawp.pl> <4436e859-f205-4cac-91dc-1962a1f4f088@cateee.net>
Message-ID:

On Tue, Nov 28, 2023 at 11:44 PM Giacomo Catenazzi via Unicode
<unicode at corp.unicode.org> wrote:

> Confusable is a very important property for security (phishing,
> scamming, etc.), and it is important also because it is used by non
> "Unicode" people. Other properties of Unicode may requires some
> knowledge of Unicode standard/book and so scripts and language
> difference, but confusable should be usable by people without deep
> knowledge of e.g. Cyrillic or Indic scripts.

Back in the day I witnessed so many schoolmates being phished by r n y
space login . com. So much that I'm almost convinced it's what led to
r n → m being added to the confusables list in the first place.

From cate at cateee.net Wed Nov 29 03:01:52 2023
From: cate at cateee.net (Giacomo Catenazzi)
Date: Wed, 29 Nov 2023 10:01:52 +0100
Subject: Odp: Why is (vv) -> (w) not amongst the Confusables?
In-Reply-To:
References: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au> <7b55639edf284be7bdce56c3d8c37395@grupawp.pl> <4436e859-f205-4cac-91dc-1962a1f4f088@cateee.net>
Message-ID: <4ca96010-2e5f-4806-b0e4-ea8787b6afa0@cateee.net>

On 29 Nov 2023 09:30, Rebecca Bettencourt via Unicode wrote:
> On Tue, Nov 28, 2023 at 11:44 PM Giacomo Catenazzi via Unicode wrote:
>
>> Confusable is a very important property for security (phishing,
>> scamming, etc.), and it is important also because it is used by non
>> "Unicode" people. Other properties of Unicode may requires some
>> knowledge of Unicode standard/book and so scripts and language
>> difference, but confusable should be usable by people without deep
>> knowledge of e.g. Cyrillic or Indic scripts.
>
> Back in the day I witnessed so many schoolmates being phished by r n y
> space login . com. So much that I'm almost convinced it's what led to
> r n → m being added to the confusables list in the first place.

Right. I needed to look for it. Comic Sans doesn't have this problem
(one positive point for it), but many other Microsoft fonts of that
time do, especially if we precede it with the slashes: "//rny". But
looking at the Google Fonts site, it seems it is no longer a problem.

Do the font designers have some list of such tricky sequences?

cate

From duerst at it.aoyama.ac.jp Wed Nov 29 22:01:56 2023
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Thu, 30 Nov 2023 13:01:56 +0900
Subject: Odp: Why is (vv) -> (w) not amongst the Confusables?
In-Reply-To:
References: <4b9545a8c87b4171ac09c038e60e0aac@defence.gov.au> <7b55639edf284be7bdce56c3d8c37395@grupawp.pl> <4436e859-f205-4cac-91dc-1962a1f4f088@cateee.net>
Message-ID: <3d498ff0-4875-4ad7-8ba5-b1cc1f759dc3@it.aoyama.ac.jp>

On 2023-11-29 17:30, Rebecca Bettencourt via Unicode wrote:

> Back in the day I witnessed so many schoolmates being phished by r n y
> space login . com. So much that I'm almost convinced it's what led to
> r n → m being added to the confusables list in the first place.

If not that, then the randornhouse.com incident (for details, see
https://boingboing.net/2022/01/07/the-strange-case-of-the-tom-ripley-of-book-publishing.html).

And while 'good' fonts help, not all readers' eyes are always as sharp
as we would like them to be (this alludes both to physical (i.e.
optical) vision and psychological aspects (seeing what you expect to
see, ...)).

Regards, Martin.
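
The kind of check discussed in this thread can be sketched roughly as
follows. This approximates the "skeleton" comparison of UTS #39
(normalize to NFD, replace each character by its prototype, normalize
again) and is not a full implementation. The table `PROTOTYPES` is a
tiny illustrative excerpt; a real application would load the complete
mapping from the confusables data file, and the names `skeleton` and
`looks_confusable` are hypothetical. The extra w entry at the end is
purely the original poster's "what if", not a statement about the
current data.

```python
import unicodedata

# Illustrative excerpt only, keyed by single characters. The full mapping
# lives in the confusables data file that accompanies UTS #39.
PROTOTYPES = {
    "m": "rn",       # the pairing discussed in this thread
    "\u0430": "a",   # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
}


def skeleton(text, table=PROTOTYPES):
    """Approximate skeleton: NFD, map each character, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def looks_confusable(a, b, table=PROTOTYPES):
    """True if two different strings share the same skeleton."""
    return a != b and skeleton(a, table) == skeleton(b, table)


print(looks_confusable("randomhouse.com", "randornhouse.com"))  # True
print(looks_confusable("www", "vvvvvv"))                        # False

# Hypothetical: what the check would report if w were mapped to vv.
with_w = dict(PROTOTYPES, w="vv")
print(looks_confusable("www", "vvvvvv", with_w))                # True
```

Because every string is reduced to the same prototype form before
comparison, the table only needs one direction per pair: mapping m to
rn is enough to catch rn posing as m.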