From unicode at unicode.org Thu Nov 2 11:11:07 2017 From: unicode at unicode.org (Rostislav via Unicode) Date: Thu, 02 Nov 2017 19:11:07 +0300 Subject: A criteria for Emoji property assignment? In-Reply-To: References: Message-ID: <24621509639067@web11g.yandex.ru>

I wonder what reason lies behind the Unicode Consortium's declaring some decorative characters emojis while leaving others as regular characters. For example:

1. Four arrows (←↑→↓, 2190-2193) are not emojis, while the four diagonal arrows in the same Unicode block (↖↗↘↙, 2196-2199) are emojis.
2. 23F9 (⏹) and 23FA (⏺) are emojis, but the next two characters, 23FB (⏻) and 23FC (⏼), aren't.
3. In the Geometric Shapes block, only two characters (25AA ▪ and 25AB ▫) are considered emojis, while the other 94 aren't. Why did just these two little squares deserve the honor of bearing the Emoji property, in contrast to all the other geometric shapes?
4. In the Miscellaneous Symbols block, there is a suspicion that the characters were made emojis at random. Two snowmen (2603 ☃ and 26C4 ⛄) are emojis, but the third one (26C7 ⛇) is not; the up-pointing finger (261D ☝) is an emoji, the down-pointing one (261F ☟) is not; a cloud without rain (2601 ☁) and a cloud with rain (26C8 ⛈) are emojis, but rain without a cloud (26C6 ⛆) isn't. Of the characters that originated from a single source (namely ARIB, L2/07-391), some became emojis and some did not, without any apparent logic.
5. Stranger still, on the first page of Miscellaneous Symbols and Pictographs (1F300-1F3FF) almost all characters are emojis, except for 10 that are inexplicably gnawed out (e.g. 1F395 🎕 and 1F3F2 🏲). The situation is similar in the Supplemental Symbols and Pictographs block, where a rifle (1F946 🥆) is excluded from emojis, though almost all other characters have the Emoji property.
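The examples in the list above can be checked against the published data. The Emoji property is distributed in the emoji data file (emoji-data.txt) that accompanies UTS #51. Its lines have the form "code point or range ; property # comment". Below is a minimal Python sketch that assumes that line format. The two sample lines are illustrative only: they were written to match the claims above and are not copied from the real file.

```python
def parse_emoji_data(lines, prop="Emoji"):
    """Return the set of code points listed with `prop` in
    emoji-data.txt-style lines ("2196..2199 ; Emoji # comment")."""
    code_points = set()
    for raw in lines:
        data = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not data:
            continue                          # skip blank and comment-only lines
        field, name = (part.strip() for part in data.split(";", 1))
        if name != prop:
            continue                          # e.g. skip Emoji_Presentation lines
        if ".." in field:                     # a range such as 2196..2199
            first, last = (int(x, 16) for x in field.split(".."))
        else:                                 # a single code point
            first = last = int(field, 16)
        code_points.update(range(first, last + 1))
    return code_points


# Illustrative sample only (not the real file), matching the claims above.
SAMPLE = """\
# code point(s) ; property # comment
2196..2199    ; Emoji    # diagonal arrows
25AA..25AB    ; Emoji    # small squares
"""

emoji = parse_emoji_data(SAMPLE.splitlines())
for cp in (0x2190, 0x2196, 0x25AA, 0x25AC):
    print(f"U+{cp:04X}", "Emoji" if cp in emoji else "not Emoji")
```

To answer the question for real code points, feed the function the lines of the actual file instead of SAMPLE.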
On the whole, almost every Unicode emoji raises the question of why some or many other similar characters aren't emojis like it, and lots of non-emojis raise the question of why they aren't. The assignment of the Emoji property to characters seems to be inconsistent, arbitrary and unexplainable. Or is there a unified explanation of the criteria for Emoji property assignment? --

From unicode at unicode.org Thu Nov 2 11:39:38 2017 From: unicode at unicode.org (Rick McGowan via Unicode) Date: Thu, 02 Nov 2017 09:39:38 -0700 Subject: Emoji candidate chart update Message-ID: <59FB4A4A.9020406@unicode.org> Hi Everyone, Just FYI... The new Unicode emoji candidate charts, with updates from the UTC #153 meeting, are now posted at: http://www.unicode.org/emoji/future/emoji-candidates.html R

From unicode at unicode.org Thu Nov 2 11:52:01 2017 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Thu, 2 Nov 2017 09:52:01 -0700 Subject: A criteria for Emoji property assignment? In-Reply-To: <24621509639067@web11g.yandex.ru> References: <24621509639067@web11g.yandex.ru> Message-ID: <21c5bccf-680b-92e1-0970-aa8b1886571c@ix.netcom.com> An HTML attachment was scrubbed...

From unicode at unicode.org Fri Nov 3 04:13:50 2017 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Fri, 3 Nov 2017 09:13:50 +0000 Subject: ASCII v Unicode Message-ID: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> You may find https://twitter.com/andreschappo/status/926163719331176450 amusing. André Schappo

From unicode at unicode.org Fri Nov 3 04:36:43 2017 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Fri, 3 Nov 2017 02:36:43 -0700 Subject: ASCII v Unicode In-Reply-To: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> Message-ID: <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> An HTML attachment was scrubbed...
From unicode at unicode.org Fri Nov 3 06:29:31 2017 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Fri, 3 Nov 2017 11:29:31 +0000 Subject: ASCII v Unicode In-Reply-To: <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> Message-ID:

On 3 Nov 2017, at 09:36, Asmus Freytag via Unicode wrote:
> On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote:
>> You may find https://twitter.com/andreschappo/status/926163719331176450 amusing.
>> André Schappo
>
> You're wildly off in your page count.
>
> The "book" part of Unicode (Core Specification) alone is 1,500 pages. I haven't looked at the single file code charts in a while, but I believe you get at least that number again. Then add the dozen or so "Annexes" for a few hundred additional pages and be happy that nobody prints the Unicode Character Database (or the Unihan Database for that matter).
>
> A./

Yes, I agree, my page count is much lower than it should be for Unicode, if I was being literal. I was being figurative rather than literal. I was just making a point to the ASCII developers/programmers and ASCII Academics.

Prior to tweeting I did consider other numbers. My considerations included 1000, 5000 and 10000. But in my mind "Unicode is a 500 page book" seemed to flow better. I don't know why.

Actually, it's probably for the best that I wrote "500 page", because otherwise ASCII developers/programmers and ASCII Academics would not even start reading the Unicode book if they thought it was (say) 5000 pages long.

Let's now look at it literally. Here is a template: "Unicode is a X page book". My guess would be "Unicode is a 10000+ page book". Anyone care to estimate X?

André Schappo
From unicode at unicode.org Fri Nov 3 07:50:10 2017 From: unicode at unicode.org (Phake Nick via Unicode) Date: Fri, 3 Nov 2017 20:50:10 +0800 Subject: ASCII v Unicode In-Reply-To: References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> Message-ID: The entire Unicode can also be printed onto a single page if you use a very huge paper coupled with a smaller font size! I think a football-field-sized paper could possibly do the job.

From unicode at unicode.org Fri Nov 3 08:44:46 2017 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Fri, 3 Nov 2017 13:44:46 +0000 Subject: ASCII v Unicode In-Reply-To: References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> Message-ID: <165C5859-D374-45F6-B882-E887515885FD@lboro.ac.uk> hmmmm.... I think the only way we can resolve this "X page Unicode book" issue is to recruit an infinite number of monkeys.

André Schappo https://schappo.blogspot.co.uk https://twitter.com/andreschappo https://weibo.com/andreschappo https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization

From unicode at unicode.org Fri Nov 3 11:23:04 2017 From: unicode at unicode.org (Asmus Freytag (c) via Unicode) Date: Fri, 3 Nov 2017 09:23:04 -0700 Subject: ASCII v Unicode In-Reply-To: <31535187.44475.1509725560242.JavaMail.defaultUser@defaultHost> References: <10738465.40177.1509722360750.JavaMail.root@webmail11.bt.ext.cpcloud.co.uk> <31535187.44475.1509725560242.JavaMail.defaultUser@defaultHost> Message-ID: <2fac7b41-4fa0-2f3a-c403-8b83586c87bf@ix.netcom.com>

On 11/3/2017 9:12 AM, William_J_G Overington wrote:
> GS1-128 barcode technology is being introduced into National Health Service hospitals in the United Kingdom.

This is so off-topic and unrelated to the discussion.

A./

> http://www.scan4safety.nhs.uk/
>
> As barcode scanners will be in use, a not unrealistic scenario is that localizable sentences encoded in GS1-128 barcodes could be used for some everyday communication through the language barrier.
>
> For example, a whole sentence, such as, here localized into English,
>
> Would you like a drink of water?
>
> could be encoded as
>
> ::781:;
>
> within Application Identifier 97 of a GS1-128 barcode.
>
> Suppose that this system were being implemented.
>
> For localization into English, the sentence.dat text file could contain the following line of text for localizing that particular localizable sentence.
>
> ::781:;|Would you like a drink of water?
>
> If the sentence.dat file and the software to handle it were implemented in 7-bit ASCII the system would work fine for localization into English.
>
> If many sentence.dat files, one for each language, and the software to handle them were implemented in 8-bit ASCII the system would work fine for localization into English and for localization into many of the languages of Western Europe and Scandinavia.
>
> If many sentence.dat files, one for each language, and the software to handle them were implemented in Unicode using the UTF-16 text file format for each sentence.dat file, the system would work fine for localization into many languages of the world.
>
> This seems to me to be a very good example of why Unicode is so much better than ASCII.
>
> William Overington
>
> Friday 3 November 2017

From unicode at unicode.org Sat Nov 4 07:04:03 2017 From: unicode at unicode.org (Andre Schappo via Unicode) Date: Sat, 4 Nov 2017 12:04:03 +0000 Subject: ASCII v Unicode In-Reply-To: References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> <59FC8B49.8040004@unicode.org> Message-ID: We now have a literal number for ASCII, which is 31 pages: https://twitter.com/srl295/status/926530928171671552

André Schappo

On 3 Nov 2017, at 15:45, Asmus Freytag (c) wrote:
> On 11/3/2017 8:29 AM, Rick McGowan wrote:
>> The 10.0 chart PDF is 2570 pages.
>> On 11/3/2017 2:36 AM, Asmus Freytag via Unicode wrote:
>>> single file code charts in a while, but I believe you get at least that number again.
>
> PS: @Andre: update to my last message: 1,500 Core, 2570+ Charts, and, say, 430 for the UAXs would make 4,500 pages. Off by a factor of 9 from your initial value, but not quite "zillions". :)

André Schappo https://schappo.blogspot.co.uk https://twitter.com/andreschappo https://weibo.com/andreschappo https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization

From unicode at unicode.org Sun Nov 5 23:55:53 2017 From: unicode at unicode.org (Mark Davis ☕️ via Unicode) Date: Sun, 5 Nov 2017 21:55:53 -0800 Subject: ASCII v Unicode In-Reply-To: <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> Message-ID:

I had some time on the plane this weekend, and generated some more comprehensive figures that take the following into account:

1. There are two senses of "Unicode". In the narrow sense, it is only the Unicode Standard (ie, Unicode Characters). But it has grown to have a more comprehensive sense, including the other two main projects of the Unicode Consortium: Unicode CLDR and ICU.
2. The ca. 3,300 pages that Asmus cited include specification *text* alone, but *data/code* (eg, UCD property data, or source code for ICU) is a vital part of the projects.

I thus generated a rough comparison where I (a) included CLDR and ICU, and (b) included data. That gave the following results (where "encoding" includes both the Unicode Standard *and* UTS's that are aligned with it in version, including emoji, since that is to be aligned with it).

[image: Inline image 1]

*Caveats*
- *This is a rough approximation (my flight wasn't all that long...).* In particular, don't count on the 3 decimals of precision; that is just the spreadsheet charting.
- For the data files and code files, I filtered by removing # comments, collapsing sequences of whitespace into a single space character, trimming whitespace, and tossing empty lines. I then counted a page as a total of 3K code points.
So the page count for data and code is far smaller than simply a line count. (Didn't bother dropping // and /*...*/ comments in code.) I also excluded .txt files that had the word "test" (case-insensitive) in their names.
- For HTML pages I took a few samples of PDFs for UTS's and ICU docs, and got a count of HTML code points per page for each generated type of page, then divided out to get an approximate page count.
- There were some other filters: for example, for ICU sources I included only files of type {"cpp", "c", "h", "ucm", "java"}, since files of type "txt" were likely generated from CLDR data. For CLDR I excluded charts and Survey Tool pages, since those would have bulked up the CLDR pie-slice dramatically.
- (And by the way, the pie-slice for emoji is not visible in this graph: just 0.1%.)

Mark
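The filtering Mark describes for data and code files can be sketched in a few lines of Python. This is a reconstruction from his description, not his actual script. The function names are hypothetical, and counting only the characters of kept lines (not the line breaks between them) is an assumption.

```python
import re
from pathlib import PurePosixPath

CODE_POINTS_PER_PAGE = 3000          # "a page as a total of 3K code points"
ICU_EXTENSIONS = {"cpp", "c", "h", "ucm", "java"}


def include_file(path, project):
    """Hypothetical file filter following the caveats above."""
    p = PurePosixPath(path)
    ext = p.suffix.lower().lstrip(".")
    if ext == "txt" and "test" in p.name.lower():
        return False                  # skip .txt files with "test" in the name
    if project == "ICU":
        return ext in ICU_EXTENSIONS  # ICU .txt files are likely generated from CLDR
    return True


def normalize(text):
    """Drop '#' comments, collapse whitespace runs, trim, and toss empty lines."""
    kept = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]              # remove '#' comments
        line = re.sub(r"\s+", " ", line).strip()  # collapse and trim whitespace
        if line:
            kept.append(line)
    return kept


def estimate_pages(text):
    """Approximate page count; len() of a Python str counts code points."""
    return sum(len(line) for line in normalize(text)) / CODE_POINTS_PER_PAGE
```

Like the filter described above, splitting on "#" also removes C preprocessor lines and any "#" inside strings. Meanwhile, // and /*...*/ comments are left in place.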
From unicode at unicode.org Tue Nov 7 01:33:07 2017 From: unicode at unicode.org (Sudhanwa Jogalekar via Unicode) Date: Tue, 7 Nov 2017 13:03:07 +0530 Subject: ASCII v Unicode In-Reply-To: References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> Message-ID: Let's create another Annexe for standardising "Printing of the Unicode Standard": usage of fonts, paper size, etc. etc.... ;-) LOL!!

-- ~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~!~! web: www.sudhanwa.com blog: www.sudhanwa.in Twitter: sudhanwa Check on FB, Linkedin for more.

From unicode at unicode.org Thu Nov 9 02:47:28 2017 From: unicode at unicode.org (Elias Mårtenson via Unicode) Date: Thu, 9 Nov 2017 16:47:28 +0800 Subject: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized In-Reply-To: <20170703.184946.1082299263384367210.wl@gnu.org> References: <6063F267-BF9A-42E3-B679-E48BACE47541@alastairs-place.net> <20170703.184946.1082299263384367210.wl@gnu.org> Message-ID:

On 4 July 2017 at 00:49, Werner LEMBERG via Unicode wrote:
>> No, the hyphenation oddity involving the addition of letters with hyphenation (or, to be more precise, to suppress letters in unhyphenated words) never affected the letter s.
>
> I'm not sure that this is really true. As far as I know, `sss' in Swiss German was handled similar to other triplet consonants before the 1996 spelling reform. In other words, you would have written
>
> Abschlussatz (`closing sentence')
>
> instead of
>
> Abschlusssatz,
>
> and which would have been hyphenated as
>
> Abschluss-satz

This is still the case for Swedish though. I studied German before 1996, and I was under the impression that the rules in this case were identical for Swedish and German. What do the rules say now?

Regards,
Elias
From unicode at unicode.org Thu Nov 9 04:12:25 2017 From: unicode at unicode.org (Walter Tross via Unicode) Date: Thu, 9 Nov 2017 11:12:25 +0100 Subject: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized In-Reply-To: References: <6063F267-BF9A-42E3-B679-E48BACE47541@alastairs-place.net> <20170703.184946.1082299263384367210.wl@gnu.org> Message-ID: Long story short: it's Abschlusssatz now (and Rollladen, etc.). One of the criteria of the reform was to normalise hyphenation. This has gone so far as to hyphenate Bä-cker, with the additional criterion of keeping the c inside its group.
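The two spelling changes under discussion can be sketched as plain string operations. This is a simplified Python illustration, not a hyphenation algorithm: it assumes the compound parts and the position of "ck" are already known. One condition comes from general knowledge of the pre-1996 German rules rather than from this thread: the old reduction of three identical consonants to two applied only when a vowel followed. The pre-reform split "Bäk-ker" is the one Asmus gives later in the thread.

```python
VOWELS = set("aeiouäöüy")


def join_compound(first, second, reform=True):
    """Join two compound parts.

    Pre-1996 (reform=False): three identical consonants at the seam were
    written as two when a vowel followed (Schiff + fahrt -> Schiffahrt).
    Post-1996 (reform=True): all three are kept (Schifffahrt, Rollladen).
    """
    c = second[:1].lower()
    triple = (bool(c) and c not in VOWELS and len(first) >= 2
              and first[-1].lower() == first[-2].lower() == c)
    if not reform and triple and second[1:2].lower() in VOWELS:
        return first + second[1:]     # drop one of the three consonants
    return first + second


def split_at_ck(word, reform=True):
    """Hyphenate at the first 'ck' (simplification: assumes the break belongs there).

    Pre-1996: 'ck' became 'k-k' at a line break (Bäk-ker).
    Post-1996: 'ck' moves to the next line unchanged (Bä-cker).
    """
    i = word.find("ck")
    if i < 0:
        raise ValueError("no 'ck' in word")
    if reform:
        return word[:i] + "-", word[i:]
    return word[:i] + "k-", word[i + 1:]
```

Under the old rules, the dropped consonant reappeared at a line break (Schiff-fahrt, or Abschluss-satz in Werner's Swiss example). In that case the two parts can simply be written out in full with a hyphen between them.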
From unicode at unicode.org Thu Nov 9 20:40:19 2017 From: unicode at unicode.org (Elias Mårtenson via Unicode) Date: Fri, 10 Nov 2017 10:40:19 +0800 Subject: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized In-Reply-To: References: <6063F267-BF9A-42E3-B679-E48BACE47541@alastairs-place.net> <20170703.184946.1082299263384367210.wl@gnu.org> Message-ID: Wow. That looks incredibly strange to me. Thanks for informing me of this change; I would probably have thought it to be a typo if I saw it written. As for Bäcker, I presume the previous hyphenation was Bäck-er? (At least that's how it would be written in Swedish.) Is this still allowed? I.e. are the hyphenation points Bä-ck-er?

Regards,
Elias

From unicode at unicode.org Thu Nov 9 21:11:17 2017 From: unicode at unicode.org (Asmus Freytag via Unicode) Date: Thu, 9 Nov 2017 19:11:17 -0800 Subject: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized In-Reply-To: References: <6063F267-BF9A-42E3-B679-E48BACE47541@alastairs-place.net> <20170703.184946.1082299263384367210.wl@gnu.org> Message-ID: An HTML attachment was scrubbed...

From unicode at unicode.org Thu Nov 9 21:25:47 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 10 Nov 2017 04:25:47 +0100 Subject: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized In-Reply-To: References: <6063F267-BF9A-42E3-B679-E48BACE47541@alastairs-place.net> <20170703.184946.1082299263384367210.wl@gnu.org> Message-ID: The strange thing about the "triple s" is that it occurs when hyphenated as "sss", but if hyphenation does not occur, the "triple s" becomes only two (as if "ss" were contextually creating a ligature as a single "s"). We have no way to create custom hyphenation sequences such as: "ss-
s" which is what was really intended (with no hyphen the word is compacted using only two "s"). Also I presume that to force the grouping of "ck" and prevent the soft hyphen from breaking it, a SHY could be used just after it, as in "Bäcker", but I think what was meant was really this: "Bäck-
ker" where the k is repeated AFTER the linebreak while keeping the "ck" group before. It is possible to do that with some markup language, but not in Unicode plain text without requesting the addition of two new controls! And things could be even worse: here we specify what happens when a linebreak occurs and specify nothing if it does not (the whole inner sequence is deleted). So if the "hyphenated triple s" is compacted to a single sharp s when there's no linebreak, we would need something like this: "ßss-
s
" And for that we would need at least 3 controls in plain text if we don't want markup!!!

From unicode at unicode.org Thu Nov 9 21:27:59 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Fri, 10 Nov 2017 04:27:59 +0100 Subject: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized In-Reply-To: References: <6063F267-BF9A-42E3-B679-E48BACE47541@alastairs-place.net> <20170703.184946.1082299263384367210.wl@gnu.org> Message-ID: So this is effectively (custom HTML-like markup) "Bäck-
ker"

2017-11-10 4:11 GMT+01:00 Asmus Freytag via Unicode:
> On 11/9/2017 6:40 PM, Elias Mårtenson via Unicode wrote:
>> As for Bäcker, I presume the previous hyphenation was Bäck-er?
>
> no, Bäk-ker ...
>
>> (at least that's how it would be written in Swedish). Is this still allowed? I.e. are the hyphenation points Bä-ck-er?

From unicode at unicode.org Fri Nov 10 07:44:10 2017 From: unicode at unicode.org (Walter Tross via Unicode) Date: Fri, 10 Nov 2017 14:44:10 +0100 Subject: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized In-Reply-To: References: <6063F267-BF9A-42E3-B679-E48BACE47541@alastairs-place.net> <20170703.184946.1082299263384367210.wl@gnu.org> Message-ID: Correct. Just a note: the current hyphenation is Bä-cker (as I wrote in a previous email) (https://www.duden.de/rechtschreibung/Baecker)

From unicode at unicode.org Sun Nov 12 16:19:52 2017 From: unicode at unicode.org (Richard Wordingham via Unicode) Date: Sun, 12 Nov 2017 22:19:52 +0000 Subject: ASCII v Unicode In-Reply-To: <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> References: <3E1A532E-9E80-46F4-92FB-15B838BC1D84@lboro.ac.uk> <62ebc768-7cfa-3bd9-53bc-601a52d76f60@ix.netcom.com> Message-ID: <20171112221952.21dcdc99@JRWUBU2>

On Fri, 3 Nov 2017 02:36:43 -0700 Asmus Freytag via Unicode wrote:
> On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote:
>> You may find https://twitter.com/andreschappo/status/926163719331176450 amusing.
>> André Schappo
>
> You're wildly off in your page count.
>
> The "book" part of Unicode (Core Specification) alone is 1,500 pages. I haven't looked at the single file code charts in a while, but I believe you get at least that number again. Then add the dozen or so "Annexes" for a few hundred additional pages and be happy that nobody prints the Unicode Character Database (or the Unihan Database for that matter).
A reasonable comparison would be ASCII v. ISO 10646 v. Unicode. For example, casing and text boundaries are not normally considered part of the scope of ASCII.

Richard.

From unicode at unicode.org Mon Nov 13 12:20:18 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Mon, 13 Nov 2017 18:20:18 +0000 Subject: Plane-2-only string Message-ID: I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?

We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.

Background: The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the "ExtB" fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so it can't go into a single font.

Peter

From unicode at unicode.org Mon Nov 13 13:38:45 2017 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 13 Nov 2017 11:38:45 -0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: A font's sample text can be used in place of the default "The quick brown fox..." text which is used to illustrate the typeface in applications which support that feature.

One approach would be to find a non-gibberish text string using some Plane 2 characters and add the BMP glyphs to the font mapped to the BMP PUA. Because if only a handful of BMP CJK glyphs were added to the font mapped to their standard code points, the font might need to claim to support BMP CJK (when in fact it does not) in order to display the sample text. Or, (if standard code points are used) the font might be auto-detected as supporting BMP CJK by some applications, when it doesn't really support that range.

On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode
wrote:
> I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so can't go into a single font.
>
>
>
> Peter

From unicode at unicode.org Mon Nov 13 13:51:18 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 13 Nov 2017 20:51:18 +0100
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

May be this test page ?
http://www.i18nguy.com/unicode/supplementary-test.html

2017-11-13 20:38 GMT+01:00 James Kass via Unicode :

> A font's sample text can be used in place of the default "The quick
> brown fox..."
text which is used to illustrate the typeface in
> applications which support that feature.
>
> One approach would be to find a non-gibberish text string using some
> Plane 2 characters and add the BMP glyphs to the font mapped to the
> BMP PUA. Because if only a handful of BMP CJK glyphs were added to
> the font mapped to their standard code points, the font might need to
> claim to support BMP CJK (when in fact it does not) in order to
> display the sample text. Or, (if standard code points are used) the
> font might be auto-detected as supporting BMP CJK by some
> applications, when it doesn't really support that range.
>
> On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode
> wrote:
> > I'm wondering if anyone could come up with a string of 15 to 40
> characters _using only plane 2 characters_ that wouldn't be gibberish?
> >
> > We are considering adding sample-text strings in some of our fonts. (In
> OpenType, the 'name' table can take sample-text strings using name ID 19.)
> One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts,
> which have CJK characters from plane 2 only.
> >
> > Background:
> > The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the
> Simsun and MingLiU fonts: the combined glyph count exceeds the number of
> glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts
> are used to contain all of the Plane 2 characters that are supported. For
> example, the Simsun font supports 28738 BMP characters, and no plane 2
> characters, while Simsun-ExtB supports the Basic Latin block from the BMP
> plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so
> can't go into a single font.
> >
> >
> >
> > Peter
>
>

From unicode at unicode.org Mon Nov 13 14:05:24 2017
From: unicode at unicode.org (Charlie Ruland via Unicode)
Date: Mon, 13 Nov 2017 21:05:24 +0100
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

Many of characters in the CJK Compatibility Ideographs Supplement block are quite common Chinese characters, or variants thereof. You could try and build Chinese sentences with these characters.

On Mon, 13 Nov 2017 at 20:20 GMT+01:00 Peter Constable via Unicode wrote:
> I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so can't go into a single font.
>
>
>
> Peter

From unicode at unicode.org Mon Nov 13 14:25:24 2017
From: unicode at unicode.org (Peter Constable via Unicode)
Date: Mon, 13 Nov 2017 20:25:24 +0000
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

We don't want to add BMP characters to the ExtB fonts.

Peter

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of James Kass via Unicode
Sent: Monday, November 13, 2017 11:39 AM
To: Unicode list
Subject: Re: Plane-2-only string

A font's sample text can be used in place of the default "The quick brown fox..." text which is used to illustrate the typeface in applications which support that feature.

One approach would be to find a non-gibberish text string using some Plane 2 characters and add the BMP glyphs to the font mapped to the BMP PUA. Because if only a handful of BMP CJK glyphs were added to the font mapped to their standard code points, the font might need to claim to support BMP CJK (when in fact it does not) in order to display the sample text. Or, (if standard code points are used) the font might be auto-detected as supporting BMP CJK by some applications, when it doesn't really support that range.

On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode wrote:
> I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so can't go into a single font.
>
>
>
> Peter

From unicode at unicode.org Mon Nov 13 14:29:01 2017
From: unicode at unicode.org (Peter Constable via Unicode)
Date: Mon, 13 Nov 2017 20:29:01 +0000
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

Thanks. I'd need to know _at least something_ about what the characters signify, though, to have a sense of whether there's anything potentially offensive.

Peter

From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Philippe Verdy via Unicode
Sent: Monday, November 13, 2017 11:51 AM
To: James Kass
Cc: Unicode list
Subject: Re: Plane-2-only string

May be this test page ?

http://www.i18nguy.com/unicode/supplementary-test.html

2017-11-13 20:38 GMT+01:00 James Kass via Unicode >:
A font's sample text can be used in place of the default "The quick brown fox..." text which is used to illustrate the typeface in applications which support that feature.

One approach would be to find a non-gibberish text string using some Plane 2 characters and add the BMP glyphs to the font mapped to the BMP PUA. Because if only a handful of BMP CJK glyphs were added to the font mapped to their standard code points, the font might need to claim to support BMP CJK (when in fact it does not) in order to display the sample text. Or, (if standard code points are used) the font might be auto-detected as supporting BMP CJK by some applications, when it doesn't really support that range.

On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode > wrote:
> I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so can't go into a single font.
>
>
>
> Peter

From unicode at unicode.org Mon Nov 13 14:31:30 2017
From: unicode at unicode.org (Peter Constable via Unicode)
Date: Mon, 13 Nov 2017 20:31:30 +0000
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

Thanks for the suggestion. Alas, the fonts don't support that block.

Peter

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Charlie Ruland via Unicode
Sent: Monday, November 13, 2017 12:05 PM
To: unicode at unicode.org
Subject: Re: Plane-2-only string

Many of characters in the CJK Compatibility Ideographs Supplement block are quite common Chinese characters, or variants thereof. You could try and build Chinese sentences with these characters.

On Mon, 13 Nov 2017 at 20:20 GMT+01:00 Peter Constable via Unicode wrote:
> I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so can't go into a single font.
>
>
>
> Peter

From unicode at unicode.org Mon Nov 13 14:45:09 2017
From: unicode at unicode.org (James Kass via Unicode)
Date: Mon, 13 Nov 2017 12:45:09 -0800
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

Peter Constable wrote,

On Mon, Nov 13, 2017 at 12:25 PM, Peter Constable wrote:
> We don't want to add BMP characters to the ExtB fonts.
>
>
> Peter
>
> -----Original Message-----
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of James Kass via Unicode
> Sent: Monday, November 13, 2017 11:39 AM
> To: Unicode list
> Subject: Re: Plane-2-only string
>
> A font's sample text can be used in place of the default "The quick brown fox..." text which is used to illustrate the typeface in applications which support that feature.
>
> One approach would be to find a non-gibberish text string using some Plane 2 characters and add the BMP glyphs to the font mapped to the BMP PUA. Because if only a handful of BMP CJK glyphs were added to the font mapped to their standard code points, the font might need to claim to support BMP CJK (when in fact it does not) in order to display the sample text. Or, (if standard code points are used) the font might be auto-detected as supporting BMP CJK by some applications, when it doesn't really support that range.
>
> On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode wrote:
>> I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?
>>
>> We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>>
>> Background:
>> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so can't go into a single font.
>>
>>
>>
>> Peter
>

From unicode at unicode.org Mon Nov 13 14:46:18 2017
From: unicode at unicode.org (James Kass via Unicode)
Date: Mon, 13 Nov 2017 12:46:18 -0800
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

Peter Constable wrote,

> We don't want to add BMP characters to the ExtB fonts.

How about Plane 15 or 16, then?

From unicode at unicode.org Mon Nov 13 14:46:18 2017
From: unicode at unicode.org (John H. Jenkins via Unicode)
Date: Mon, 13 Nov 2017 13:46:18 -0700
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID: <55072E26-7741-4F95-9B98-BB5809C80D9A@apple.com>

?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

That is an example of forty Cantonese-specific characters which are not obscene (that I'm aware of) from Extension B.
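A candidate list of Cantonese-specific Plane 2 characters like this one can be filtered out of Unihan. The sketch below is not necessarily how the list in this thread was produced. It assumes Unihan's tab-separated "U+XXXX<TAB>field<TAB>value" layout and uses the kCantonese and kDefinition fields, which as far as I know both live in Unihan_Readings.txt. It treats a "(Cant.)" marker in kDefinition as the selection criterion. SAMPLE is a small illustrative excerpt, not real file contents: three of its entries mirror this thread's list, and the U+4E00 lines are there only to show a BMP entry being filtered out.

```python
from collections import defaultdict

# Illustrative excerpt in Unihan's tab-separated layout; not the full data file.
SAMPLE = """\
# Lines starting with '#' are comments.
U+4E00\tkCantonese\tjat1
U+4E00\tkDefinition\tone; a, an; alone
U+201A9\tkCantonese\tfaan2
U+201A9\tkDefinition\t(Cant.) to play
U+20BE6\tkCantonese\tcai3
U+20BE6\tkDefinition\t(Cant.) to eat, take a meal
U+2F907\tkCantonese\tbaan6
U+2F907\tkDefinition\t(Cant.) mud, mire
"""


def parse_unihan(lines):
    """Group field/value pairs by code point (as an int)."""
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t", 2)
        records[int(code[2:], 16)][field] = value
    return records


def plane2_cantonese(records):
    """(code point, kCantonese, kDefinition) for Plane 2 entries marked '(Cant.)'."""
    hits = []
    for cp in sorted(records):
        fields = records[cp]
        definition = fields.get("kDefinition", "")
        if 0x20000 <= cp <= 0x2FFFF and "(Cant.)" in definition:
            hits.append((cp, fields.get("kCantonese", ""), definition))
    return hits


for cp, reading, definition in plane2_cantonese(parse_unihan(SAMPLE.splitlines())):
    print(f"U+{cp:04X} {reading} {definition}")
```

Matching "(Cant.)" anywhere in the definition, not just at the start, also catches mixed entries such as "bei6 or; emphatic particle; (Cant.) particle implying doubt".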

For the curious, I've appended at the bottom the full list of 280 for all of Plane 2 which I was able to pull out of the Unihan database. I'm sure some enterprising poet can make something out of them.

> On Nov 13, 2017, at 11:20 AM, Peter Constable via Unicode wrote:
>
> I'm wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn't be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28738 BMP characters, and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so can't go into a single font.
>
>
>
> Peter
>

U+201A9 faan2 (Cant.) to play
U+20325 wu1 wu3 (Cant.) to bow, stoop
U+20341 man3 (Cant.) an undesirable situation
U+204FC sip3 (Cant.) a wedge; to thrust in
U+20544 nap1 (Cant.) ???, a dimple
U+2076D peng2 (Cant.) to fell, cut; to sweep away
U+20779 gaai3 (Cant.) to cut with a knife or scissors
U+20BA8 naai3 (Cant.) to tie, tow; bring along
U+20BA9 aa1 liu1 (Cant.) an interjection; rare, specialized
U+20BCB jai4 jai5 (Cant.) naughty, inferior
U+20BE6 cai3 (Cant.) to eat, take a meal
U+20BFD zi1 (Cant.) a final particle indicating affirmation
U+20C0B jaau1 (Cant.) left-handed
U+20C32 eot1 (Cant.) to belch
U+20C41 tam3 (Cant.) to fool, trick, cheat
U+20C42 dat1 (Cant.) to put something or sit wherever one wishes; to rebuke, reproach
U+20C43 nip1 (Cant.) thin, flat; poor
U+20C53 ngai1 (Cant.) to importune, beg
U+20C58 ngaak6 (Cant.) contrary, opposing, against; disobedient
U+20C65 fik1 jit6 we5 (Cant.) wrangling, a noise; fitful; a soft fabric with no body
U+20C77 ming1 (Cant.) small
U+20C78 san2 seon2 (Cant.) phonetic
U+20C9C zaang1 (Cant.) to owe
U+20CCF ce2 ce6 (Cant.) interjection
U+20CD5 caau3 (Cant.) to search
U+20CD6 dap6 (Cant.) to strike, pound
U+20D15 miu2 (Cant.) to purse the lips; to wriggle
U+20D30 gau6 (Cant.) classifier for a piece or lump of something
U+20D47 keu4 (Cant.) peculiar, strange
U+20D48 mui2 (Cant.) to suck or chew without using the teeth
U+20D49 hong4 (Cant.) hope
U+20D69 go2 (Cant.) that
U+20D6F gwit1 gwit3 (Cant.) onomatopoetic
U+20D7C mang1 mang4 (Cant.) scars on the eyelid; phonetic
U+20D7E waak1 (Cant.) eloquent, sharp-tongued
U+20D7F pe1 pe5 (Cant.) a pair (from the Engl.); to stagger
U+20D9C zai3 (Cant.) to do, work; to be willing
U+20DA7 dim6 (Cant.) straight, vertical; OK; to pick up with the fingers; verbal aspect marker of successful completion
U+20DB2 gap6 kap6 (Cant.) to stare at; to take a big bite
U+20E09 kak1 (Cant.) to block, obstruct
U+20E0A tap1 (Cant.) an intensifying particle
U+20E0E naa1 (Cant.) and, with
U+20E0F ge2 (Cant.) final particle
U+20E10 kam1 (Cant.) to endure, last
U+20E11 soek3 (Cant.) soft, sodden
U+20E12 bou2 (Cant.) ????, a stranger
U+20E3A ngaak6 (Cant.) contrary, opposing
U+20E6D ko1 (Cant.) to call (Engl. loan-word)
U+20E73 git6 (Cant.) thick, viscous, dense
U+20E77 ngo4 (Cant.) to speak tirelessly
U+20E78 kam2 (Cant.) to cover, close up
U+20E7A maai4 (Cant.) verbal aspect marker for comletion or movement towards
U+20E7B zam6 (Cant.) classifier for smells
U+20E8C gwe1 (Cant.) timid
U+20E98 long1 long2 (Cant.) hard to get along with; to rinse, spread thin
U+20E9D gaak3 (Cant.) final particle
U+20EA2 gaa1 gaa2 (Cant.) final particle
U+20EAA he3 hi1 (Cant.) in a rush; slovenly
U+20EAB leu1 (Cant.) strange, peculiar
U+20EAC he2 (Cant.) final particle
U+20ED7 le4 (Cant.) imperative final particle
U+20ED8 zeot6 (Cant.) sound of eating (onomatopoetic)
U+20EF4 long2 (Cant.) to rinse
U+20EFA aa6 (Cant.) final particle
U+20EFB bai3 (Cant.) noise, clamor
U+20F15 paai2 (Cant.) a suffix indicating time
U+20F2D but1 (Cant.) sound of a car-horn (onomatopoetic)
U+20F2E ngai1 ngi1 (Cant.) to urge, importune; a lie, fib
U+20F31 loe1 loe2 (Cant.) to spit out; to pester, nag
U+20F4C syut3 (Cant.) sound of something rushing by
U+20F52 neng2 (Cant.) classifier for hats
U+20F64 kik1 (Cant.) to block, obstruct; head; phonetic
U+20F8D he3 (Cant.) to flick something off in a disorderly way
U+20F8F ce1 (Cant.) interjection
U+20FAD we5 (Cant.) soft fabric with no body
U+20FB4 baang4 baang6 (Cant.) phonetic
U+20FB5 zaa1 (Cant.) final particle
U+20FBC cyut1 cyut6 (Cant.) phonetic
U+20FEA gaa2 (Cant.) final particle
U+20FEB saau4 (Cant.) shabby
U+20FEC soe4 (Cant.) ignorant
U+20FED wet1 (Cant.) to go somewhere to have a good time
U+2101D nam6 (Cant.) sound asleep
U+2101E zip1 (Cant.) a Jeep; to wave, beckon
U+21020 bei6 or; emphatic particle; (Cant.) particle implying doubt
U+21029 lok1 (Cant.) onomatopoetic
U+2104F am1 ngam1 (Cant.) soft rice or food for a baby
U+2105C wo5 (Cant.) particle to close a quote
U+2106F dyut1 (Cant.) to pout
U+21075 gan2 (Cant.) aspect marker for continuous action
U+21076 zit1 (Cant.) to scratch an itch
U+21077 doeng1 (Cant.) a sharp point; to peck
U+21078 kwaat1 (Cant.) a circle, ring
U+2107B ziu1 (Cant.) to beat someone up
U+21088 buk6 (Cant.) to lie prone; to bend over
U+21096 lai2 (Cant.) unrestrained
U+2109D zuk6 (Cant.) to choke and cough
U+210C0 e4 nge4 (Cant.) a musical instrument
U+210C1 leng1 (Cant.) member of a triad; young
U+210C7 bai6 (Cant.) exclamation
U+210C8 kwaak1 kwaak3 (Cant.) a lasso; a circle, frame
U+210C9 gaa3 (Cant.) final particle
U+210CF doe6 (Cant.) to droop, hang down
U+210D3 bo3 (Cant.) final particle for emphasis
U+210E4 laai6 (Cant.) to leave behind, omit
U+210F4 ceoi4 (Cant.) smell, odor
U+210F5 ngung1 ngung2 (Cant.) to cover, bury; push from behind
U+210F6 sek3 (Cant.) to like, love; to kiss
U+2111F haa1 (Cant.) onomatopoetic, the sound of panting
U+2112F jik1 (Cant.) hiccough
U+21135 ji1 (Cant.) to grin, laugh
U+2113D soe4 (Cant.) to slide down
U+21148 laa3 (Cant.) a particle implying completion, certainty, or urgency
U+2114F lai2 (Cant.) to accuse, slander; to turn, sprain
U+21180 gwang2 (Cant.) special relationship
U+21187 wok1 (Cant.) a watt (Engl. loan-word)
U+211D9 doe4 (Cant.) round and full
U+21681 bai6 used-up, malpractices; (Cant.) bad, vile, corrupt
U+21731 gei2 to envy, to be angry with; (Cant.) pregnant
U+2197C me1 (Cant.) to carry on the back
U+21C2A duk1 (Cant.) end, bottom, rump
U+21CAC gwat6 (Cant.) blunt
U+220C7 lei5 (Cant.) a sail
U+22208 nap1 (Cant.) dimple
U+22605 maau4 (Cant.) flurried, flustered; arbitrariliy
U+22696 ti4 (Cant.) intensifier
U+226F4 mang2 (Cant.) annoyed, impatient, restless
U+226F5 zang2 (Cant.) annoyed, irritated
U+22775 fit1 (Cant.) ???, to be fashionable
U+227B5 fit1 (Cant.) to brush, whisk
U+22803 geng6 (Cant.) to guard against; to take precautions
U+22939 goe4 (Cant.) satisfied, comfortable
U+22982 laan2 (Cant.) to brag, praise oneself
U+22A66 zit1 (Cant.) to squeeze out (as from a tube); to tickle
U+22ACF kam2 (Cant.) to cover
U+22AD5 wing1 wing6 (Cant.) to throw away
U+22AE8 ngung2 (Cant.) to push from behind
U+22AEB lat1 (Cant.) to rub
U+22B3F kaai2 kaai5 (Cant.) sections or wedges (as of fruit); to take in the hand; to use
U+22B43 dau3 dau6 (Cant.) to touch; to bump into; to take, get, receive; to lightly support something with the hand
U+22B91 luk1 (Cant.) classifier for lengths of cylindrically shaped objects
U+22BCA dik1 (Cant.) determination, resolution
U+22BCE ngaau1 (Cant.) to scratch
U+22C38 wo5 (Cant.) rotten, bad, spoiled
U+22C51 waa2 (Cant.) to scratch
U+22C55 dap6 (Cant.) to beat, poud; to get drenched
U+22C62 saak3 to select; (Cant.) a wedge of a fruit such as an orange
U+22CA1 laa2 naa1 (Cant.) to grab with the hands; and, with
U+22CA9 kap1 (Cant.) to affix a chop or seal to a document
U+22CB5 cou5 (Cant.) to save up (money), to save up bit-by-bit
U+22CB7 ngaau1 to search; (Cant.) to scratch
U+22CB8 lou1 (Cant.) to shake violently, stir; to strip
U+22CC2 bat1 pat1 (Cant.) to scoop up, ladle out
U+22CC6 ngou4 ngou6 (Cant.) to shake, rattle
U+22D08 daat3 (Cant.) to throw down, fall down
U+22D12 paang1 (Cant.) to chase, drive away
U+22D44 cou5 (Cant.) to save up (money)
U+22D4C deoi2 (Cant.) to goad, incite
U+22D53 paang1 (Cant.) to rush; chase someone out, drive out
U+22D67 gaan3 (Cant.) to draw lines
U+22D8D saap3 (Cant.) garbage
U+22D9C ngung2 (Cant.) to push; pull open
U+22DA0 saau4 (Cant.) to take without asking
U+22DA4 loe2 (Cant.) to pester, nag; to wallow; to roll around on the floor
U+22DAF maan1 (Cant.) to pull, turn
U+22DEE deoi2 (Cant.) to poke, nudge; stretch out
U+22E51 zaang6 (Cant.) to widen with force
U+22E8B naan3 (Cant.) to stitch together, quilt
U+22F74 duk1 (Cant.) to poke, jab
U+233F4 jan4 (Cant.) a kind of fruit
U+233FE dak6 (J) non-standard variant of 材 U+6750, material, stuff; timber; talent; (Cant.) a peg, row of pegs
U+23528 kang3 (Cant.) to be entangled, twisted; (of alcohol and tobacco) to be strong
U+23595 peng1 (Cant.) the back of a chair for one to lean against
U+2361A seot1 (Cant.) a bar; to bolt, lock
U+23695 jaap3 (Cant.) to wave, beckon with the hand
U+236BA hong2 hong6 (Cant.) a young chicken
U+239C2 laai5 (Cant.) untidy
U+23CB7 nap6 (Cant.) sticky; not smooth; slow
U+23CFC doe4 (Cant.) salivating
U+241A3 saap6 (Cant.) to cook in boiling water
U+24292 luk6 (Cant.) to scald with boiling water
U+2430D hok3 (Cant.) to fry
U+245C8 sip3 (Cant.) to squeeze in, to stuff in
U+24674 caau1 (Cant.) gore
U+2472F kap6 (Cant.) to bite
U+24DB8 naa1 (Cant.) a scar
U+24DC7 wak6 (Cant.) severe pain
U+24DEA mang2 (Cant.) impatient, restless
U+24DEB cek3 cik1 (Cant.) a prickling pain, ache
U+24E3B naa1 (Cant.) a scar, scab; and, with
U+24E50 lit3 (Cant.) a knot
U+24EA7 zang2 (Cant.) annoyed
U+24FC2 saai4 (Cant.) unattractive, pale
U+24FEA zaap3 (Cant.) wrinkled, crumpled
U+2502C jim2 (Cant.) a scar
U+25052 ngaau4 (Cant.) warped
U+2510E cik1 (Cant.) to pull, lift up
U+2512B gap6 (Cant.) to stare, peep at
U+25148 laap3 (Cant.) to look, scan
U+25160 hau1 (Cant.) to fix one's eyes on, gaze at
U+2517E zong1 (Cant.) to peek or peep at
U+251E3 gwat6 (Cant.) to glance
U+25232 kip1 (Cant.) to keep a close eye on, to control
U+25236 nam6 (Cant.) sound asleep
U+2528C caau4 (Cant.) wrinkled, folded, creased, crumpled
U+25299 zong1 (Cant.) to peep at, look at secretly
U+252C7 caang3 (Cant.) to open the eyes wide
U+252D8 saau4 (Cant.) to swep the eyes over something
U+2531B lai6 (Cant.) to gaze greedily at
U+25531 sin3 (Cant.) to slip
U+2553F ham2 (Cant.) classifier for cannons, large guns, etc.
U+25945 lung1 (Cant.) a hole, hollow; cavity
U+259F9 tam5 (Cant.) puddle
U+25E49 nap6 (Cant.) sticky
U+26097 sok3 (Cant.) to tighten
U+260A5 dam3 (Cant.) to drop down
U+26258 caang1 (Cant.) a cooking pot, cooker
U+265BF dap1 (Cant.) to hang down; to lower one's head
U+26629 paa4 (Cant.) chin
U+26696 zaap3 (Cant.) to wink
U+2688A pok1 (Cant.) blister
U+26893 mak6 (Cant.) mole on skin
U+26926 hot3 (Cant.) a smell, scent
U+269F2 loe1 loe2 (Cant.) to dribble, spit; to pester, nag
U+269FA laai2 laai5 (Cant.) to lick, lap up
U+26A88 ngou3 (Cant.) to kneel
U+26ED0 zaau3 (Cant.) to fry in oil
U+27285 gwaai2 (Cant.) frog, toad
U+272B6 doe3 (Cant.) insect sting
U+272CA saa1 (Cant.) a large butterfly
U+272E6 mei1 (Cant.) a dragonfly; a small boat without a sail
U+27307 bang1 (Cant.) a large butterfly
U+27574 naan3 (Cant.) a pimple, an insect bite
U+27639 taai1 (Cant.) a necktie
U+27685 long6 (Cant.) crotch
U+27694 tung2 (Cant.) a kind of skirt
U+2775E gei1 (Cant.) ???, khaki
U+2789D lai6 (Cant.) to stare angrily
U+278C8 caau1 (Cant.) to gore
U+2797A kwan1 (Cant.) to fool, deceive, hoodwink
U+279A0 ngaak1 (Cant.) to deceive
U+279DD ngaa6 (Cant.) ????, to bar the way, obstruct
U+27A0A zaa6 (Cant.) ????, to bar the way, obstruct
U+27A3E tam3 (Cant.) to fool, trick, cheat
U+27D2F me1 (Cant.) to carry on the back
U+27D84 zaang1 (Cant.) to owe
U+27ED9 mut6 (Cant.) ?????, not straightforward
U+27FD2 dam6 (Cant.) to stamp (one's foot)
U+27FEB tau2 (Cant.) to have a rest
U+28023 kei2 (Cant.) a home, house
U+28024 leoi1 (Cant.) to suddenly fall or drop down
U+28048 gaang3 (Cant.) to ford, wade
U+28090 leoi1 (Cant.) to suddenly fall or drop down
U+280BD dam6 (Cant.) to stamp the foot
U+280BE naam3 (Cant.) to step across
U+280E9 sin3 (Cant.) to slip, slide
U+2814F laam3 (Cant.) to step over, step across
U+2815D jaang3 (Cant.) to press down or out with the foot; to kick; to tread on
U+281AA jaang3 (Cant.) to press down or push out with the foot
U+281AF buk6 (Cant.) to lie prone, bend over
U+28207 laam3 (Cant.) to step over, step across
U+28256 nei1 ni1 (Cant.) to hide oneself
U+2827C wu3 (Cant.) to stoop, bow
U+2829B laak3 (Cant.) nude, naked
U+282CD wan1 wen1 (Cant.) a van
U+282E2 lip1 (Cant.) an elevator (from the British 'lift')
U+28B4C baang1 paang1 (Cant.) bang; pan (Eng. loanwords)
U+294E5 ngok6 (Cant.) to raise the head
U+295F4 bung6 (Cant.) classifier for odors
U+29720 mam1 ngam1 (Cant.) soft rice for a small child
U+2994B au6 ngau6 to gallop wildly; (Cant.) stupid
U+29A4D peng1 (Cant.) ribs, rib-cage
U+29B0E jam1 jam4 (Cant.) bangs (hair)
U+2A400 naa1 (Cant.) relationship; together
U+2A4AC nung1 (Cant.) burned
U+2A601 kap6 (Cant.) to bite
U+2A632 ji1 (Cant.) to grin, smile
U+2A65B nak1 (Cant.) decayed teeth; tongue-tied
U+2A6A9 gwi1 (Cant.) sound of shouting
U+2F907 baan6 (Cant.) mud, mire

From unicode at unicode.org Mon Nov 13 14:48:39 2017
From: unicode at unicode.org (James Kass via Unicode)
Date: Mon, 13 Nov 2017 12:48:39 -0800
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

Peter Constable wrote,

>> May be this test page ?
>>
>> http://www.i18nguy.com/unicode/supplementary-test.html
>
> Thanks. I'd need to know _at least something_ about what the characters
> signify, though, to have a sense of whether there's anything potentially
> offensive.

The Plane 2 characters on that page appear to be random.

From unicode at unicode.org Mon Nov 13 14:57:50 2017
From: unicode at unicode.org (Philippe Verdy via Unicode)
Date: Mon, 13 Nov 2017 21:57:50 +0100
Subject: Plane-2-only string
In-Reply-To:
References:
Message-ID:

2017-11-13 21:48 GMT+01:00 James Kass :

> Peter Constable wrote,
>
> >> May be this test page ?
> >>
> >> http://www.i18nguy.com/unicode/supplementary-test.html
> >
> > Thanks. I'd need to know _at least something_ about what the characters
> > signify, though, to have a sense of whether there's anything potentially
> > offensive.
>
> The Plane 2 characters on that page appear to be random.
>

That's probable but the authors claim these are common characters. It's possible they collected statistics from some corpus to find some of the most widely used characters in Plane 2, without needing to understand what they would mean if they are put side by side (I had noted already that there was no punctuation at all, and the exposed collection is too long for a typical Chinese text, and in fact I would expect the presence of some CJK punctuations. May be we could compile a list of Chinese toponyms using these, and select those that use more than one Plane2 character, then separate these names using CJK commas and a final CJK full stop.
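Philippe's suggestion of joining names with CJK punctuation can be sketched as below. The ideographic comma (U+3001) and ideographic full stop (U+3002) are BMP characters from the CJK Symbols and Punctuation block. A string built this way would therefore no longer be Plane-2-only, which runs into Peter's point that the ExtB fonts should not gain BMP characters. The helper names are hypothetical. The two "names" are placeholders assembled from code points in John Jenkins's list, not real toponyms.

```python
IDEOGRAPHIC_COMMA = "\u3001"      # BMP, CJK Symbols and Punctuation
IDEOGRAPHIC_FULL_STOP = "\u3002"  # BMP, CJK Symbols and Punctuation


def join_names(names):
    """Join names with ideographic commas and end with an ideographic full stop."""
    return IDEOGRAPHIC_COMMA.join(names) + IDEOGRAPHIC_FULL_STOP


def outside_plane2(text):
    """List the code points in text that fall outside Plane 2 (U+20000..U+2FFFF)."""
    return [f"U+{ord(ch):04X}" for ch in text if not 0x20000 <= ord(ch) <= 0x2FFFF]


# Hypothetical placeholder "names", two Plane 2 ideographs each; not real place names.
names = ["\U000201A9\U00020BE6", "\U00020C77\U00020D69"]
sample = join_names(names)

print(len(sample))             # 6
print(outside_plane2(sample))  # ['U+3001', 'U+3002']
```

Running the same check on a candidate before it goes into a font is a cheap way to confirm that no BMP code points, punctuation included, have slipped in.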
Some Wikidata or OSM data search could be used to compile such list (I think these topynyms will more likely be found in Cantonese, or Taiwanese related sources, using the zh-Hant variant, but note that Wikidata does not distinguish zh-Hans and zh-Hant as Wikimedia wikis use a transliterator, but I doubt this transliterator performs transforms with Plane2 characters which should remain unchanged with most of them kept for both traditional and simplified use). -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode at unicode.org Mon Nov 13 15:19:04 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Mon, 13 Nov 2017 21:19:04 +0000 Subject: Plane-2-only string In-Reply-To: <55072E26-7741-4F95-9B98-BB5809C80D9A@apple.com> References: <55072E26-7741-4F95-9B98-BB5809C80D9A@apple.com> Message-ID: Would a typical Chinese speaker be likely to recognize these as used in Cantonese? (I wouldn't want to have a font's sample-text string give the impression that it's a Cantonese font ? unless it were specifically intended for Cantonese.) -----Original Message----- From: jenkins at apple.com [mailto:jenkins at apple.com] Sent: Monday, November 13, 2017 12:46 PM To: Peter Constable Cc: Unicode list Subject: Re: Plane-2-only string ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? That is an example of forty Cantonese-specific characters which are not obscene (that I'm aware of) from Extension B. For the curious, I've appended at the bottom the full list of 280 for all of Plane 2 which I was able to pull out of the Unihan database. I'm sure some enterprising poet can make something out of them. > On Nov 13, 2017, at 11:20 AM, Peter Constable via Unicode wrote: > > I?m wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn?t be gibberish? 
> > We are considering adding sample-text strings in some of our fonts. (In OpenType, the 'name' table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only. > > Background: > The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the 'ExtB' fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28,738 BMP characters and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so they can't go into a single font. > > > > Peter > U+201A9 faan2 (Cant.) to play U+20325 wu1 wu3 (Cant.) to bow, stoop U+20341 man3 (Cant.) an undesirable situation U+204FC sip3 (Cant.) a wedge; to thrust in U+20544 nap1 (Cant.) ???, a dimple U+2076D peng2 (Cant.) to fell, cut; to sweep away U+20779 gaai3 (Cant.) to cut with a knife or scissors U+20BA8 naai3 (Cant.) to tie, tow; bring along U+20BA9 aa1 liu1 (Cant.) an interjection; rare, specialized U+20BCB jai4 jai5 (Cant.) naughty, inferior U+20BE6 cai3 (Cant.) to eat, take a meal U+20BFD zi1 (Cant.) a final particle indicating affirmation U+20C0B jaau1 (Cant.) left-handed U+20C32 eot1 (Cant.) to belch U+20C41 tam3 (Cant.) to fool, trick, cheat U+20C42 dat1 (Cant.) to put something or sit wherever one wishes; to rebuke, reproach U+20C43 nip1 (Cant.) thin, flat; poor U+20C53 ngai1 (Cant.) to importune, beg U+20C58 ngaak6 (Cant.) contrary, opposing, against; disobedient U+20C65 fik1 jit6 we5 (Cant.) wrangling, a noise; fitful; a soft fabric with no body U+20C77 ming1 (Cant.) small U+20C78 san2 seon2 (Cant.) phonetic U+20C9C zaang1 (Cant.) to owe U+20CCF ce2 ce6 (Cant.) interjection U+20CD5 caau3 (Cant.) to search U+20CD6 dap6 (Cant.)
to strike, pound U+20D15 miu2 (Cant.) to purse the lips; to wriggle U+20D30 gau6 (Cant.) classifier for a piece or lump of something U+20D47 keu4 (Cant.) peculiar, strange U+20D48 mui2 (Cant.) to suck or chew without using the teeth U+20D49 hong4 (Cant.) hope U+20D69 go2 (Cant.) that U+20D6F gwit1 gwit3 (Cant.) onomatopoetic U+20D7C mang1 mang4 (Cant.) scars on the eyelid; phonetic U+20D7E waak1 (Cant.) eloquent, sharp-tongued U+20D7F pe1 pe5 (Cant.) a pair (from the Engl.); to stagger U+20D9C zai3 (Cant.) to do, work; to be willing U+20DA7 dim6 (Cant.) straight, vertical; OK; to pick up with the fingers; verbal aspect marker of successful completion U+20DB2 gap6 kap6 (Cant.) to stare at; to take a big bite U+20E09 kak1 (Cant.) to block, obstruct U+20E0A tap1 (Cant.) an intensifying particle U+20E0E naa1 (Cant.) and, with U+20E0F ge2 (Cant.) final particle U+20E10 kam1 (Cant.) to endure, last U+20E11 soek3 (Cant.) soft, sodden U+20E12 bou2 (Cant.) ????, a stranger U+20E3A ngaak6 (Cant.) contrary, opposing U+20E6D ko1 (Cant.) to call (Engl. loan-word) U+20E73 git6 (Cant.) thick, viscous, dense U+20E77 ngo4 (Cant.) to speak tirelessly U+20E78 kam2 (Cant.) to cover, close up U+20E7A maai4 (Cant.) verbal aspect marker for completion or movement towards U+20E7B zam6 (Cant.) classifier for smells U+20E8C gwe1 (Cant.) timid U+20E98 long1 long2 (Cant.) hard to get along with; to rinse, spread thin U+20E9D gaak3 (Cant.) final particle U+20EA2 gaa1 gaa2 (Cant.) final particle U+20EAA he3 hi1 (Cant.) in a rush; slovenly U+20EAB leu1 (Cant.) strange, peculiar U+20EAC he2 (Cant.) final particle U+20ED7 le4 (Cant.) imperative final particle U+20ED8 zeot6 (Cant.) sound of eating (onomatopoetic) U+20EF4 long2 (Cant.) to rinse U+20EFA aa6 (Cant.) final particle U+20EFB bai3 (Cant.) noise, clamor U+20F15 paai2 (Cant.) a suffix indicating time U+20F2D but1 (Cant.) sound of a car-horn (onomatopoetic) U+20F2E ngai1 ngi1 (Cant.) to urge, importune; a lie, fib U+20F31 loe1 loe2 (Cant.)
to spit out; to pester, nag U+20F4C syut3 (Cant.) sound of something rushing by U+20F52 neng2 (Cant.) classifier for hats U+20F64 kik1 (Cant.) to block, obstruct; head; phonetic U+20F8D he3 (Cant.) to flick something off in a disorderly way U+20F8F ce1 (Cant.) interjection U+20FAD we5 (Cant.) soft fabric with no body U+20FB4 baang4 baang6 (Cant.) phonetic U+20FB5 zaa1 (Cant.) final particle U+20FBC cyut1 cyut6 (Cant.) phonetic U+20FEA gaa2 (Cant.) final particle U+20FEB saau4 (Cant.) shabby U+20FEC soe4 (Cant.) ignorant U+20FED wet1 (Cant.) to go somewhere to have a good time U+2101D nam6 (Cant.) sound asleep U+2101E zip1 (Cant.) a Jeep; to wave, beckon U+21020 bei6 or; emphatic particle; (Cant.) particle implying doubt U+21029 lok1 (Cant.) onomatopoetic U+2104F am1 ngam1 (Cant.) soft rice or food for a baby U+2105C wo5 (Cant.) particle to close a quote U+2106F dyut1 (Cant.) to pout U+21075 gan2 (Cant.) aspect marker for continuous action U+21076 zit1 (Cant.) to scratch an itch U+21077 doeng1 (Cant.) a sharp point; to peck U+21078 kwaat1 (Cant.) a circle, ring U+2107B ziu1 (Cant.) to beat someone up U+21088 buk6 (Cant.) to lie prone; to bend over U+21096 lai2 (Cant.) unrestrained U+2109D zuk6 (Cant.) to choke and cough U+210C0 e4 nge4 (Cant.) a musical instrument U+210C1 leng1 (Cant.) member of a triad; young U+210C7 bai6 (Cant.) exclamation U+210C8 kwaak1 kwaak3 (Cant.) a lasso; a circle, frame U+210C9 gaa3 (Cant.) final particle U+210CF doe6 (Cant.) to droop, hang down U+210D3 bo3 (Cant.) final particle for emphasis U+210E4 laai6 (Cant.) to leave behind, omit U+210F4 ceoi4 (Cant.) smell, odor U+210F5 ngung1 ngung2 (Cant.) to cover, bury; push from behind U+210F6 sek3 (Cant.) to like, love; to kiss U+2111F haa1 (Cant.) onomatopoetic, the sound of panting U+2112F jik1 (Cant.) hiccough U+21135 ji1 (Cant.) to grin, laugh U+2113D soe4 (Cant.) to slide down U+21148 laa3 (Cant.) a particle implying completion, certainty, or urgency U+2114F lai2 (Cant.) 
to accuse, slander; to turn, sprain U+21180 gwang2 (Cant.) special relationship U+21187 wok1 (Cant.) a watt (Engl. loan-word) U+211D9 doe4 (Cant.) round and full U+21681 bai6 used-up, malpractices; (Cant.) bad, vile, corrupt U+21731 gei2 to envy, to be angry with; (Cant.) pregnant U+2197C me1 (Cant.) to carry on the back U+21C2A duk1 (Cant.) end, bottom, rump U+21CAC gwat6 (Cant.) blunt U+220C7 lei5 (Cant.) a sail U+22208 nap1 (Cant.) dimple U+22605 maau4 (Cant.) flurried, flustered; arbitrarily U+22696 ti4 (Cant.) intensifier U+226F4 mang2 (Cant.) annoyed, impatient, restless U+226F5 zang2 (Cant.) annoyed, irritated U+22775 fit1 (Cant.) ???, to be fashionable U+227B5 fit1 (Cant.) to brush, whisk U+22803 geng6 (Cant.) to guard against; to take precautions U+22939 goe4 (Cant.) satisfied, comfortable U+22982 laan2 (Cant.) to brag, praise oneself U+22A66 zit1 (Cant.) to squeeze out (as from a tube); to tickle U+22ACF kam2 (Cant.) to cover U+22AD5 wing1 wing6 (Cant.) to throw away U+22AE8 ngung2 (Cant.) to push from behind U+22AEB lat1 (Cant.) to rub U+22B3F kaai2 kaai5 (Cant.) sections or wedges (as of fruit); to take in the hand; to use U+22B43 dau3 dau6 (Cant.) to touch; to bump into; to take, get, receive; to lightly support something with the hand U+22B91 luk1 (Cant.) classifier for lengths of cylindrically shaped objects U+22BCA dik1 (Cant.) determination, resolution U+22BCE ngaau1 (Cant.) to scratch U+22C38 wo5 (Cant.) rotten, bad, spoiled U+22C51 waa2 (Cant.) to scratch U+22C55 dap6 (Cant.) to beat, pound; to get drenched U+22C62 saak3 to select; (Cant.) a wedge of a fruit such as an orange U+22CA1 laa2 naa1 (Cant.) to grab with the hands; and, with U+22CA9 kap1 (Cant.) to affix a chop or seal to a document U+22CB5 cou5 (Cant.) to save up (money), to save up bit-by-bit U+22CB7 ngaau1 to search; (Cant.) to scratch U+22CB8 lou1 (Cant.) to shake violently, stir; to strip U+22CC2 bat1 pat1 (Cant.) to scoop up, ladle out U+22CC6 ngou4 ngou6 (Cant.)
to shake, rattle U+22D08 daat3 (Cant.) to throw down, fall down U+22D12 paang1 (Cant.) to chase, drive away U+22D44 cou5 (Cant.) to save up (money) U+22D4C deoi2 (Cant.) to goad, incite U+22D53 paang1 (Cant.) to rush; chase someone out, drive out U+22D67 gaan3 (Cant.) to draw lines U+22D8D saap3 (Cant.) garbage U+22D9C ngung2 (Cant.) to push; pull open U+22DA0 saau4 (Cant.) to take without asking U+22DA4 loe2 (Cant.) to pester, nag; to wallow; to roll around on the floor U+22DAF maan1 (Cant.) to pull, turn U+22DEE deoi2 (Cant.) to poke, nudge; stretch out U+22E51 zaang6 (Cant.) to widen with force U+22E8B naan3 (Cant.) to stitch together, quilt U+22F74 duk1 (Cant.) to poke, jab U+233F4 jan4 (Cant.) a kind of fruit U+233FE dak6 (J) non-standard variant of 材 U+6750, material, stuff; timber; talent; (Cant.) a peg, row of pegs U+23528 kang3 (Cant.) to be entangled, twisted; (of alcohol and tobacco) to be strong U+23595 peng1 (Cant.) the back of a chair for one to lean against U+2361A seot1 (Cant.) a bar; to bolt, lock U+23695 jaap3 (Cant.) to wave, beckon with the hand U+236BA hong2 hong6 (Cant.) a young chicken U+239C2 laai5 (Cant.) untidy U+23CB7 nap6 (Cant.) sticky; not smooth; slow U+23CFC doe4 (Cant.) salivating U+241A3 saap6 (Cant.) to cook in boiling water U+24292 luk6 (Cant.) to scald with boiling water U+2430D hok3 (Cant.) to fry U+245C8 sip3 (Cant.) to squeeze in, to stuff in U+24674 caau1 (Cant.) gore U+2472F kap6 (Cant.) to bite U+24DB8 naa1 (Cant.) a scar U+24DC7 wak6 (Cant.) severe pain U+24DEA mang2 (Cant.) impatient, restless U+24DEB cek3 cik1 (Cant.) a prickling pain, ache U+24E3B naa1 (Cant.) a scar, scab; and, with U+24E50 lit3 (Cant.) a knot U+24EA7 zang2 (Cant.) annoyed U+24FC2 saai4 (Cant.) unattractive, pale U+24FEA zaap3 (Cant.) wrinkled, crumpled U+2502C jim2 (Cant.) a scar U+25052 ngaau4 (Cant.) warped U+2510E cik1 (Cant.) to pull, lift up U+2512B gap6 (Cant.) to stare, peep at U+25148 laap3 (Cant.) to look, scan U+25160 hau1 (Cant.)
to fix one's eyes on, gaze at U+2517E zong1 (Cant.) to peek or peep at U+251E3 gwat6 (Cant.) to glance U+25232 kip1 (Cant.) to keep a close eye on, to control U+25236 nam6 (Cant.) sound asleep U+2528C caau4 (Cant.) wrinkled, folded, creased, crumpled U+25299 zong1 (Cant.) to peep at, look at secretly U+252C7 caang3 (Cant.) to open the eyes wide U+252D8 saau4 (Cant.) to sweep the eyes over something U+2531B lai6 (Cant.) to gaze greedily at U+25531 sin3 (Cant.) to slip U+2553F ham2 (Cant.) classifier for cannons, large guns, etc. U+25945 lung1 (Cant.) a hole, hollow; cavity U+259F9 tam5 (Cant.) puddle U+25E49 nap6 (Cant.) sticky U+26097 sok3 (Cant.) to tighten U+260A5 dam3 (Cant.) to drop down U+26258 caang1 (Cant.) a cooking pot, cooker U+265BF dap1 (Cant.) to hang down; to lower one's head U+26629 paa4 (Cant.) chin U+26696 zaap3 (Cant.) to wink U+2688A pok1 (Cant.) blister U+26893 mak6 (Cant.) mole on skin U+26926 hot3 (Cant.) a smell, scent U+269F2 loe1 loe2 (Cant.) to dribble, spit; to pester, nag U+269FA laai2 laai5 (Cant.) to lick, lap up U+26A88 ngou3 (Cant.) to kneel U+26ED0 zaau3 (Cant.) to fry in oil U+27285 gwaai2 (Cant.) frog, toad U+272B6 doe3 (Cant.) insect sting U+272CA saa1 (Cant.) a large butterfly U+272E6 mei1 (Cant.) a dragonfly; a small boat without a sail U+27307 bang1 (Cant.) a large butterfly U+27574 naan3 (Cant.) a pimple, an insect bite U+27639 taai1 (Cant.) a necktie U+27685 long6 (Cant.) crotch U+27694 tung2 (Cant.) a kind of skirt U+2775E gei1 (Cant.) ???, khaki U+2789D lai6 (Cant.) to stare angrily U+278C8 caau1 (Cant.) to gore U+2797A kwan1 (Cant.) to fool, deceive, hoodwink U+279A0 ngaak1 (Cant.) to deceive U+279DD ngaa6 (Cant.) ????, to bar the way, obstruct U+27A0A zaa6 (Cant.) ????, to bar the way, obstruct U+27A3E tam3 (Cant.) to fool, trick, cheat U+27D2F me1 (Cant.) to carry on the back U+27D84 zaang1 (Cant.) to owe U+27ED9 mut6 (Cant.) ?????, not straightforward U+27FD2 dam6 (Cant.) to stamp (one's foot) U+27FEB tau2 (Cant.)
to have a rest U+28023 kei2 (Cant.) a home, house U+28024 leoi1 (Cant.) to suddenly fall or drop down U+28048 gaang3 (Cant.) to ford, wade U+28090 leoi1 (Cant.) to suddenly fall or drop down U+280BD dam6 (Cant.) to stamp the foot U+280BE naam3 (Cant.) to step across U+280E9 sin3 (Cant.) to slip, slide U+2814F laam3 (Cant.) to step over, step across U+2815D jaang3 (Cant.) to press down or out with the foot; to kick; to tread on U+281AA jaang3 (Cant.) to press down or push out with the foot U+281AF buk6 (Cant.) to lie prone, bend over U+28207 laam3 (Cant.) to step over, step across U+28256 nei1 ni1 (Cant.) to hide oneself U+2827C wu3 (Cant.) to stoop, bow U+2829B laak3 (Cant.) nude, naked U+282CD wan1 wen1 (Cant.) a van U+282E2 lip1 (Cant.) an elevator (from the British 'lift') U+28B4C baang1 paang1 (Cant.) bang; pan (Eng. loanwords) U+294E5 ngok6 (Cant.) to raise the head U+295F4 bung6 (Cant.) classifier for odors U+29720 mam1 ngam1 (Cant.) soft rice for a small child U+2994B au6 ngau6 to gallop wildly; (Cant.) stupid U+29A4D peng1 (Cant.) ribs, rib-cage U+29B0E jam1 jam4 (Cant.) bangs (hair) U+2A400 naa1 (Cant.) relationship; together U+2A4AC nung1 (Cant.) burned U+2A601 kap6 (Cant.) to bite U+2A632 ji1 (Cant.) to grin, smile U+2A65B nak1 (Cant.) decayed teeth; tongue-tied U+2A6A9 gwi1 (Cant.) sound of shouting U+2F907 baan6 (Cant.) mud, mire From unicode at unicode.org Mon Nov 13 16:28:51 2017 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 13 Nov 2017 14:28:51 -0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: Peter Constable wrote, > We don't want to add BMP characters to the ExtB fonts. So the sample text would lack punctuation. Given that the Supplementary Ideographic Plane is composed of rare and historical characters from multiple sources, I suspect that the short answer to Peter's original question is: "No". 
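A minimal Python sketch (not from the thread) for checking a candidate sample-text string against the constraint in Peter's question: every character must lie in Plane 2, U+20000 through U+2FFFF. The helper names `is_plane2` and `non_plane2_chars` are hypothetical, and the optional `allow_basic_latin` flag is an assumption modelled on Peter's statement that Simsun-ExtB also covers the Basic Latin block (U+0000..U+007F).

```python
from __future__ import annotations

PLANE2_FIRST = 0x20000  # first code point of Plane 2 (Supplementary Ideographic Plane)
PLANE2_LAST = 0x2FFFF   # last code point of Plane 2

def is_plane2(ch: str) -> bool:
    """Return True if the single character ch is a Plane 2 code point."""
    return PLANE2_FIRST <= ord(ch) <= PLANE2_LAST

def non_plane2_chars(text: str, allow_basic_latin: bool = False) -> list[str]:
    """List 'U+XXXX' labels for every character outside the allowed repertoire.

    An empty result means the string satisfies the Plane-2-only constraint
    (optionally relaxed to also admit Basic Latin, U+0000..U+007F).
    """
    offenders = []
    for ch in text:  # Python iterates by code point, not by UTF-16 code unit
        cp = ord(ch)
        if is_plane2(ch) or (allow_basic_latin and cp <= 0x7F):
            continue
        offenders.append(f"U+{cp:04X}")
    return offenders

# Placeholder characters from the Cantonese list above: U+20BE6 (to eat), U+20D69 (that).
sample = chr(0x20BE6) + chr(0x20D69)
print(non_plane2_chars(sample))                                # []
print(non_plane2_chars(sample + "\u3002"))                     # ['U+3002'] (CJK full stop is in the BMP)
print(non_plane2_chars(sample + " ", allow_basic_latin=True))  # []
```

Because Python strings are sequences of code points, each Plane 2 character counts once here, even though UTF-16 stores it as a surrogate pair.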
From unicode at unicode.org Mon Nov 13 16:38:40 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Mon, 13 Nov 2017 22:38:40 +0000 Subject: Plane-2-only string In-Reply-To: References: Message-ID: I discussed this with one of my Chinese co-workers, and we came up with the following: ??????????? ?????????? ?????????? ??????????? Factors in the choice of characters were: - different radicals - for a given radical, have a sequence of consecutive characters so people get the idea it's not a sentence but just a sequence of characters with related meanings - radical groups increase in complexity It's not a sentence that can be read, but there's an obvious pattern, so it's also not completely gibberish. Peter -----Original Message----- From: James Kass [mailto:jameskasskrv at gmail.com] Sent: Monday, November 13, 2017 2:29 PM To: Peter Constable Cc: Unicode list Subject: Re: Plane-2-only string Peter Constable wrote, > We don't want to add BMP characters to the ExtB fonts. So the sample text would lack punctuation. Given that the Supplementary Ideographic Plane is composed of rare and historical characters from multiple sources, I suspect that the short answer to Peter's original question is: "No". From unicode at unicode.org Mon Nov 13 16:54:03 2017 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 13 Nov 2017 14:54:03 -0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: Peter Constable wrote, > ??????????? > ?????????? > ?????????? > ??????????? > ??????????? ?????????? ?????????? ??????????? It looks good in blocks on four separate lines, but would a typical font viewing or comparison tool be expected to break it down into four lines? The pattern is still apparent if displayed on just one line, but separating the blocks with spaces or any punctuation would require BMP characters in the ExtB font. ?????????????????????????????????????????? 
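A hedged Python sketch of the one-line layout James describes: groups of Plane 2 characters joined with U+0020 SPACE (a Basic Latin character, which Peter's first message says Simsun-ExtB supports), then encoded as UTF-16BE, the encoding the OpenType 'name' table uses for Windows-platform (platform ID 3) strings such as name ID 19. The groups are hypothetical placeholders drawn from the Cantonese list above, not the string Peter's colleague chose (its characters are lost in this archive), and the function names are illustrative only.

```python
from __future__ import annotations

def build_sample(groups: list[list[int]], sep: str = " ") -> str:
    """Join groups of code points into one line, with sep between groups."""
    return sep.join("".join(chr(cp) for cp in group) for group in groups)

def utf16be_units(text: str) -> list[int]:
    """Return the 16-bit code units of text encoded as UTF-16BE (no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# Hypothetical placeholder groups: U+20BE6 (to eat), U+20C9C (to owe), U+20D69 (that).
groups = [[0x20BE6, 0x20C9C], [0x20D69]]
sample = build_sample(groups)
units = utf16be_units(sample)

print(len(sample))                        # 4 code points: 3 ideographs + 1 space
print(len(units))                         # 7 code units: 3 surrogate pairs + 1 space
print(len(sample.encode("utf-16-be")))    # 14 bytes in the name-table string
print([f"{u:04X}" for u in units[:2]])    # ['D842', 'DFE6'], the pair for U+20BE6
```

Each Plane 2 character costs four bytes in the record, while each Basic Latin separator costs two.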
From unicode at unicode.org Mon Nov 13 17:26:25 2017 From: unicode at unicode.org (Peter Constable via Unicode) Date: Mon, 13 Nov 2017 23:26:25 +0000 Subject: Plane-2-only string In-Reply-To: References: Message-ID: As mentioned in my initial mail, the fonts support the Basic Latin block from the BMP. Peter -----Original Message----- From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of James Kass via Unicode Sent: Monday, November 13, 2017 2:54 PM To: Unicode list Subject: Re: Plane-2-only string Peter Constable wrote, > ??????????? > ?????????? > ?????????? > ??????????? > ??????????? ?????????? ?????????? ??????????? It looks good in blocks on four separate lines, but would a typical font viewing or comparison tool be expected to break it down into four lines? The pattern is still apparent if displayed on just one line, but separating the blocks with spaces or any punctuation would require BMP characters in the ExtB font. ?????????????????????????????????????????? From unicode at unicode.org Mon Nov 13 17:52:40 2017 From: unicode at unicode.org (Philippe Verdy via Unicode) Date: Tue, 14 Nov 2017 00:52:40 +0100 Subject: Plane-2-only string In-Reply-To: References: Message-ID: Any font would likely map the space (and probably, for any CJK font, the ideographic space). Likewise, the newline doesn't need any font; it is synthesized by renderers. This could be used to compose some Japanese-like haiku with some meaning... 2017-11-13 23:54 GMT+01:00 James Kass via Unicode : > Peter Constable wrote, > > > ??????????? > > ?????????? > > ?????????? > > ??????????? > > > > ??????????? ?????????? ?????????? ??????????? > > It looks good in blocks on four separate lines, but would a typical > font viewing or comparison tool be expected to break it down into four > lines? The pattern is still apparent if displayed on just one line, > but separating the blocks with spaces or any punctuation would require > BMP characters in the ExtB font.
> > ?????????????????????????????????????????? > > From unicode at unicode.org Mon Nov 13 18:35:42 2017 From: unicode at unicode.org (James Kass via Unicode) Date: Mon, 13 Nov 2017 16:35:42 -0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: Philippe Verdy wrote, > ... Likewise, the newline doesn't need any font; it is synthesized by renderers. It's true that fonts don't need to have glyphs mapped for control characters, but I'd hesitate to use any control character in a font's sample text field because of the field's intended use. But the point is moot here, since Peter has reminded us that the fonts in question already have some BMP characters mapped, including certain punctuation characters. An ExtB font with BMP Basic Latin could display the English-language default sample text "The quick brown fox..." with no problem, but a non-English locale might substitute a default text string which the font could not support. So it's probably best to have *something* in that field representing characters the font covers. From unicode at unicode.org Mon Nov 13 21:39:54 2017 From: unicode at unicode.org (Phake Nick via Unicode) Date: Tue, 14 Nov 2017 11:39:54 +0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: Perhaps the Martian language (http://en.wikipedia.org/wiki/Martian_language) should be considered as a way to construct an example Chinese sentence from characters that are only within Plane 2? It could probably be understood by more people than something in Cantonese, too. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From unicode at unicode.org Mon Nov 13 21:23:40 2017 From: unicode at unicode.org (via Unicode) Date: Tue, 14 Nov 2017 11:23:40 +0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: <7281e80980d8d0a9b8c07371798530ca@koremail.com> With over a thousand Zhuang characters, Zhuang would work, though of course it would not have punctuation. Off the top of my head, something like: ???????????? ???????????? ???????????? In romanised Zhuang: Gou bae ranz gyoengqde / gou youq ranz ndaw gwn haeux / aen ranz baihlaeng miz naz In English: I went to their house / I ate a meal in the house / behind the house were paddy fields A native speaker would of course do much better. Regards John Knightley From unicode at unicode.org Mon Nov 13 23:45:13 2017 From: unicode at unicode.org (via Unicode) Date: Tue, 14 Nov 2017 13:45:13 +0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: Dear Peter, since the Chinese characters below are meaningless in Chinese, using them should not be a first choice: they are gibberish, just not complete gibberish. Plane 2 has a fair number of older Chinese characters, so someone with a knowledge of ancient Chinese might well be able to make something meaningful. Running a competition in China would be one way to get suggestions; spotting a good suggestion is easier than making one. Plane 2 has Cantonese, Vietnamese and Zhuang characters. The number of Cantonese characters is small, so making phrases using only them would be difficult. Both Vietnamese and Zhuang have a much larger number of characters, so it is much easier to make something meaningful with them. The following Zhuang proverb, or saying: ???????????????????? "Plant sweet potatoes in the field, and raise pigs in the sty." [lit: house, as the bottom floor of a traditional house is used for livestock and people live on the floor above.] However, the third and eighth characters are not the most commonly used.
Regards John On 14.11.2017 06:38, Peter Constable via Unicode wrote: > I discussed this with one of my Chinese co-workers, and we came up > with the following: > > ??????????? > ?????????? > ?????????? > ??????????? > > Factors in the choice of characters were: > - different radicals > - for a given radical, have a sequence of consecutive characters so > people get the idea it's not a sentence but just a sequence of > characters with related meanings > - radical groups increase in complexity > > > It's not a sentence that can be read, but there's an obvious pattern, > so it's also not completely gibberish. > > > Peter > > -----Original Message----- > From: James Kass [mailto:jameskasskrv at gmail.com] > Sent: Monday, November 13, 2017 2:29 PM > To: Peter Constable > Cc: Unicode list > Subject: Re: Plane-2-only string > > Peter Constable wrote, > >> We don't want to add BMP characters to the ExtB fonts. > > So the sample text would lack punctuation. Given that the > Supplementary Ideographic Plane is composed of rare and historical > characters from multiple sources, I suspect that the short answer to > Peter's original question is: "No". From unicode at unicode.org Mon Nov 13 23:45:53 2017 From: unicode at unicode.org (Tex via Unicode) Date: Mon, 13 Nov 2017 21:45:53 -0800 Subject: FW: Plane-2-only string i18nguy supplementary-test page Message-ID: <000a01d35d0b$d5fda670$81f8f350$@xencraft.com> I am the author of the supplementary-test page on i18nguy.com. The method for choosing the characters is described on the page, so it isn't a mystery. See below. I do not believe any of the characters are offensive, although context matters greatly and languages evolve, so it is possible for a character to gain an offensive meaning or usage at any time. Consider the humble eggplant. The page was created to offer test values for supplementary characters, chosen so that any problems they uncover would clearly justify a fix.
The values are probably not the best choice for demonstrating and marketing fonts, which is the usage Peter is looking for. Here is an excerpt from the page: In 2005, the IRG (Ideographic Rapporteur Group) identified a set of ideographs, called the Ideographic International Core (IICore). The 10,000 ideographs in the IICore are the most frequently used characters that would cover the vast majority of modern texts in all locales where ideographs are used. This collection is intended for use in devices with limited resources, such as mobile phones. Test Characters To have characters that are good for testing software support for the Supplementary Plane, I extracted the 62 characters from the IICore that are in the Supplementary Plane. These characters have the following properties: • Being in the IICore, they are used frequently enough to be a minimum requirement for software supporting ideographs. • They are in the Supplementary Plane and will test support for code points above U+FFFF. • They are not "oddball" values. If using them uncovers a problem, fixing the problem is inherently justified. Tex From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Philippe Verdy via Unicode Sent: Monday, November 13, 2017 12:58 PM To: James Kass Cc: Peter Constable; Unicode list Subject: Re: Plane-2-only string 2017-11-13 21:48 GMT+01:00 James Kass : Peter Constable wrote, >> Maybe this test page? >> >> http://www.i18nguy.com/unicode/supplementary-test.html > > Thanks. I'd need to know _at least something_ about what the characters > signify, though, to have a sense of whether there's anything potentially > offensive. The Plane 2 characters on that page appear to be random. That's probable, but the authors claim these are common characters.
It's possible they collected statistics from some corpus to find some of the most widely used characters in Plane 2, without needing to understand what they would mean if put side by side. (I had already noted that there was no punctuation at all, that the collection shown is too long for a typical Chinese text, and that I would in fact expect some CJK punctuation to be present.) Maybe we could compile a list of Chinese toponyms using these characters, select those that use more than one Plane 2 character, then separate these names using CJK commas and a final CJK full stop. Some Wikidata or OSM data search could be used to compile such a list. (I think these toponyms will more likely be found in Cantonese- or Taiwanese-related sources using the zh-Hant variant. Note, however, that Wikidata does not distinguish zh-Hans and zh-Hant, as Wikimedia wikis use a transliterator; but I doubt this transliterator performs transforms on Plane 2 characters, which should remain unchanged, since most of them are used in both traditional and simplified text.) From unicode at unicode.org Tue Nov 14 02:04:09 2017 From: unicode at unicode.org (Bobby Tung via Unicode) Date: Tue, 14 Nov 2017 16:04:09 +0800 Subject: Plane-2-only string In-Reply-To: References: Message-ID: <662D1539-BA8A-4237-BA66-7857FF4C8A8E@wanderer.tw> Hello, Here's a list of frequently used Han characters for Hakka and Minnan (Chinese dialects). It contains several EXT-B characters that you can test: http://bobbytung.github.io/TaigiHakkaIdeograph/ https://docs.google.com/spreadsheets/d/18CUbZ7tsvZ4QbUj3xcfYi9EGqsft4T37WtUMX9v2STQ/pubhtml Bobby Tung W3C invited expert Editor of CLREQ > via Unicode wrote on 14 November 2017 at 1:45 PM: > > Dear Peter, > > since the Chinese characters below are meaningless in Chinese, using them should not be a first choice: they are gibberish, just not complete gibberish.
> > Plane 2 has a fair number of older Chinese characters, so someone with a knowledge of ancient Chinese might well be able to make something meaningful. Running a competition in China would be one way to get suggestions; spotting a good suggestion is easier than making one. > > Plane 2 has Cantonese, Vietnamese and Zhuang characters. The number of Cantonese characters is small, so making phrases using only them would be difficult. Both Vietnamese and Zhuang have a much larger number of characters, so it is much easier to make something meaningful with them. > > The following Zhuang proverb, or saying: > > ???????????????????? > > "Plant sweet potatoes in the field, and raise pigs in the sty." [lit: house, as the bottom floor of a traditional house is used for livestock and people live on the floor above.] > > However, the third and eighth characters are not the most commonly used. > > Regards > John > > > > > > > > > > > > > > > On 14.11.2017 06:38, Peter Constable via Unicode wrote: >> I discussed this with one of my Chinese co-workers, and we came up >> with the following: >> >> ??????????? >> ?????????? >> ?????????? >> ??????????? >> >> Factors in the choice of characters were: >> - different radicals >> - for a given radical, have a sequence of consecutive characters so >> people get the idea it's not a sentence but just a sequence of >> characters with related meanings >> - radical groups increase in complexity >> >> >> It's not a sentence that can be read, but there's an obvious pattern, >> so it's also not completely gibberish. >> >> >> Peter >> >> -----Original Message----- >> From: James Kass [mailto:jameskasskrv at gmail.com] >> Sent: Monday, November 13, 2017 2:29 PM >> To: Peter Constable >> Cc: Unicode list >> Subject: Re: Plane-2-only string >> >> Peter Constable wrote, >> >>> We don't want to add BMP characters to the ExtB fonts. >> >> So the sample text would lack punctuation.
Given that the >> Supplementary Ideographic Plane is composed of rare and historical >> characters from multiple sources, I suspect that the short answer to >> Peter's original question is: "No". > From unicode at unicode.org Thu Nov 30 13:58:02 2017 From: unicode at unicode.org (William_J_G Overington via Unicode) Date: Thu, 30 Nov 2017 19:58:02 +0000 (GMT) Subject: International Digital Preservation Day Message-ID: <23335823.63353.1512071882309.JavaMail.defaultUser@defaultHost> I have learned this evening (I am in England where it is nearly 8pm as I write this note) that today, Thursday 30 November 2017, is the first International Digital Preservation Day. I have searched on the web and found lots of links about International Digital Preservation Day. William Overington Thursday 30 November 2017