New CJK characters

Peter Constable pgcon6 at msn.com
Wed Nov 3 12:40:37 CDT 2021


Something to consider: while you highlight potential benefits for characters that are used only very rarely (in general; there might be local exceptions for some place names), you don't mention the problems this would create for the vast majority of much more frequently used ideographs, or the downsides for those rare characters themselves. For example, the IDS scheme would never be supported in IDNA, so that town name could never be used in a domain name.
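
One rough way to see the IDNA point: the Ideographic Description Characters (U+2FF0..U+2FFB) that IDS sequences are built from have General_Category So (Symbol, other), while the IDNA2008 derivation in RFC 5892 only considers letters, digits, and marks as candidates for PVALID. The Python sketch below checks only the general category using the standard `unicodedata` module. It is a simplification, not a full IDNA validity test, and `idna_candidate` is a hypothetical helper name.

```python
import unicodedata

# General categories that RFC 5892 (IDNA2008) treats as letter/digit/mark
# candidates for PVALID. This ignores the other derivation rules
# (exceptions, stability under case folding, and so on).
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

def idna_candidate(ch: str) -> bool:
    """Return True if ch's general category could qualify it as PVALID."""
    return unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES

# The twelve Ideographic Description Characters, U+2FF0..U+2FFB.
idcs = [chr(cp) for cp in range(0x2FF0, 0x2FFC)]

print(all(unicodedata.category(c) == "So" for c in idcs))  # all are symbols
print(any(idna_candidate(c) for c in idcs))                # none qualifies
print(idna_candidate("\u6d77"))  # U+6D77, an encoded ideograph (category Lo)
```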


Peter

From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of Abraham Gross via Unicode
Sent: Tuesday, November 2, 2021 6:03 PM
To: unicode at corp.unicode.org
Subject: New CJK characters

I have a proposal regarding the future of encoding new Unihan characters into Unicode that I'd like to float by this group to see if it makes any sense.

New CJK characters keep being encoded, and the pace doesn't seem to be slowing down. Unicode now contains 92,856 CJK characters!

I think that going forward, it would make a lot of sense if, instead of encoding each new character as a separate code point, we adopted a paradigm like that of Sutton SignWriting <https://en.wikipedia.org/wiki/Sutton_SignWriting_(Unicode_block)>, where Unicode would provide a set of all radicals and position/sizing modifiers, and anyone who wants to use an arbitrary unencoded character could combine the radicals the right way (using a GUI designed for this, à la glyphwiki.org's or Wenlin's editor) and then use the character right away. This would work because the font would have to support all the basic strokes, and since all CJK characters are composed of the basic strokes, the font would be able to put the character together without a font maker having to create that character specifically.
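
Unicode already has a limited notation in this spirit: Ideographic Description Sequences (IDS), where an operator such as ⿰ (left to right) or ⿱ (above to below) is followed by its components. As a rough sketch of how software might read such a sequence before drawing it, the Python below parses an IDS string into nested tuples. The function names are hypothetical, only the twelve operators U+2FF0..U+2FFB are handled, and this is not the position/sizing scheme proposed here, which would need finer control than IDS offers.

```python
# Operators U+2FF2 and U+2FF3 take three operands; the other ten take two.
TERNARY = {"\u2ff2", "\u2ff3"}
OPERATORS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}

def parse_ids(text: str):
    """Parse a prefix-notation IDS string into nested tuples."""
    node, end = _parse(text, 0)
    if end != len(text):
        raise ValueError("trailing characters after a complete sequence")
    return node

def _parse(text: str, pos: int):
    """Parse one operand starting at pos; return (node, next position)."""
    if pos >= len(text):
        raise ValueError("sequence ended before all operands were given")
    ch = text[pos]
    pos += 1
    if ch not in OPERATORS:
        return ch, pos  # a leaf: a radical or other component
    arity = 3 if ch in TERNARY else 2
    operands = []
    for _ in range(arity):
        operand, pos = _parse(text, pos)
        operands.append(operand)
    return (ch, *operands), pos

# U+6D77 (sea) described as U+6C35 to the left of U+6BCF.
print(parse_ids("\u2ff0\u6c35\u6bcf"))
```

Nested descriptions work the same way: an operand may itself begin with an operator, so the parser simply recurses.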

This method of "encoding" would solve many problems we have now:

  1.  Unencoded characters could be used without waiting years for the character to be accepted into Unicode, and then a couple more years until the major OSes update their fonts to support it.
  2.  This is, in my opinion, a really neat solution to the gaiji problem (described here <https://en.wikipedia.org/wiki/OpenType#SING_gaiji_solution>).
  3.  This would also allow much faster font development, since you'd only need to create the basic strokes and some radicals to get a working version of the font; all other characters would then just be a matter of refining the exact stroke size/positioning.
  4.  Most CJK fonts cover only a small subset of all available characters. This would allow every font to support any character you wish, including ones you dream up.
  5.  People have been coming up with new CJK characters for thousands of years, and they still do (here's a new-kanji competition, for example <https://sousaku-kanji.com/archive.html>). But any character created nowadays would be extremely hard to get into Unicode, since Unicode requires proof of use before accepting a proposal. How are people supposed to use a character if they can't type it?
I still think that Unicode should keep track of new characters in a Nameslist of sorts so that font makers have a base to go off of.
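
To make the "refining the exact stroke size/positioning" idea in item 3 a little more concrete, here is a minimal sketch of how a renderer might assign each component a box inside the em square, given a description that has already been parsed into nested tuples. The even splits, the `layout` name, and the restriction to the ⿰ and ⿱ operators are all assumptions made for illustration; a usable font would need per-component proportions, which is the refinement step mentioned above.

```python
from typing import List, Tuple, Union

Node = Union[str, tuple]
Box = Tuple[float, float, float, float]  # (x, y, width, height), y grows downward

LEFT_RIGHT = "\u2ff0"   # left-to-right operator
ABOVE_BELOW = "\u2ff1"  # above-to-below operator

def layout(node: Node, box: Box = (0.0, 0.0, 1.0, 1.0)) -> List[Tuple[str, Box]]:
    """Return (component, box) pairs, splitting each box evenly among operands."""
    if isinstance(node, str):
        return [(node, box)]
    op, *parts = node
    x, y, w, h = box
    placed = []
    for i, part in enumerate(parts):
        if op == LEFT_RIGHT:
            sub = (x + i * w / len(parts), y, w / len(parts), h)
        elif op == ABOVE_BELOW:
            sub = (x, y + i * h / len(parts), w, h / len(parts))
        else:
            raise ValueError(f"unsupported operator: {op!r}")
        placed.extend(layout(part, sub))
    return placed

# U+8279 above (U+6C35 beside U+6BCF)
description = (ABOVE_BELOW, "\u8279", (LEFT_RIGHT, "\u6c35", "\u6bcf"))
for component, area in layout(description):
    print(component, area)
```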

Q: My (city) name has a character that isn't encoded. How can I type it quickly without needing to open an editor and create it each time?
A: Adding it to your IME's dictionary would allow you to create the character just once.
- This could be extended so that an IME is built entirely out of preconstructed characters instead of code points.
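
A minimal sketch of that lookup, assuming the user dictionary simply maps a typed reading to one or more stored component sequences. The reading "umi", the dictionary layout, and the `register` and `candidates` helpers are made up for illustration; real IMEs have their own dictionary formats.

```python
from typing import Dict, List

# Hypothetical user dictionary: reading -> stored component sequences.
# The entry below reuses an IDS-style description purely as a stand-in for
# whatever sequence the proposed scheme would define.
user_dictionary: Dict[str, List[str]] = {}

def register(reading: str, sequence: str) -> None:
    """Store a composed character once so it can be recalled by its reading."""
    user_dictionary.setdefault(reading, []).append(sequence)

def candidates(reading: str) -> List[str]:
    """Return the stored sequences for a reading, in insertion order."""
    return list(user_dictionary.get(reading, []))

register("umi", "\u2ff0\u6c35\u6bcf")  # composed once in the editor
print(candidates("umi"))
print(candidates("yama"))  # nothing registered, so an empty list
```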

Q: What would the specifics of such a system look like behind the scenes?
A: I'm not sure yet, but I think Wenlin's CDL <http://guide.wenlininstitute.org/wenlin4.3/Character_Description_Language> would be a good place to start.