New CJK characters

Mark E. Shoulson mark at kli.org
Wed Nov 3 16:22:58 CDT 2021


I'm waiting for some of the old-timers here to give a proper answer, 
Unicode history-wise.

As I understand it, the idea of using IDS (Ideographic Description 
Sequences) or something similar for CJK characters was considered 
(probably more than once), and it was decided to do things this way, so 
that's the way we're doing them.

A font wouldn't necessarily have to generate new hanzi dynamically from 
IDS descriptions; it could already contain all 100,000 (or however many) 
glyphs and simply render the known sequences as ligatures, or something 
along those lines.  Font designers would still have to add characters as 
they're needed, but the character repertoire would then be open-ended, 
and each font designer could decide what to support.
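To make the "render the known ones like ligatures" idea concrete, here is a minimal Python sketch.  It loosely imitates ligature substitution: a known IDS sequence becomes one precomposed glyph, and an unknown one is passed through as the raw IDS characters.  The table FONT_LIGATURES, its glyph names, and the function shape() are all hypothetical; a real font would do this in its own substitution tables (e.g. OpenType GSUB ligatures), not like this.

```python
# Hypothetical font data: each known IDS sequence maps to a precomposed glyph.
# The glyph names are invented for illustration only.
FONT_LIGATURES = {
    "\u2FF0女子": "glyph_hao",           # ⿰女子 -> 好
    "\u2FF1木\u2FF0木木": "glyph_sen",    # ⿱木⿰木木 -> 森
}


def shape(text: str, ligatures: dict[str, str]) -> list[str]:
    """Greedy longest-match substitution, loosely like a ligature lookup.

    Known IDS sequences become a single glyph; anything else is passed
    through one character at a time (i.e. displayed as the raw IDS).
    """
    keys = sorted(ligatures, key=len, reverse=True)  # prefer longer matches
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                glyphs.append(ligatures[key])
                i += len(key)
                break
        else:
            # No known sequence starts here: emit the character unchanged.
            glyphs.append(text[i])
            i += 1
    return glyphs


# ⿰女子 is in the table; ⿰女女 is not, so it falls through character by character.
print(shape("\u2FF0女子\u2FF0女女", FONT_LIGATURES))
```

The point of the sketch is the fallback branch: a font that lacks a sequence still shows something (the description itself), which is what makes the repertoire open-ended from the font designer's side.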

OTOH, as is well known, IDS descriptions are not unique.  There's 
frequently more than one way to slice a character up.  Should *all* of 
them be supported?  Should there be some way to decide the "canonical" 
decomposition?  I guess if we're leaving it up to fonts, it's up to the 
font designers again, but that would break all the non-font uses of 
Unicode (searching, comparing, etc.) unless there is some canonical 
representation.
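A small Python illustration of the non-uniqueness problem: 森 can be described as ⿱木林 or as ⿱木⿰木木, and a plain string comparison treats those as different.  One naive canonicalization, assumed here purely for illustration, is to expand every component as far as a decomposition table allows.  The table DECOMP and the function expand() are hypothetical, and the IDC set is limited to U+2FF0..U+2FFB as an assumption.

```python
# Ideographic Description Characters U+2FF0..U+2FFB (assumed set for this sketch).
IDCS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}

# Hypothetical decomposition table: component -> one chosen IDS for it.
DECOMP = {"林": "\u2FF0木木"}  # 林 = ⿰木木


def expand(ids: str, table: dict[str, str]) -> str:
    """Replace every component that has a table entry with its expanded IDS.

    IDS use prefix notation, so substituting a component with a
    well-formed IDS keeps the whole sequence well-formed.
    Assumes the table contains no cycles.
    """
    parts = []
    for ch in ids:
        if ch in IDCS or ch not in table:
            parts.append(ch)
        else:
            parts.append(expand(table[ch], table))
    return "".join(parts)


a = "\u2FF1木林"           # ⿱木林
b = "\u2FF1木\u2FF0木木"    # ⿱木⿰木木
print(a == b)                                  # False: raw strings differ
print(expand(a, DECOMP) == expand(b, DECOMP))  # True after full expansion
```

This only removes one kind of ambiguity (how deeply to decompose).  It does nothing for characters that can be sliced differently at the same level, which is the harder part of agreeing on a canonical form.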

I don't know whether IDS sequences can really represent "all" han 
characters; I'd guess not, though there are probably more sophisticated 
systems that do better.  There will likely always be corner cases 
regardless.

But at any rate, it's my understanding that that particular ship has 
already sailed, and encoding CJK characters atomically is how Unicode 
does things.  Changing that now would be far more disruptive than just 
saying "no more precomposed accented letters."

On 11/2/21 21:03, Abraham Gross via Unicode wrote:
> I have a proposal regarding the future of encoding new Unihan 
> characters into Unicode that I'd like to float by this group to see if 
> it makes any sense. ....
~mark