New CJK characters
Mark E. Shoulson
mark at kli.org
Wed Nov 3 16:22:58 CDT 2021
I'm waiting for some of the old-timers here to give a proper answer,
Unicode history-wise.
As I understand it, the idea of using IDS or something similar for CJK
characters was considered (probably more than once), and it was decided
to do things this way instead, so that's the way we're doing them.
A font wouldn't necessarily have to generate new hanzi dynamically from
IDS descriptions; it could already contain all 100,000 (or however many)
glyphs and simply render the known sequences the way it renders
ligatures. That still leaves it to font designers to add characters as
they're needed, but the character repertoire then becomes open-ended,
and each font designer decides what to support.
OTOH, as is well known, IDS descriptions are not unique: there's
frequently more than one way to slice a character up. Should *all* of
them be supported? Should there be some way to decide on the "canonical"
decomposition? I guess if we're leaving it up to fonts, that's again up
to the font designers, but that would break all the non-font uses of
Unicode (searching, comparing, etc.) unless there is some canonical
representation.
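To make the non-uniqueness concrete: 章 can be described as ⿱立早, as
⿱音十, or as ⿳立日十, and those are three different code point strings, so
a plain string comparison treats them as different characters. Below is a
minimal Python sketch of one *possible* canonicalization, not anything
Unicode specifies. It assumes a hypothetical two-entry component table
(real data would have to come from a full IDS database), supports only the
four IDCs ⿰ ⿱ ⿲ ⿳, and assumes that nested above-to-below (or
left-to-right) stacks may be flattened into a single stack, which is itself
a policy choice someone would have to make.

```python
# Hypothetical component table for illustration only, NOT real Unicode data.
COMPONENTS = {
    "音": "⿱立日",
    "早": "⿱日十",
}

# Supported Ideographic Description Characters and their arity.
ARITY = {"⿰": 2, "⿱": 2, "⿲": 3, "⿳": 3}
VERTICAL = {"⿱", "⿳"}  # above-to-below (binary, ternary)


def parse(ids, pos=0):
    """Parse a prefix-notation IDS into a nested tuple tree."""
    ch = ids[pos]
    if ch in ARITY:
        children = []
        pos += 1
        for _ in range(ARITY[ch]):
            child, pos = parse(ids, pos)
            children.append(child)
        return (ch, *children), pos
    return ch, pos + 1  # a leaf component


def expand(node):
    """Recursively replace leaf components that have a known decomposition."""
    if isinstance(node, str):
        if node in COMPONENTS:
            tree, _ = parse(COMPONENTS[node])
            return expand(tree)
        return node
    op, *kids = node
    return (op, *(expand(k) for k in kids))


def normalize(node):
    """Map each IDC to an axis ("V" or "H") and flatten nested same-axis stacks."""
    if isinstance(node, str):
        return node
    op, *kids = node
    axis = "V" if op in VERTICAL else "H"
    flat = []
    for kid in map(normalize, kids):
        if isinstance(kid, tuple) and kid[0] == axis:
            flat.extend(kid[1:])  # merge a nested stack on the same axis
        else:
            flat.append(kid)
    return (axis, *flat)


def canonical(ids):
    tree, end = parse(ids)
    if end != len(ids):
        raise ValueError("trailing characters in IDS")
    return normalize(expand(tree))


if __name__ == "__main__":
    variants = ["⿱立早", "⿱音十", "⿳立日十"]
    print(len(set(variants)))                     # 3 distinct raw strings
    print(len({canonical(v) for v in variants}))  # 1 canonical form
```

Without some agreed-upon step like `canonical` (the name is made up
here), searching or comparing text would have to know every valid slicing
of every character.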
I don't know whether IDSes can really represent "all" han characters;
I'd guess not, though there are probably more sophisticated systems
that can do better. There'll always be corner cases, though.
But at any rate, my understanding is that that particular ship has
already sailed: encoding CJK characters atomically is how Unicode does
things. Changing that now would be considerably more disruptive than
just saying "no more precomposed accented letters."
On 11/2/21 21:03, Abraham Gross via Unicode wrote:
> I have a proposal regarding the future of encoding new Unihan
> characters into Unicode that I'd like to float by this group to see if
> it makes any sense. ....
~mark