New CJK characters
James Kass
jameskass at code2001.com
Wed Nov 3 16:59:57 CDT 2021
On 2021-11-03 9:22 PM, Mark E. Shoulson via Unicode wrote:
> There's frequently more than one way to slice a character up. Should
> *all* be supported? Should there be some way to decide the
> "canonical" decomposition?
Take U+68DA "棚", which can be described by the IDS "⿰木朋" or "⿰木⿰月月".
Entering either into the Zi tool returns the character. Entering the
latter also makes the tool display a "normalized IDS", which is the
former. It appears that the tool must, of necessity, perform its own
"roll-up" of nested sequences in order to do look-ups.
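A roll-up like the one described above can be sketched as follows. This is a hypothetical illustration, not the Zi tool's actual code: it parses an IDS as a prefix-notation tree (each Ideographic Description Character takes two or three operands), then works bottom-up, replacing any proper subtree whose IDS matches a known character with that character. The lookup table `KNOWN` and the helper names are invented toy data for this example only.

```python
# Hypothetical sketch of IDS "roll-up" normalization; NOT the Zi tool's code.

# Ideographic Description Characters U+2FF0..U+2FFB and their arity.
# Later Unicode versions add further IDCs, which this sketch ignores.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # ⿲ left to middle and right
IDC_ARITY["\u2FF3"] = 3  # ⿳ above to middle and below

# Toy table (assumed data): canonical IDS -> encoded character.
KNOWN = {
    "⿰月月": "朋",
    "⿰木朋": "棚",
}


def parse(ids, pos=0):
    """Parse a prefix-notation IDS from ids[pos:]; return (tree, next_pos).

    A tree is a component character or a tuple (idc, child, child, ...)."""
    if pos >= len(ids):
        raise ValueError("incomplete IDS")
    ch = ids[pos]
    arity = IDC_ARITY.get(ch)
    if arity is None:
        return ch, pos + 1  # leaf: an ordinary component
    children = []
    pos += 1
    for _ in range(arity):
        child, pos = parse(ids, pos)
        children.append(child)
    return (ch, *children), pos


def serialize(tree):
    """Turn a tree back into a prefix-notation IDS string."""
    if isinstance(tree, str):
        return tree
    return tree[0] + "".join(serialize(c) for c in tree[1:])


def roll_up(tree, top=True):
    """Bottom-up: replace each proper subtree that spells a known character.

    The top-level node is kept as an IDS so the result is still a description."""
    if isinstance(tree, str):
        return tree
    node = (tree[0], *(roll_up(c, top=False) for c in tree[1:]))
    if top:
        return node
    return KNOWN.get(serialize(node), node)


def normalize(ids):
    """Return the rolled-up ("normalized") form of an IDS string."""
    tree, end = parse(ids)
    if end != len(ids):
        raise ValueError("trailing text after IDS")
    return serialize(roll_up(tree))


def lookup(ids):
    """Return the encoded character for an IDS, or None if unknown."""
    return KNOWN.get(normalize(ids))


print(normalize("⿰木⿰月月"))  # ⿰木朋
print(lookup("⿰木⿰月月"))     # 棚
```

The toy table stores exactly one IDS per character, which sidesteps the question quoted at the top: a real table would have to list every accepted decomposition of each component, or pick a canonical one.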
Then there are unification issues. For example, this recently added
Extension G character:
U+31310 𱌐 ^⿰鼠𠔥$(G) ^⿺鼠𠔥$(Z)
...the tool generates well-formed ideographs for both IDSes, but only the
first IDS is recognized by the tool as a valid Unicode character.
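One way a look-up could avoid recognizing only the first IDS is to index every IDS listed for a character, keeping the parenthesized source tags alongside. The sketch below assumes, purely from the single entry quoted above, a line layout of code point, character, then one or more `^IDS$(SOURCES)` fields separated by whitespace; the function names are hypothetical and the meaning of the source letters is not interpreted.

```python
import re

# Assumed line layout, modelled on the entry quoted above: code point,
# character, then one or more fields of the form ^IDS$(SOURCES).
FIELD = re.compile(r"\^([^$]+)\$\(([A-Z]+)\)")


def parse_entry(line):
    """Return (code_point, char, [(ids, sources), ...]) for one data line."""
    cp_field, char, rest = line.split(maxsplit=2)
    if not cp_field.startswith("U+"):
        raise ValueError(f"bad code point field: {cp_field!r}")
    variants = [(m.group(1), m.group(2)) for m in FIELD.finditer(rest)]
    return int(cp_field[2:], 16), char, variants


def build_index(lines):
    """Map every listed IDS (not just the first) to the character(s) it describes."""
    index = {}
    for line in lines:
        _, char, variants = parse_entry(line)
        for ids, _sources in variants:
            index.setdefault(ids, set()).add(char)
    return index


entry = "U+31310 𱌐 ^⿰鼠𠔥$(G) ^⿺鼠𠔥$(Z)"
index = build_index([entry])
print("𱌐" in index["⿺鼠𠔥"])  # True
```

Mapping each IDS to a set rather than a single character leaves room for the opposite case, where one IDS describes more than one encoded character.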
Then there are regional preferences for component glyph shapes to
consider, and I don't know how, or whether, that would be addressed.
IDS are useful for expressing unencoded ideographs in plain text, not
only for those rare older characters, but also for newly invented ones.
(Sorry for my earlier misperception about the identity of the tool's
developer.)