New CJK characters

James Kass jameskass at code2001.com
Wed Nov 3 16:59:57 CDT 2021



On 2021-11-03 9:22 PM, Mark E. Shoulson via Unicode wrote:
> There's frequently more than one way to slice a character up.  Should 
> *all* be supported?  Should there be some way to decide the 
> "canonical" decomposition?

Take U+68DA "棚", which can be given an IDS of either "⿰木朋" or
"⿰木⿰月月".  Entering either one into the Zi tool gets the character.
Entering the latter results in the tool showing a "normalized IDS",
which is the former.  It appears that the tool is, of necessity,
performing its own "roll up" of the sequences in order to perform
look-ups.
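
To make the "roll up" idea concrete, here is a minimal sketch in Python.
It is not the Zi tool's actual method, which I don't know.  The ROLL_UP
table and the function names are hypothetical, and the table holds only
the two entries needed for U+68DA.  The sketch parses an IDS into a
tree, replaces every inner sub-sequence that has an encoded equivalent,
and leaves the outermost operator alone so that the result is still an
IDS.

```python
# Hypothetical table mapping an IDS to the encoded character it describes.
ROLL_UP = {"⿰月月": "朋", "⿰木朋": "棚"}

# Ideographic Description Characters U+2FF0..U+2FFB, grouped by arity.
BINARY = set("⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻")
TERNARY = set("⿲⿳")


def parse(ids, pos=0):
    """Return (tree, next_pos). A tree is a one-character leaf or (idc, children)."""
    if pos >= len(ids):
        raise ValueError("truncated IDS")
    ch = ids[pos]
    arity = 2 if ch in BINARY else 3 if ch in TERNARY else 0
    if arity == 0:
        return ch, pos + 1
    pos += 1
    children = []
    for _ in range(arity):
        child, pos = parse(ids, pos)
        children.append(child)
    return (ch, children), pos


def roll_up(tree, top=True):
    """Serialize the tree, replacing inner sub-sequences found in ROLL_UP."""
    if isinstance(tree, str):
        return tree
    idc, children = tree
    flat = idc + "".join(roll_up(child, top=False) for child in children)
    if not top and flat in ROLL_UP:
        return ROLL_UP[flat]
    return flat


def normalize(ids):
    """Return the rolled-up form of an IDS, keeping the outermost operator."""
    tree, end = parse(ids)
    if end != len(ids):
        raise ValueError("trailing characters after IDS")
    return roll_up(tree)


def lookup(ids):
    """Return the encoded character for an IDS, or None if the table has none."""
    return ROLL_UP.get(normalize(ids))
```

With that table, normalize("⿰木⿰月月") gives "⿰木朋", and lookup()
returns "棚" for either spelling.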

Then there are unification issues.  For example, this recently added 
Extension G character:
U+31310    𱌐    ^⿰鼠𠔥$(G)    ^⿺鼠𠔥$(Z)
...the tool generates fine ideographs for both IDS, but it recognizes 
only the first IDS as a valid Unicode character.
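
One way a look-up could accept both spellings is to index every
alternative listed for a character.  The following is only a sketch
under assumptions: the field layout (code point, character, then one or
more "^IDS$(tags)" fields separated by whitespace) is inferred from the
single line quoted above, and parse_line and build_index are
hypothetical names, not part of any existing tool.

```python
import re

# One alternative looks like ^IDS$(TAGS); the layout is assumed from the quoted line.
ENTRY = re.compile(r"\^(?P<ids>[^$]+)\$\((?P<tags>[^)]*)\)")


def parse_line(line):
    """Split a data line into (code_point, character, [(ids, tags), ...])."""
    fields = line.split()
    code_point, char = fields[0], fields[1]
    alternatives = []
    for field in fields[2:]:
        match = ENTRY.fullmatch(field)
        if match:
            alternatives.append((match["ids"], match["tags"]))
    return code_point, char, alternatives


def build_index(lines):
    """Map every listed IDS, not just the first, to its encoded character."""
    index = {}
    for line in lines:
        _, char, alternatives = parse_line(line)
        for ids, _tags in alternatives:
            index.setdefault(ids, char)
    return index


LINE = "U+31310    𱌐    ^⿰鼠𠔥$(G)    ^⿺鼠𠔥$(Z)"
INDEX = build_index([LINE])
```

Here both INDEX["⿰鼠𠔥"] and INDEX["⿺鼠𠔥"] resolve to the same
Extension G character.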

Then there are regional preferences of component glyph shapes to 
consider, and I don't know how, or whether, that would be addressed.

IDS are useful for expressing unencoded ideographs in plain text, not 
only for those rare older characters, but also for newly invented ones.

(Sorry for my earlier misperception about the identity of the tool's 
developer.)


