New CJK characters
jameskass at code2001.com
Wed Nov 3 16:59:57 CDT 2021
On 2021-11-03 9:22 PM, Mark E. Shoulson via Unicode wrote:
> There's frequently more than one way to slice a character up. Should
> *all* be supported? Should there be some way to decide the
> "canonical" decomposition?
Take U+68DA "棚", which can be given IDS of "⿰木朋" or "⿰木⿰月月".
Entering either into the Zi tool gets the character. Entering the
latter results in the tool showing a "normalized IDS" which is the
former. It appears that the tool is, of necessity, performing its own
"roll up" of the sequences in order to perform look-ups.
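How the tool actually does this is not stated here. As a rough sketch of what such a "roll up" could look like, the following parses an IDS recursively and replaces any subsequence found in a lookup table with the single character it describes. The names `ARITY`, `KNOWN` and `roll_up` are made up for illustration, the table holds only the two sequences from this example, and only the twelve description characters U+2FF0..U+2FFB are handled.

```python
# Number of operands each Ideographic Description Character takes.
# U+2FF0..U+2FFB only; U+2FF2 and U+2FF3 take three, the rest take two.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2FF2"] = 3
ARITY["\u2FF3"] = 3

# Hypothetical lookup table (IDS -> encoded character), for illustration only.
KNOWN = {
    "⿰月月": "朋",
    "⿰木朋": "棚",
}


def roll_up(ids, table=KNOWN):
    """Replace every known subsequence of `ids`, innermost first."""

    def parse(i):
        if i >= len(ids):
            raise ValueError("IDS ends before all operands are supplied")
        ch = ids[i]
        i += 1
        arity = ARITY.get(ch, 0)
        if arity == 0:
            return ch, i  # an ordinary component character
        parts = []
        for _ in range(arity):
            part, i = parse(i)
            parts.append(part)
        seq = ch + "".join(parts)
        # Operands were already rolled up, so `seq` is in its shortest form.
        return table.get(seq, seq), i

    result, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return result


print(roll_up("⿰木⿰月月"))  # 棚
print(roll_up("⿰木朋"))      # 棚
```

Stopping one level short of the outermost look-up would leave "⿰木朋", the form the tool reports as the normalized IDS.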
Then there are unification issues. For example, this recently added
Extension G character:
U+31310 𱌐 ^⿰鼠𠔥$(G) ^⿺鼠𠔥$(Z)
...the tool generates fine ideographs for both IDS, but it recognizes
only the first IDS as a valid Unicode character.
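One way a look-up could accept both sequences is to index every IDS listed for a character rather than only the first. The sketch below assumes an entry laid out exactly like the line quoted above (whitespace-separated fields, each IDS wrapped in ^...$ with an optional parenthesized tag); the field layout of any real data file, and the names `ENTRY` and `index_entry`, are assumptions made for illustration.

```python
import re

# One IDS field as it appears in the quoted entry: ^<ids>$ plus an optional (tag).
ENTRY = re.compile(r"\^(?P<ids>[^$]+)\$(?:\((?P<tag>[^)]*)\))?")


def index_entry(line, index=None):
    """Map every IDS found on `line` to the character named by its U+ field."""
    index = {} if index is None else index
    fields = line.split()
    char = chr(int(fields[0][2:], 16))  # "U+31310" -> the character itself
    for field in fields[2:]:            # skip the code point and the glyph
        match = ENTRY.fullmatch(field)
        if match:
            index[match.group("ids")] = char
    return index


index = index_entry("U+31310 𱌐 ^⿰鼠𠔥$(G) ^⿺鼠𠔥$(Z)")
print(index["⿰鼠𠔥"] == index["⿺鼠𠔥"])  # True
```

The parenthesized tags are parsed but otherwise ignored here.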
Then there are regional preferences of component glyph shapes to consider,
and I don't know how or if that would be addressed.
IDS are useful for expressing unencoded ideographs in plain text, not
only for those rare older characters, but also for newly invented ones.
(Sorry for my earlier misperception about the identity of the tool's