Unicode fundamental character identity

Tue Jan 21 15:01:27 CST 2025

In the UTC Document Register 2025 (unicode.org), I have submitted a new proposal, L2/25-037. However, when it was reviewed in L2/25-010, the reviewers concluded that the distinctions it describes supposedly do not constitute differences in plain text and that the issues can be solved by using appropriate fonts.

In the case of the HP 264x character set, the two characters are clearly shown to have distinct plain text usage and distinct encodings in the source character set. There are already countless precedents in which distinct encodings within the same legacy code page resulted in distinct Unicode encodings. This cannot be solved by using appropriate fonts: they are two different characters from the same character set, so a font would have to include both characters in order to represent their HP 264x usage correctly.

In the case of PETSCII and the Apple II character set, the character identities of box drawings are fundamentally different from those of 1÷8 blocks. The edge box drawings are based on strokes and therefore depend on the font weight, whereas the 1÷8 blocks are tied to a specific proportion of the bounding box. Some of the characters mapped to 1÷8 blocks actually have a thickness equivalent to 1÷4 of the bounding box on the C64, or 1÷7 of the bounding box horizontally on the Apple II. This indicates that the mapping to 1÷8 blocks does not correctly represent the fundamental character identity. It also cannot be solved by using appropriate fonts, because 1÷8 blocks cannot possibly be made consistent with stems of any thickness other than 1÷8 of the bounding box.

Now let's contrast that with proposal L2/23-252 for disunifying symbols from emoji, which was supposedly accepted for Unicode 17.0. In my opinion, those symbols do not have an actual fundamental typographical distinction, only a superficial one based on the emoji ideology, since in plain text it doesn't matter whether a character is referred to as an emoji or not. If every single Unicode character were referred to as an emoji, that wouldn't matter to plain text, let alone semigraphical text. I'm not demanding that L2/23-252 be cancelled, but I'm astonished that emoji ideology takes precedence, as a basis for distinction, over the actual typographical distinction between stem weight and 1÷8 of the bounding box.

So, deep down, what exactly is the Unicode notion of fundamental character identity that allows characters to be considered distinct just because they are identified as emoji, yet falsely unifies strokes with an exact proportion of 1÷8 of the bounding box despite evident counterexamples?
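
To make the proportions quoted above concrete, here is a minimal Python sketch of the arithmetic. The cell sizes (8x8 pixels for the C64, 7x8 for the Apple II) and the stem thicknesses in pixels are my assumptions, chosen to match the 1÷4 and 1÷7 figures above; the helper name stem_fraction is only illustrative.

```python
from fractions import Fraction
import unicodedata

# Assumed character cell sizes in pixels (width, height).
CELLS = {"C64": (8, 8), "Apple II": (7, 8)}

# Assumed vertical stem thickness in pixels, matching the proportions
# quoted in the text (2/8 = 1/4 on the C64, 1/7 on the Apple II).
STEM_PIXELS = {"C64": 2, "Apple II": 1}

ONE_EIGHTH = Fraction(1, 8)


def stem_fraction(platform):
    """Return the stem thickness as a fraction of the cell width."""
    width, _height = CELLS[platform]
    return Fraction(STEM_PIXELS[platform], width)


for platform in CELLS:
    frac = stem_fraction(platform)
    # Report whether the stem matches the fixed 1/8 block proportion.
    print(platform, frac, "matches 1/8:", frac == ONE_EIGHTH)

# For reference: a proportion-based block versus a stroke-based box drawing.
for cp in (0x258F, 0x2502):
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
```

Under these assumptions, neither stem equals 1÷8 of the cell width, which is the mismatch described above.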