Unicode fundamental character identity

Thu Jan 23 01:05:59 CST 2025

>In my opinion, those symbols do not have an actual fundamental typographical distinction, but a superficial one based on the emoji ideology, as in plain text it doesn't matter whether a character is referred to as an emoji or not. If every single Unicode character was referred to as an emoji, that wouldn't matter to plain text,

As you said, this is your opinion; it is not objective fact.

Because the emoji characters are encoded separately from the corresponding, similar non-emoji characters, they are by definition distinct in plain text. What you are debating here is whether there was a need for them to be distinguished in plain text. Your opinion evidently is that there was no such need.

However, major vendors long ago provided convincing evidence to UTC that they do have such a need. That is why UTC not only approved the proposal in L2/23-252 to encode separate emoji characters but had also specifically asked for that proposal to be written.

Peter


From: Unicode <unicode-bounces at corp.unicode.org> on behalf of piotrunio-2004 at wp.pl via Unicode <unicode at corp.unicode.org>
Date: Tuesday, January 21, 2025 at 1:08 PM
To: unicode <unicode at corp.unicode.org>
Subject: Unicode fundamental character identity

In UTC Document Register 2025 (unicode.org)<https://www.unicode.org/L2/L-curdoc.htm>, I have submitted a new proposal, L2/25-037<https://www.unicode.org/L2/L2025/25037-legacy-box-drawing-disunification.pdf>. However, when it was reviewed in L2/25-010<https://www.unicode.org/L2/L2025/25010-script-wg-report.pdf>, the reviewers concluded that the proposed distinctions supposedly do not constitute differences in plain text and that the issues can be solved by using appropriate fonts.

In the case of the HP 264x character set, the two characters are clearly shown to have distinct plain text usage and distinct encodings in the source character set. There have already been countless precedents for distinct encodings within the same legacy codepage resulting in distinct Unicode encodings. This cannot be solved by using appropriate fonts, as they are two different characters from the same character set, and therefore the font would have to include both characters in order to correctly represent their HP 264x usage.

In the case of the PETSCII and Apple II character sets, the character identities of box drawings are fundamentally different from those of 1÷8 blocks. This is because the edge box drawings are based on strokes and are therefore dependent on the font weight, whereas the 1÷8 blocks are tied to a specific proportion of the bounding box. The fact that some of the characters mapped to 1÷8 blocks actually have a thickness equivalent to 1÷4 of the bounding box on the C64, or 1÷7 of the bounding box horizontally on the Apple II, indicates that the mapping to 1÷8 blocks does not correctly represent the fundamental character identity. Therefore this also cannot be solved by using appropriate fonts, because 1÷8 blocks cannot possibly be made consistent with stems of any thickness other than 1÷8 of the bounding box.
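The proportion argument above can be illustrated with a short Python sketch. It is not part of the proposal. It only looks up the existing Unicode names of a few block and box drawing characters and does the arithmetic for the thicknesses mentioned. The cell sizes (8 pixels for the C64, 7 pixels wide for the Apple II) and the helper name `thickness_in_pixels` are assumptions made for illustration.

```python
import unicodedata
from fractions import Fraction

# Existing Unicode characters: one-eighth blocks versus light box drawing lines.
EIGHTH_BLOCKS = ["\u2581", "\u2594", "\u258F", "\u2595"]
LIGHT_LINES = ["\u2500", "\u2502"]


def names(chars):
    """Return the Unicode character name of each character."""
    return [unicodedata.name(c) for c in chars]


def thickness_in_pixels(fraction, cell_pixels):
    """Hypothetical helper: thickness of a stem that covers `fraction` of a cell."""
    return fraction * cell_pixels


block_names = names(EIGHTH_BLOCKS)  # names state a fixed proportion of the cell
line_names = names(LIGHT_LINES)     # names state a stroke weight, not a proportion

# Assumed cell sizes: 8 pixels (C64), 7 pixels wide (Apple II).
c64_quarter = thickness_in_pixels(Fraction(1, 4), 8)        # 1÷4 of an 8-pixel cell
c64_eighth = thickness_in_pixels(Fraction(1, 8), 8)         # 1÷8 of an 8-pixel cell
apple_seventh = thickness_in_pixels(Fraction(1, 7), 7)      # 1÷7 of a 7-pixel width
apple_eighth = thickness_in_pixels(Fraction(1, 8), 7)       # 1÷8 of a 7-pixel width

for name in block_names + line_names:
    print(name)
print(c64_quarter, c64_eighth, apple_seventh, apple_eighth)
```

Under these assumed cell sizes, a stem of 1÷4 of the cell is twice as thick as a 1÷8 block on the C64, and 1÷8 of a 7-pixel width is not a whole number of pixels, which is the mismatch the paragraph describes.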

Now let's contrast that with the proposal L2/23-252<https://www.unicode.org/L2/L2023/23252-legacy-disunification.pdf> for disunifying symbols from emoji, that supposedly got accepted for Unicode 17.0. In my opinion, those symbols do not have an actual fundamental typographical distinction, but a superficial one based on the emoji ideology, as in plain text it doesn't matter whether a character is referred to as an emoji or not. If every single Unicode character was referred to as an emoji, that wouldn't matter to plain text, let alone semigraphical text. I'm not demanding that L2/23-252 be cancelled, but I'm astonished as to how emoji ideology gets the distinction precedence over the actual typographical distinction of stem weight versus 1÷8 bounding box.

So, deep down, what, please, is the Unicode fundamental character identity that allows characters to be considered distinct just because they are identified as emoji, but also falsely unifies strokes with an exact proportion of 1÷8 of the bounding box despite evident counterexamples?