Unicode fundamental character identity
Peter Constable
pgcon6 at msn.com
Fri Jan 31 10:14:00 CST 2025
AFAICT, you are the only one lobbying on this topic.
Peter
From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of piotrunio-2004 at wp.pl via Unicode
Sent: Friday, January 31, 2025 8:09 AM
To: piotrunio-2004 at wp.pl; unicode <unicode at corp.unicode.org>
Subject: Re: Re: Re: Unicode fundamental character identity
On 24 January 2025 at 08:29, piotrunio-2004 at wp.pl via Unicode <unicode at corp.unicode.org> wrote:
>Round trip compatibility (for HP 264x) ... should be enough evidence.
Your defense of L2/25-037 here depends on an assumption that round trip compatibility for HP 264x is a sufficient argument for encoding a distinction. This is equivalent to assuming source separation for HP 264x is a sufficient basis. But Unicode makes no such commitment to preserving source separation / round trip compatibility for HP 264x; the Standard is clear that commitments to source separation were scoped to major vendor and national standard encodings in use circa 1990. Implicit in the response in L2/25-010 is the view that source separation is not a factor in this case.
However, this is not just a matter of a duplicated character. The character is visually distinct in the HP 264x source (even if the difference is subtle), and there is evidence of distinct usage: in example usage it is observed to connect to different characters. This makes it plainly incorrect to encode them as the same character. The response in L2/25-010 claims that this can be solved by using appropriate fonts, but it cannot. In an HP 264x Large Character Set mode text document that uses both characters, they appear differently in the source, yet once converted to Unicode with the current mapping they will appear the same no matter which font is used.
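The round-trip loss described above can be illustrated with a minimal Python sketch. Only the source byte values 0x12 and 0x18 come from this discussion; the code points U+E000 and U+E001 are private-use placeholders chosen for illustration, not the actual or proposed mapping, and the function names are hypothetical.

```python
# Sketch only: U+E000 / U+E001 are placeholder (private-use) code points,
# NOT the real mapping for the HP 264x Large Character Set.

# Unified mapping: both source bytes map to the same code point.
UNIFIED = {0x12: "\uE000", 0x18: "\uE000"}
# Disunified mapping: each source byte gets its own code point.
DISTINCT = {0x12: "\uE000", 0x18: "\uE001"}


def to_unicode(data: bytes, table: dict) -> str:
    """Convert source bytes to a Unicode string using the given table."""
    return "".join(table[b] for b in data)


def from_unicode(text: str, table: dict) -> bytes:
    """Convert back; if two bytes share a code point, the first one wins."""
    reverse = {}
    for byte, ch in table.items():
        reverse.setdefault(ch, byte)
    return bytes(reverse[ch] for ch in text)


def round_trips(data: bytes, table: dict) -> bool:
    """True if converting to Unicode and back reproduces the source bytes."""
    return from_unicode(to_unicode(data, table), table) == data


source = bytes([0x12, 0x18])
print(round_trips(source, UNIFIED))   # distinction is lost on the way back
print(round_trips(source, DISTINCT))  # distinction survives
```

With the unified table the two characters become indistinguishable in the converted text, so no choice of font can restore the difference; that is the point being argued, independent of which code points are eventually used.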
I see absolutely no reason why the two characters 0x12 and 0x18 of the HP 264x Large Character Set should be considered the same character in plain text.
Image (obGQ4Ie.png, 1440×720): <https://i.imgur.com/obGQ4Ie.png>
The characters are:
* visually distinct within the same source font (which blatantly contradicts the L2/25-010 claim that this 'can be solved by using appropriate fonts')
* encoded differently in the source set
* typed differently on the keyboard
* connected to different characters above (and therefore have a fundamentally different box-drawing identity, as was already demonstrated in L2/25-037, which contradicts the L2/25-010 claim that there is 'No evidence of a document that would make a distinction')
In fact, I believe the arguments in L2/25-010 are so blatantly wrong, having been disproved above in multiple different ways, that I suspect there is some lobbying involved. This is potentially dangerous, because lobbying could eventually affect the interpretation of the stability policies, which would effectively result in actual compatibility-breaking changes.