Unicode fundamental character identity
Asmus Freytag
asmusf at ix.netcom.com
Mon Feb 3 14:24:45 CST 2025
On 2/3/2025 9:36 AM, Sławomir Osipiuk via Unicode wrote:
> On Monday, 03 February 2025, 12:19:06 (-05:00), Peter Constable via
> Unicode wrote:
>
> As stated previously, Unicode makes no guarantee of supporting
> source separation / round-trip compatibility with HP264x.
>
>
> I'm honestly surprised by this. I always thought (because it was
> repeated so many times - must remember repetition does not equal
> truth) that round-trip compatibility with old character sets was a
> founding cornerstone of Unicode and so contrastive use (aka source
> separation) in an old charset would be persuasive evidence for inclusion.
You guys are talking past each other a bit.
Unicode decided early on to guarantee round-trip to important, widely
used character sets of the time. The key interest was to be able to
deploy software that worked internally in Unicode but could interface
with existing systems without incurring data loss in round trip.
This level of guarantee does not exist for just any character set. It
didn't even exist for all character sets then in existence.
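To make "round trip" concrete, here is a minimal Python sketch, assuming
the legacy code pages shipped with Python's standard codecs stand in for
the "existing systems". The helper name round_trips is made up for this
illustration; it only checks that decoding legacy bytes to Unicode and
encoding them back yields the original bytes.

```python
def round_trips(data: bytes, codec: str) -> bool:
    """Return True if legacy bytes survive legacy -> Unicode -> legacy unchanged."""
    text = data.decode(codec)           # legacy bytes to Unicode text
    return text.encode(codec) == data   # Unicode text back to legacy bytes


# Every byte value of two single-byte code pages (illustrative choices only).
all_bytes = bytes(range(256))
for codec in ("latin-1", "cp437"):
    print(codec, round_trips(all_bytes, codec))
```

A code page passes this check only if each of its characters maps to its
own distinct Unicode code point, which is what the guarantee amounts to
in practice.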
However, if conflating two characters causes a particular problem,
Unicode has accepted case-by-case requests not to unify them, or even to
disunify them. In those cases, instead of applying a guarantee, the UTC
will do a bit of a cost/benefit analysis, weighing the cost of having
to encode additional characters (in perpetuity) against the benefit for
the intended users.
If this is a problem with a single character, I don't really buy the
cost-savings argument, especially in a case where, after adding some
extensions, a whole set could be matched. If there is a group involved,
the cost goes up.
On the other hand, I also would like to understand the benefit for the
supposed user group. Is it mainly that of avoiding a single-pixel
infidelity in display only, or are these characters that would need to
round-trip, because they might be in data that is entered on a simulated
device, processed on a Unicode system and then output again?
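The round-trip half of that question can be sketched in Python. Everything
in the sketch is hypothetical: the two byte values, the mapping tables and
the private-use code point U+E000 are invented for illustration and are
not the actual HP264x mapping. It only shows that when two source codes
are unified onto one Unicode code point, the distinction cannot be
recovered on output.

```python
# Hypothetical two-character legacy set: bytes 0x01 and 0x02 are distinct
# in the source set. These tables are invented, not a real mapping.
UNIFIED = {0x01: "\u2500", 0x02: "\u2500"}      # both mapped to one code point
DISUNIFIED = {0x01: "\u2500", 0x02: "\ue000"}   # second gets its own code point


def round_trip(data: bytes, to_unicode: dict) -> bytes:
    """Convert legacy bytes to Unicode and back using the given table."""
    # Reverse table; if two bytes share a code point, the later one wins.
    from_unicode = {ch: b for b, ch in to_unicode.items()}
    text = "".join(to_unicode[b] for b in data)
    return bytes(from_unicode[ch] for ch in text)


original = bytes([0x01, 0x02])
print(round_trip(original, UNIFIED) == original)      # distinction lost
print(round_trip(original, DISUNIFIED) == original)   # distinction kept
```

If the data is only ever displayed, the unified table costs a glyph
difference; if it is written back to the source system, it changes the
data.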
I think it's stupid for both sides to fight over a single pixel. Yes, it
smells like a bad unification even though the character is arcane (but
so are others where minute details matter even though 'nobody' is likely
to use that character much). Having a stupidly incomplete mapping can be
frustrating, but is being unfaithful going to impact users in any
noticeable way?
A./