Odp: RE: Re: Re: Unicode fundamental character identity

Mon Feb 3 11:19:06 CST 2025

As stated previously, Unicode makes no guarantee of supporting source separation / round-trip compatibility with HP264x. A proposal to encode a distinction would need to make a different argument for why encoding is needed.

Peter

From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of piotrunio-2004 at wp.pl via Unicode
Sent: Friday, January 31, 2025 2:28 PM
To: James Kass <jameskass at code2001.com>; unicode <unicode at corp.unicode.org>
Subject: Re: Odp: RE: Re: Re: Unicode fundamental character identity

On 31 January 2025 at 22:08, James Kass via Unicode <unicode at corp.unicode.org> wrote:
On 2025-01-31 5:42 PM, piotrunio-2004 at wp.pl via Unicode wrote:
I'm not saying that it is, but if I'm relying on arguments relevant to
the actual usage of the characters, and the dominant opposing side
does not offer much coherent reasoning in return, then I become
suspicious.

Doug and Peter have provided some good advice with respect to such
suspicions.  My apologies to Peter Constable for my failure to
understand exactly what was being dismissed earlier in this thread.

Quoting from https://www.unicode.org/L2/L2025/25010-script-wg-report.pdf

"We deem the differences demonstrated in the proposal to not constitute
differences in plain text. No evidence of a document that would make a
distinction between the corresponding characters in the different code
pages was provided."

That seems coherent enough.  If a simple and concise exhibit can be made
showing the desired distinction making a difference in plain text, then
that would be a logical next step.  Evidence illustrating data loss in
round-tripping would also be helpful.  Input from the user community
supporting retaining distinctions in Unicode should help the effort.

Proposal L2/25-037 already shows a plain-text difference between the HP 264x characters: 0x12 (2) connects below to a vertical or a perpendicular diagonal, whereas 0x18 (8) connects below to a diagonal of the same direction. These are different types of connection, which is a plain-text distinction for box drawings.

Data loss in round-tripping follows implicitly from the information in the proposal: if an HP 264x Large Character set mode document contains the characters 0x12 0x18, it converts to Unicode as U+1CE2B U+1CE2B, which converts back to HP 264x Large Character set mode as 0x12 0x12. The distinction between the two characters is lost, and the result will appear slightly different from the original document on the HP 264x platform.
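
The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-entry table contains nothing beyond the byte values and the code point quoted in this message, the function and table names are hypothetical, and the rule that the reverse mapping picks the lowest byte for a shared code point is an assumption, not something taken from the proposal.

```python
# Hypothetical two-entry excerpt of an HP 264x Large Character set mapping.
# Only the values quoted in this message are used: 0x12 and 0x18 both map
# to U+1CE2B.
HP264X_TO_UNICODE = {0x12: "\U0001CE2B", 0x18: "\U0001CE2B"}

# Reverse table. Assumption: when several bytes share one code point,
# the lowest byte value wins, so U+1CE2B maps back to 0x12.
UNICODE_TO_HP264X = {}
for byte, char in sorted(HP264X_TO_UNICODE.items()):
    UNICODE_TO_HP264X.setdefault(char, byte)


def to_unicode(data: bytes) -> str:
    """Convert HP 264x bytes to Unicode text using the excerpt above."""
    return "".join(HP264X_TO_UNICODE[b] for b in data)


def to_hp264x(text: str) -> bytes:
    """Convert Unicode text back to HP 264x bytes using the reverse table."""
    return bytes(UNICODE_TO_HP264X[c] for c in text)


original = bytes([0x12, 0x18])
as_unicode = to_unicode(original)
round_tripped = to_hp264x(as_unicode)

print([f"U+{ord(c):04X}" for c in as_unicode])  # both entries are the same code point
print(original.hex(" "), "->", round_tripped.hex(" "))
print("lossless:", original == round_tripped)
```

Because the forward table is many-to-one, no choice of reverse table can recover both 0x12 and 0x18 from U+1CE2B; the sketch only makes that explicit.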
