From pgcon6 at msn.com Mon Feb 3 11:19:06 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Mon, 3 Feb 2025 17:19:06 +0000
Subject: Odp: RE: Re: Re: Unicode fundamental character identity
In-Reply-To: <073374f2aafd4f259cf3db508996bf1f@grupawp.pl>
References: <709a2fd4e2bd4d2295655ee9431bf09a@grupawp.pl> <2342e8f045c94172af46545b8c1fae4e@grupawp.pl> <5e701b17-444b-16d2-9bed-817f354d7fdb@unicode.org> <41f332d8a568413088782d9d0982715f@grupawp.pl> <3396e2f2cb394320b8d90d7650a38b55@grupawp.pl> <25c51d1e-9002-4be2-9d6b-8b5e5a53beae@code2001.com> <073374f2aafd4f259cf3db508996bf1f@grupawp.pl>
Message-ID:

As stated previously, Unicode makes no guarantee of supporting source separation / round-trip compatibility with HP264x. A proposal to encode a distinction would need to provide a different line of argument for why it needs to be encoded.

Peter

From: Unicode On Behalf Of piotrunio-2004 at wp.pl via Unicode
Sent: Friday, January 31, 2025 2:28 PM
To: James Kass ; unicode
Subject: Re: Odp: RE: Re: Re: Unicode fundamental character identity

On 31 January 2025 at 22:08, James Kass via Unicode wrote:

> On 2025-01-31 5:42 PM, piotrunio-2004 at wp.pl via Unicode wrote:
>
>> I'm not saying that it is, but if I'm relying on arguments relevant to the actual usage of the characters, and the dominant opposing side does not provide much coherent reasoning in return, then I get suspicious.
>
> Doug and Peter have provided some good advice with respect to such suspicions. My apologies to Peter Constable for failing to understand exactly what was being dismissed earlier in this thread.
>
> Quoting from https://www.unicode.org/L2/L2025/25010-script-wg-report.pdf:
>
> "We deem the differences demonstrated in the proposal to not constitute differences in plain text. No evidence of a document that would make a distinction between the corresponding characters in the different code pages was provided."
>
> That seems coherent enough.
>
> If a simple and concise exhibit can be made showing the desired distinction making a difference in plain text, then that would be a logical next step. Evidence illustrating data loss in round-tripping would also be helpful. Input from the user community supporting retaining distinctions in Unicode should help the effort.

The proposal L2/25-037 already shows a plain-text difference between the HP 264x characters: 0x12 (2) connects below to a vertical or a perpendicular diagonal, whereas 0x18 (8) connects below to a diagonal of the same direction. Those are different types of connection, which is a plain-text distinction between box-drawing characters.

Data loss in round-tripping is implicit in the information provided in the proposal. If an HP 264x Large Character set mode document contains the characters 0x12 0x18, it converts to Unicode as U+1CE2B U+1CE2B, which converts back to HP 264x Large Character set mode as 0x12 0x12. That loses the distinction between the two characters, and the result will appear slightly different from the original document on the HP 264x platform.

From sosipiuk at gmail.com Mon Feb 3 11:36:18 2025
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Mon, 03 Feb 2025 17:36:18 +0000
Subject: Unicode fundamental character identity
In-Reply-To:
References:
Message-ID: <1738603804156.1426487909.77361119@gmail.com>

On Monday, 03 February 2025, 12:19:06 (-05:00), Peter Constable via Unicode wrote:

> As stated previously, Unicode makes no guarantee of supporting source separation / round-trip compatibility with HP264x.

I'm honestly surprised by this. I always thought (because it was repeated so many times - must remember repetition does not equal truth) that round-trip compatibility with old character sets was a founding cornerstone of Unicode, and so contrastive use (aka source separation) in an old charset would be persuasive evidence for inclusion.
From pgcon6 at msn.com Mon Feb 3 12:46:18 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Mon, 3 Feb 2025 18:46:18 +0000
Subject: Unicode fundamental character identity
In-Reply-To: <1738603804156.1426487909.77361119@gmail.com>
References: <1738603804156.1426487909.77361119@gmail.com>
Message-ID:

Source separation for round-trip compatibility was a principle applied circa 1990 for compatibility with standards that were widely used at that time. Today, source separation is not a sufficient criterion for encoding distinctions found in other legacy character sets. It can be provided as part of the evidence in a proposal, but other evidence would be required, as for any new character proposal: in particular, that a text element cannot be adequately represented using any existing character sequence, and that there is a significant user community requiring public, plain-text interchange.

Peter

From: Sławomir Osipiuk
Sent: February 3, 2025 10:36 AM
To: Peter Constable ; Peter Constable via Unicode ; piotrunio-2004 at wp.pl; James Kass
Subject: Re: Unicode fundamental character identity

On Monday, 03 February 2025, 12:19:06 (-05:00), Peter Constable via Unicode wrote:

As stated previously, Unicode makes no guarantee of supporting source separation / round-trip compatibility with HP264x.

I'm honestly surprised by this. I always thought (because it was repeated so many times - must remember repetition does not equal truth) that round-trip compatibility with old character sets was a founding cornerstone of Unicode, and so contrastive use (aka source separation) in an old charset would be persuasive evidence for inclusion.

From asmusf at ix.netcom.com Mon Feb 3 14:24:45 2025
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 3 Feb 2025 12:24:45 -0800
Subject: Unicode fundamental character identity
In-Reply-To: <1738603804156.1426487909.77361119@gmail.com>
References: <1738603804156.1426487909.77361119@gmail.com>
Message-ID: <603706d9-30e1-4cfc-9ff7-e58856825440@ix.netcom.com>

On 2/3/2025 9:36 AM, Sławomir Osipiuk via Unicode wrote:
> On Monday, 03 February 2025, 12:19:06 (-05:00), Peter Constable via
> Unicode wrote:
>
>     As stated previously, Unicode makes no guarantee of supporting
>     source separation / round-trip compatibility with HP264x.
>
>
> I'm honestly surprised by this. I always thought (because it was
> repeated so many times - must remember repetition does not equal
> truth) that round-trip compatibility with old character sets was a
> founding cornerstone of Unicode and so contrastive use (aka source
> separation) in an old charset would be persuasive evidence for inclusion.

You guys are talking past each other a bit.

Unicode decided early on to guarantee round-trip conversion for important, widely used character sets of the time. The key interest was to be able to deploy software that worked internally in Unicode but could interface with existing systems without incurring data loss in a round trip.

This level of guarantee does not exist for just any character set. It didn't even exist for all character sets then in existence.

However, if conflating two characters causes a particular problem, Unicode has accepted case-by-case requests not to unify them, or even to disunify them. But instead of applying a guarantee, the UTC will do a cost/benefit analysis of sorts, weighing the cost of having to encode additional characters (in perpetuity) against the benefit for the intended users.

If this is a problem with a single character, I don't really buy the cost-savings argument, especially in a case where, after adding some extensions, a whole set could be matched. If a group is involved, the cost goes up.

On the other hand, I would also like to understand the benefit for the supposed user group. Is it mainly avoiding a single-pixel infidelity in display only, or are these characters that would need to round-trip because they might be in data that is entered on a simulated device, processed on a Unicode system, and then output again?

I think it's stupid for both sides to fight over a single pixel. Yes, it smells like a bad unification, even though the character is arcane (but so are others where minute details matter even though 'nobody' is likely to use that character much). Having a stupidly incomplete mapping can be frustrating, but is being unfaithful going to impact users in any noticeable way?

A./
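The round-trip loss described earlier in the thread (HP 264x Large Character set mode bytes 0x12 0x18 convert to U+1CE2B U+1CE2B, which convert back as 0x12 0x12) is what happens with any many-to-one mapping. The following is a minimal Python sketch, not part of the original thread: the two-entry table holds only the mapping stated in the thread, and the names `TO_UNICODE`, `build_reverse`, `decode`, `encode`, `round_trips`, and `collisions` are hypothetical. It assumes the reverse converter keeps the lowest byte for each character, which reproduces the 0x12 0x12 result described.

```python
from collections import defaultdict

# Hypothetical, deliberately partial table: only the two HP 264x Large
# Character set bytes discussed in the thread, both mapped to U+1CE2B.
TO_UNICODE: dict[int, str] = {0x12: "\U0001CE2B", 0x18: "\U0001CE2B"}


def build_reverse(forward: dict[int, str]) -> dict[str, int]:
    """Invert a byte-to-character table, keeping the lowest byte per character."""
    reverse: dict[str, int] = {}
    for byte in sorted(forward):
        reverse.setdefault(forward[byte], byte)  # first (lowest) byte wins
    return reverse


FROM_UNICODE = build_reverse(TO_UNICODE)


def decode(data: bytes) -> str:
    """Legacy bytes to Unicode text using the hypothetical table."""
    return "".join(TO_UNICODE[b] for b in data)


def encode(text: str) -> bytes:
    """Unicode text back to legacy bytes using the inverted table."""
    return bytes(FROM_UNICODE[ch] for ch in text)


def round_trips(data: bytes) -> bool:
    """True if decoding then re-encoding reproduces the original bytes."""
    return encode(decode(data)) == data


def collisions(forward: dict[int, str]) -> dict[str, list[int]]:
    """Characters reached from more than one byte; these cannot round-trip."""
    groups: dict[str, list[int]] = defaultdict(list)
    for byte in sorted(forward):
        groups[forward[byte]].append(byte)
    return {ch: codes for ch, codes in groups.items() if len(codes) > 1}


if __name__ == "__main__":
    original = bytes([0x12, 0x18])
    print(encode(decode(original)).hex(" "))  # 12 12
    print(round_trips(original))              # False
```

Any table for which `collisions` returns a non-empty result loses information on the way back. Whether that loss matters to users is the cost/benefit question raised in the last message; the sketch only shows where the loss occurs.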