Odp: RE: What to do if a legacy compatibility character is defective?

Sat Oct 25 01:18:21 CDT 2025

On 10/24/2025 10:54 PM, piotrunio-2004 at wp.pl wrote:
> On 25 October 2025 00:38, Asmus Freytag via Unicode
> <unicode at corp.unicode.org> wrote:
>
>     On 10/24/2025 2:58 PM, piotrunio-2004 at wp.pl
>     <mailto:piotrunio-2004 at wp.pl> via Unicode wrote:
>>     and not subject to font variation.
>
>     That's overstating things.
>
>     A./
>
> How is that overstating things?

Because exact glyph details are not normative, especially for
compatibility characters. Here, the intent is usually to facilitate a
unique and unambiguous mapping between some kind of legacy character and 
a Unicode character.

I think that the analysis for the curved connectors, which unified two
distinct elements because their rendering was close, was a mistake:
the distinction occurred within the same set, and unifying the
characters killed fidelity in round-trip conversion for many members of
the "large set" while "saving" only one character code. In my personal
view, that's precisely the wrong way to do unification.
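A minimal Python sketch of why unifying two distinct source characters breaks round-trip conversion. It assumes a simple dictionary-based conversion table. The legacy byte values 0xA1 through 0xA3 and the private-use code points U+E000/U+E001 are hypothetical placeholders, not the actual mappings under discussion.

```python
from typing import Dict

def build_reverse(mapping: Dict[int, str]) -> Dict[str, int]:
    """Invert a legacy-to-Unicode table into the Unicode-to-legacy direction."""
    reverse: Dict[str, int] = {}
    for legacy, uni in mapping.items():
        # When two legacy codes share one Unicode character, only the first
        # one seen survives; the other can no longer be recovered.
        reverse.setdefault(uni, legacy)
    return reverse

def round_trips(mapping: Dict[int, str]) -> Dict[int, bool]:
    """For each legacy code, report whether legacy -> Unicode -> legacy is lossless."""
    reverse = build_reverse(mapping)
    return {code: reverse[uni] == code for code, uni in mapping.items()}

# Hypothetical table: 0xA1 and 0xA2 are distinct in the source set but are
# "unified" onto one private-use placeholder; 0xA3 keeps its own code point.
table = {0xA1: "\uE000", 0xA2: "\uE000", 0xA3: "\uE001"}
print(round_trips(table))  # {161: True, 162: False, 163: True}
```

Only one character code is "saved," but every text that uses the second source character can no longer be converted back faithfully.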

> When a legacy computing platform defines blocks in terms of fractions, 
> it does so to ensure specific alignment with those fractions, making 
> it part of the fundamental character identity. On the other hand, when 
> a legacy computing platform defines strokes in terms of stem weight 
> and there is known variation across platforms, it is inappropriate to 
> define those characters using exact fractions when those fractions 
> mismatch some of the platforms.

So far, you have only argued that a font (or bitmap) used to emulate a 
specific legacy platform should faithfully adhere to any specifications 
that apply to that platform.

There is nothing wrong with the same *Unicode* character being rendered 
slightly differently when used to emulate *different* platforms. Unless 
it is the very same platform that exhibits different shapes (and in the 
same display "mode" or "shift"). In that case, the principle of source 
set separation becomes applicable (which is the principle that should 
have been applied to the curved connector case. If it makes you happy, 
you can cite my opinion on that).

However, I didn't spot where that would have been the case for the line
segments. From my quick perusal of the proposals and the critique here,
it seems that this is a matter of the different displays having
different weights, and therefore the preferred font / bitmap cannot be
the same in each context. However, there's no implied need to be able
to emulate a screen where different parts of the emulator support
different legacy systems. Usually, a single window (or nested window)
would display a single emulator.

Again, the identity of the Unicode character is given by encoding the
intended mappings. If Unicode decides to map the same character to 
similar characters on different platforms, that is not a problem, as 
long as implementers know that the intent is to use a platform-specific 
rendering (and not assume that there is only one possible rendering per 
character).
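A minimal Python sketch of platform-specific rendering, assuming an emulator that selects one font per emulated platform and display mode ("shift") for its whole window. The font identifiers and the private-use placeholder character are hypothetical. The point is that the glyph choice follows the emulated context, while the Unicode code point stays the same.

```python
from typing import Dict, List, Tuple

# Hypothetical font identifiers, one per (platform, display mode); these are
# placeholders, not real font products.
FONTS: Dict[Tuple[str, str], str] = {
    ("C64", "unshifted"): "emu-c64-upper",
    ("C64", "shifted"): "emu-c64-lower",
    ("PET", "unshifted"): "emu-pet-upper",
    ("PET", "shifted"): "emu-pet-lower",
}

def font_for(platform: str, mode: str) -> str:
    """Pick the glyph source from the emulated context, not from the code point."""
    try:
        return FONTS[(platform, mode)]
    except KeyError:
        raise ValueError(f"no font for {platform!r} in {mode!r} mode") from None

def render(text: str, platform: str, mode: str) -> List[Tuple[str, str]]:
    """Pair every character with the single font used for the whole emulator window."""
    font = font_for(platform, mode)
    return [(ch, font) for ch in text]

# The same code point gets a platform-appropriate glyph in each window.
sample = "\uE000"  # placeholder for a legacy-computing character
c64_view = render(sample, "C64", "unshifted")
pet_view = render(sample, "PET", "unshifted")
```

Because each window emulates exactly one platform, nothing in the plain text has to record which platform's glyph design is intended.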

If you feel that the guidance available to implementers in the text of 
the standard or in an annotation of the nameslist is not sufficient, then
the remedy would be to ask for the explanation to be updated. We are 
unfortunately locked in as far as character names are concerned, but we 
can add a note (best in the text of the standard) that explains that 
emulators for some systems will need an adjusted design so a sequence or 
other arrangement of these characters looks correct.

A./

PS: I see that you confirm below that the two cases are of a different 
nature.

> On 25 October 2025 00:44, Asmus Freytag via Unicode
> <unicode at corp.unicode.org> wrote:
>
>     On 10/24/2025 2:54 PM, Nitai Sasson via Unicode wrote:
>>     If you use a font that makes those Unicode characters look like
>>     they did on their original platform, there is no issue. But a
>>     given font can only emulate one platform at a time. You're not
>>     going to get a C64 and PET/VIC-20 frankenstein of a document.
>>     Take your pick: do you want it to look like C64, or do you want
>>     it to look like PET/VIC-20? Choose your font accordingly.
>
>     Round tripping plain text to a mix of devices is not a goal, just
>     as round tripping plain text Han characters to a mix of regional
>     variants is not a goal.
>
>     You (Piotr) need to demonstrate that for a single display, on a
>     single device or emulator for a single device, you cannot get the
>     correct appearance by systematically using a device appropriate font.
>
>     If a device supports "shifted" modes, then a device appropriate
>     font may change based on the shift status.
>
>     Only when that accommodation fails to produce the correct
>     appearance is there a case for further disunification.
>
>     The diagonal connector issue satisfies this requirement, but as
>     far as I have been able to understand, the block characters do not.
>
>     A./
>
>
> In case of PETSCII and Apple II characters, this is an instance of 
> source characters having an incompatible character identity from their 
> mapped Unicode characters. Therefore, there is a character identity 
> conflict between the legacy platform and the Unicode characters they 
> are mapped to.
> Whereas in case of HP 264x characters, two source characters having an 
> incompatible character identity from each other are mapped to the same 
> Unicode character. Therefore, there is a character identity conflict 
> between the two characters.
>
>     The required evidence to support a request for disunification
>     therefore always consists of a document (screenshot) (usually
>     other than a character set table) that shows that the two
>     characters are distinct in their source environment and that that
>     distinction matters (for example, that it can't be determined
>     mechanically by context).
>
>     From the original document (section 1, page 1), it looks like
>     there are two characters that are distinct in the source, but have
>     been mapped to the same Unicode character 1CE2B. I can certainly
>     sympathize with the view that unifying these based on their close
>     visual similarity was what we used to call a case of "arms-length"
>     unification.
>
>
> As I explained in "Odp: Re: Unicode fundamental character identity"
> <https://corp.unicode.org/pipermail/unicode/2025-January/011312.html>,
> this is what it looks like on a screenshot:
> https://i.imgur.com/obGQ4Ie.png . The two different characters and
> their different types of connections are demonstrated. Furthermore, 
> since all character tiles are visually independent, and both 
> characters may be used as isolated character cells, no contextual 
> mechanism can possibly apply.
>
>
>