What to do if a legacy compatibility character is defective?

Peter Constable pgcon6 at msn.com
Fri Oct 24 13:42:23 CDT 2025


TLDR…

You start by saying what your message is _NOT_ about. It would be helpful to have a brief abstract of what it _IS_ about so people can decide whether to read a long email.


Peter

From: Unicode <unicode-bounces at corp.unicode.org> On Behalf Of piotrunio-2004 at wp.pl via Unicode
Sent: October 24, 2025 5:42 AM
To: unicode <unicode at corp.unicode.org>
Subject: What to do if a legacy compatibility character is defective?

No, I'm not talking about U+0149, which was marked as deprecated but is in fact a legitimate compatibility character and is not defective as it is the only reasonable way to represent the byte 0xF3 in a CP853 character cell.

I am aware that this issue has already been discussed many times before on this mailing list, but I still have not received a proper explanation of how exactly the existing characters 1FB70–1FB81 1FBB5–1FBB8 1FBBC are intended to be used in the context of certain legacy computing platforms. As it stands, I consider those characters defective.

For context, in L2/25-037, I identified a fundamental defect in how PETSCII, Apple II, and HP 264x characters were encoded in Unicode. The box drawing characters (which depend on the typeface weight) and the block elements (which depend on fractions of bounding box size) were unified with each other, which in some cases contradicted the source legacy platforms. The Unicode 13.0 mapping table of PETSCII and Apple II characters relied on the assumption that the thickness of light box drawing characters is equal to 1÷8 of the width or height of the character. This assumption is incorrect in the case of the C64 version of PETSCII (where the thickness is 1÷4 of the width and height) and in the case of the Apple II (where the thickness is 1÷7 of the width and 1÷8 of the height). In the case of the HP 264x, two of the characters that were unified to the same Unicode character were found to have not only distinct glyphs but also distinct types of box drawing connections, and both characters occur within the same encoding, leaving the Unicode mapping incomplete.
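The thickness figures above can be checked mechanically. What follows is only a sketch in Python: the table restates the fractions quoted in this message, and the names `STROKE_FRACTIONS` and `matches_eighth_assumption` are made up for illustration and are not taken from any proposal or mapping file.

```python
from fractions import Fraction

EIGHTH = Fraction(1, 8)

# (fraction of cell width, fraction of cell height) taken up by a light
# box drawing stroke, as quoted in the message above. Illustrative only.
STROKE_FRACTIONS = {
    "PETSCII (C64)": (Fraction(1, 4), Fraction(1, 4)),
    "Apple II": (Fraction(1, 7), Fraction(1, 8)),
}


def matches_eighth_assumption(platform):
    """True only if the stroke is exactly 1/8 of the cell in both directions."""
    of_width, of_height = STROKE_FRACTIONS[platform]
    return of_width == EIGHTH and of_height == EIGHTH


for name in STROKE_FRACTIONS:
    print(name, matches_eighth_assumption(name))
```

With these figures neither platform satisfies the 1÷8 assumption in both directions: the C64 fails on both axes, and the Apple II fails on the width.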

The response to this proposal in L2/25-010 is fundamentally logically incorrect and does not provide any feedback whatsoever. In that response, terms like 'differences in plain text', 'glyph distinctions', 'character identities' or 'appropriate fonts' are thrown around as buzzwords, completely defying all logic. The proposal already thoroughly explains why the Unicode 13.0–17.0 mapping is defective and why the proposed characters have a completely different identity from existing characters, which also makes it impossible to resolve with appropriate fonts.

However, what makes this especially problematic is that some of the Unicode characters were encoded for compatibility with legacy platforms, but the fundamental character identity that the characters were encoded with is not compatible with the original identity of the characters in the source platform.

The characters 1FB70–1FB7F, according to the L2/19-025 compatibility table (19025-aux-LegacyComputingSources.pdf), were encoded for compatibility with PETSCII, but their character identity as specified in Unicode is defined in terms of 1÷8 blocks. This already makes the characters incompatible with the C64 version of PETSCII. The characters also fit in with the 1÷8 blocks encoded at 2581 258F 2594–2595. But PETSCII includes both light box drawings and fractions of blocks, and those characters are where the two groups 'intersect', which causes the true top/bottom (but not left/right) light box drawings to be mapped to different values, as I already thoroughly explained in L2/25-037. Moreover, the PETSCII character 0x5D is mapped to both U+2502 and U+1FB73, and the PETSCII character 0x40 is mapped to both U+2500 and U+1FB79. In legacy computing text modes, every character tile has a 1∶1 mapping to a fixed-size region of the screen, and all the tiles are independent of each other, so it makes no sense whatsoever to use multiple Unicode characters to represent the same legacy character. In the context of both the PET/VIC20 and C64 versions of PETSCII, the characters representing horizontal and vertical lines match the thickness of the common light box drawing characters, and on the C64 they do not match 1÷8 blocks, so it is inappropriate to identify them as a set of 1÷8 blocks. The same applies to the Apple II compatibility characters 1FB7C 1FB80–1FB81 1FBB5–1FBB8 1FBBC, which are also defective for reasons I explained in L2/25-037. Some of those characters are also used on other platforms (across both 13.0 and 16.0), which I haven't analyzed thoroughly but which have similar issues.
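To make the 1∶1 point concrete, here is a hedged sketch in Python of what a decoder for a grid of independent tiles runs into. The table holds only the two entries cited in this message, not the full mapping, and `ambiguous_bytes` and `decode_screen` are hypothetical helper names.

```python
# PETSCII byte -> Unicode code points it is mapped to. Only the two
# entries cited in this message are listed; this is not the full table.
PETSCII_TO_UNICODE = {
    0x5D: [0x2502, 0x1FB73],
    0x40: [0x2500, 0x1FB79],
}


def ambiguous_bytes(table):
    """Return the legacy bytes that map to more than one code point."""
    return sorted(b for b, targets in table.items() if len(set(targets)) > 1)


def decode_screen(cells, table):
    """Decode a sequence of independent tiles, one code point per tile.

    Raises ValueError when a tile has no unique target, because a
    tile-by-tile decoder has no context from which to pick one.
    """
    out = []
    for byte in cells:
        targets = table[byte]
        if len(set(targets)) != 1:
            raise ValueError(
                f"byte 0x{byte:02X} has {len(set(targets))} candidate code points"
            )
        out.append(chr(targets[0]))
    return "".join(out)


print([hex(b) for b in ambiguous_bytes(PETSCII_TO_UNICODE)])
```

A decoder built this way cannot process either byte without an extra rule for choosing between the two targets, and that rule is exactly what the mapping does not supply.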

Therefore, 1FB70–1FB81 1FBB5–1FBB8 1FBBC are defective, because their character identity does not match that of the original characters on the source platforms. The Unicode 16.0 change to the character identity of U+1FB81 does not resolve the issue either, as it leaves the third and fifth blocks unspecified but still enforces 1÷8 blocks on the top and bottom. Nor can this be resolved by changing the identity of those characters to light box drawings or to unspecified thickness, because that would violate the consistency with 2581 258F 2594–2595 and disrupt implementations that rely on that consistency. And forget about contextual substitutions and other overcomplicated mechanisms, because they're completely irrelevant in the context of a grid of independent character tiles.

Regarding the L2/25-010 claim that this issue 'can be solved by using appropriate fonts': in the case of PETSCII PET/VIC20, the source platform font does in fact match the character identities of the Unicode mapping. In the case of the Apple II, the source platform could be considered to match the character identities if the left and right 1÷8 blocks are rounded to 1 pixel in the width of 7 pixels, but it makes no sense for the character identities to hinge on platform-specific rounding when there is already a consistent light box drawing thickness to work with. In the case of PETSCII C64, the source platform font does not match the character identities of the Unicode mapping, making it impossible to resolve using 'appropriate fonts'. In the case of the HP 264x, the source platform font has two different glyphs, for two different character identities in the same encoding, that map to the same Unicode character, which also makes it impossible to resolve using 'appropriate fonts'. So how is anyone ever supposed to use those characters in the context of the PETSCII C64 or HP 264x encoding?
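The rounding argument can be sketched in a few lines of Python. The Apple II width of 7 pixels comes from the paragraph above; the 8-pixel C64 cell is an assumption added here, and the stroke widths in pixels are derived from the fractions quoted earlier (1÷7 of 7 pixels, 1÷4 of 8 pixels). The function names are hypothetical.

```python
from fractions import Fraction


def eighth_block_pixels(cell_size):
    """Pixels covered by a 1/8 block along one axis, rounded to whole pixels."""
    return round(Fraction(cell_size, 8))


def font_can_match(cell_size, stroke_pixels):
    """Can a 1/8 block, after rounding, equal the stroke the source font draws?"""
    return eighth_block_pixels(cell_size) == stroke_pixels


# Apple II, horizontal axis: 7-pixel cell, stroke of 1/7 of the width = 1 pixel.
# 7/8 of a pixel rounds up to 1, so the match exists only through rounding.
print("Apple II:", font_can_match(7, 1))

# C64 (8-pixel cell assumed): stroke of 1/4 of the cell = 2 pixels.
# A 1/8 block is exactly 1 pixel, so no rounding closes the gap.
print("C64:", font_can_match(8, 2))
```

Under these assumptions the Apple II case passes only because 7÷8 of a pixel happens to round to 1, while the C64 case fails outright.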

