Re: What to do if a legacy compatibility character is defective?

Asmus Freytag asmusf at ix.netcom.com
Fri Oct 24 17:26:53 CDT 2025


Fundamentally, when Unicode "unifies" characters it often does so "across 
sources". For example, ordinary ASCII letters are unified across 
character sets, even if some legacy platform shows a somewhat different 
pixel arrangement for a given letter than another platform does.

The most common reason for Unicode to disunify characters is evidence 
that the *same* source shows both characters as distinct.

These same considerations apply to compatibility characters.

The primary goal for encoding any compatibility character is to allow 
round-tripping of data from the source into systems operating in Unicode 
and back again. It is a non-goal to be able to tell, from the Unicode 
code point alone, which legacy platform the character was mapped from or 
is being mapped to.
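The round-trip requirement can be sketched as a pair of inverse mapping tables; the byte values and code points below are invented for illustration, not drawn from any actual character set:

```python
# Hypothetical legacy-to-Unicode mapping that round-trips cleanly.
# One legacy byte <-> one Unicode code point, so the inverse is total.
TO_UNICODE = {0x41: "\u0041", 0x7F: "\u2302"}       # legacy byte -> Unicode
TO_LEGACY = {u: b for b, u in TO_UNICODE.items()}   # exact inverse mapping

def round_trips(byte: int) -> bool:
    """True if converting to Unicode and back yields the same byte."""
    return TO_LEGACY[TO_UNICODE[byte]] == byte

assert all(round_trips(b) for b in TO_UNICODE)
```

As long as the mapping is one-to-one, it does not matter which Unicode code point each byte lands on; the data stream is preserved exactly.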

The required evidence to support a request for disunification therefore 
always consists of a document (typically a screenshot, and usually 
something other than a character set table) showing that the two 
characters are distinct in their source environment and that the 
distinction matters (for example, that it can't be determined 
mechanically from context).

 From the original document (section 1, page 1), it looks as though 
there are two characters that are distinct in the source but have been 
mapped to the same Unicode character, U+1CE2B. I can certainly 
sympathize with the view that unifying these based on their close visual 
similarity was what we used to call an "arm's-length" unification.

In this example, a character stream encoding the pieces used to 
represent a particular run of text in "large character mode" would not 
reliably round-trip, and after round-tripping (with a real device) the 
displayed characters would look subtly different. For data processed 
transiently through Unicode, the loss of round-tripping means the data 
stream changes without any change in content, which is exactly what 
compatibility characters are designed to avoid. For a live terminal 
emulator, the effect would be a small degradation in the fidelity of the 
emulation. There is no simple workaround, as analyzing the fragments in 
what amounts to a 2-D text display is not without challenges.
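The failure mode described above can be made concrete with a many-to-one mapping; again, the byte values here are invented for illustration, with only U+1CE2B taken from the discussion:

```python
# Hypothetical failure mode: two distinct legacy bytes unified onto one
# Unicode code point. The inverse mapping must then pick a single byte,
# so one of the two source characters cannot survive a round trip.
TO_UNICODE = {0x80: "\U0001CE2B", 0x81: "\U0001CE2B"}  # many-to-one
TO_LEGACY = {"\U0001CE2B": 0x80}                       # forced to choose

def round_trip(byte: int) -> int:
    """Convert a legacy byte to Unicode and back."""
    return TO_LEGACY[TO_UNICODE[byte]]

assert round_trip(0x80) == 0x80   # survives intact
assert round_trip(0x81) == 0x80   # comes back as 0x80: stream changed
```

The data stream changes even though, from Unicode's point of view, nothing was altered; this is precisely the silent change in data without a change in "contents" described above.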

I can understand the frustration of a submitter who is told that there 
is an arbitrary limitation on fidelity and that some degradation should 
be seen as acceptable. While the difference is not visually prominent, 
the disposition needlessly violates source separation for a single 
character.


For the examples involving block characters, it is unclear whether they 
involve unification within a source or across sources. If the 
unification is across sources (platforms), then knowledge of the target 
platform can be used to select the glyph to display, and there is no 
issue. The same is true for any SHIFT mode in a source character set: 
whether the device is operating in the shifted mode has to be known 
anyway, because it already determines what is displayed for a given byte 
location in the source character set.
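The shift-mode case can be sketched as a tiny renderer that tracks the mode while decoding; the control bytes (SO/SI) and glyph assignments below are assumptions made for illustration:

```python
# Hypothetical shifted character set: the same byte displays differently
# depending on the device's shift mode. Since a renderer must track the
# mode anyway, the mode also disambiguates the glyph on output.
GLYPHS = {  # (shifted?, byte) -> displayed character; values invented
    (False, 0x60): "\u2596",  # QUADRANT LOWER LEFT
    (True,  0x60): "\u2597",  # QUADRANT LOWER RIGHT
}

def render(data: list[int], shifted: bool = False) -> str:
    """Decode a byte stream, tracking shift state to pick glyphs."""
    out = []
    for b in data:
        if b == 0x0E:     # SO: enter shifted mode (assumed control byte)
            shifted = True
        elif b == 0x0F:   # SI: return to unshifted mode
            shifted = False
        else:
            out.append(GLYPHS[(shifted, b)])
    return "".join(out)
```

Because the shift state is available context, mapping both modes' characters through shared code points is amenable to mechanical disambiguation, unlike the U+1CE2B case above.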

I cannot tell whether the Script Encoding disposition violates source 
separation or merely suggests reusing character codes for multiple 
sources/modes in a way that may be amenable to disambiguation with 
additional, but available, context information.

A./
