Re: Odp: RE: What to do if a legacy compatibility character is defective?

piotrunio-2004@wp.pl piotrunio-2004 at wp.pl
Thu Dec 4 16:37:29 CST 2025


On 4 December 2025 at 21:28, Asmus Freytag via Unicode
<unicode at corp.unicode.org> wrote:

On 12/4/2025 4:35 AM, piotrunio-2004 at wp.pl via Unicode wrote:

I have investigated the situation further, and it seems that the
defect in the Unicode 13.0 through 17.0 mapping is even more
fundamental than I previously thought. In particular, the proposal
L2/25-037 does not acknowledge the proposal L2/00-159, which had
already been incorporated into Unicode 3.2.

In that proposal, the description of the characters U+23B8 (LEFT
VERTICAL BOX LINE) and U+23B9 (RIGHT VERTICAL BOX LINE) exactly
matches the proposed characters L2/25-037:1FBFC (BOX DRAWINGS LIGHT
LEFT EDGE) and L2/25-037:1FBFD (BOX DRAWINGS LIGHT RIGHT EDGE). In
both proposals, those two characters are specified to be aligned to
the left or right edge, span the entire edge (extending to the top
and bottom), and match the thickness of Box Drawings Light lines.

The description of the characters U+23BA (HORIZONTAL SCAN LINE-1) and
U+23BD (HORIZONTAL SCAN LINE-9) also exactly matches the proposed
characters L2/25-037:1FBFA (BOX DRAWINGS LIGHT TOP EDGE) and
L2/25-037:1FBFB (BOX DRAWINGS LIGHT BOTTOM EDGE). In both proposals,
those two characters are specified to be aligned to the top and
bottom edges, span the entire edge (extending to the left and right),
and match the thickness of Box Drawings Light lines.

However, the proposal L2/00-159 had already set a precedent for using
[U+23BA, U+23BD, U+23B8, U+23B9] (and not the 1÷8 blocks or 1÷4
blocks) in mappings to certain platforms, such as the Heath/Zenith 19
Graphics Character Set and the DEC Special Graphics Character Set.
This contrasts with the use of the 1÷8 blocks [U+2594, U+2581,
U+258F, U+2595] and other related 1÷8 or 7÷8 block characters in the
mappings to PETSCII and Apple II. Therefore there is a discrepancy
between the legacy platforms added in Unicode 3.2 (which use the box
drawing lines 23B8, 23B9, 23BA, 23BD) and the legacy platforms added
in Unicode 13.0 through 17.0 (which use the 1÷8 blocks 2594, 2581,
258F, 2595).

On 25 October 2025 at 10:27, piotrunio-2004 at wp.pl via Unicode
<unicode at corp.unicode.org> wrote:

On 25 October 2025 at 08:29, Asmus Freytag via Unicode
<unicode at corp.unicode.org> wrote:

Again, the identity of the
Unicode character is given by encoding the intended mappings. If
Unicode decides to map the same character to similar characters on
different platforms, that is not a problem, as long as implementers
know that the intent is to use a platform-specific rendering (and do
not assume that there is only one possible rendering per character).

If you feel that the guidance available to implementers in the text
of the standard or in an annotation of the names list is not
sufficient, then the remedy would be to ask for the explanation to be
updated. We are unfortunately locked in as far as character names are
concerned, but we can add a note (best in the text of the standard)
that explains that emulators for some systems will need an adjusted
design so that a sequence or other arrangement of these characters
looks correct.

Indeed the character names cannot be
changed due to stability policies. An explanatory note has been
provided for U+1FB81 that claims "The lines corresponding to 3 and 5
are not actually block elements, but can show any horizontally
repeating pattern", but it still implicitly enforces 1÷8 blocks for
top and bottom. However, this doesn't address other cases such as the
PETSCII C64 variation. And if 1FB70..1FB81, 1FBB5..1FBB8, and 1FBBC
were all noted to no longer require exact 1÷8 blocks, that would also
not remedy the issue, because it would introduce an inconsistency
with the existing 1÷8 or 7÷8 block characters 2581, 2589, 258F, and
2594..2595. Those characters already have established compatibility
precedents that require the exact fraction, but are also used in the
Unicode 13.0 mapping to the PETSCII and Apple II character sets
despite those platforms using varying thickness (consistent with
light box drawings, except for the 1÷8 top and bottom blocks in C64,
where the 1÷4 top and bottom blocks are made consistent instead).

What is missing is an actual proposal. That
is, not just analysis or exposition, but actual proposed wording or
proposed encoding that would fix the issue.

That would need to be provided as a UTC document (aka L2 document)
submission, with the analysis appended in a background section.

A./

PS: I am not convinced that platform-specific mappings (glyphs) are
an issue, because the scenario where these data are reliably
transferred *between* legacy implementations can't have existed then,
so it's questionable why it needs to be perfect today. My assumption
would be that the use case is lossless round trip from (each) legacy
emulator to Unicode and back. Having PETSCII / Apple II specific
characters does not improve things, because any data stream
containing those could not be displayed on any other emulator. This
is different from legacy characters mapped to letters and common text
symbols, because we have an expectation that we can share text across
devices (or emulators).

I have a draft of a follow-up to L2/25-037 that analyzes the
character sets thoroughly with the additional context provided by the
L2/00-159 characters (including the particularly complex relationship
between box drawings, 1÷8 blocks, and 1÷4 blocks in PETSCII),
provides additional explanation and screenshot evidence of the HP
264x character in both isolated and connected usage, and arrives at
the conclusion that 23 characters (that is, all in L2/25-037 except
for the 4 that were already added by L2/00-159) should be added.
However, the SEW announced that they will not be discussing these
characters any further, so how could any follow-up to the proposal
possibly get incorporated into Unicode?
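
For anyone who wants to check the two groups of code points discussed in this thread, here is a minimal Python sketch. It looks up the character names in the Unicode Character Database bundled with Python, and it illustrates the "lossless round trip" use case mentioned in the PS. The legacy byte values in `HYPOTHETICAL_LEGACY_TO_UNICODE` and the helpers `names`, `is_lossless`, and `round_trip` are assumptions made up for illustration; they are not taken from L2/00-159, L2/25-037, or any real platform mapping table.

```python
import unicodedata

# Code points named in this thread, in the order top, bottom, left, right.
BOX_LINE_EDGES = [0x23BA, 0x23BD, 0x23B8, 0x23B9]     # in Unicode since 3.2
ONE_EIGHTH_BLOCKS = [0x2594, 0x2581, 0x258F, 0x2595]  # used for PETSCII / Apple II


def names(code_points):
    """Return the Unicode character name of each code point."""
    return [unicodedata.name(chr(cp)) for cp in code_points]


# HYPOTHETICAL legacy byte values, for illustration only.
HYPOTHETICAL_LEGACY_TO_UNICODE = {
    0x01: 0x23BA,
    0x02: 0x23BD,
    0x03: 0x23B8,
    0x04: 0x23B9,
}


def is_lossless(mapping):
    """True when no two legacy codes map to the same Unicode code point."""
    return len(set(mapping.values())) == len(mapping)


def round_trip(data, mapping):
    """Convert legacy bytes to a Unicode string and back again."""
    inverse = {cp: byte for byte, cp in mapping.items()}
    text = "".join(chr(mapping[b]) for b in data)
    return bytes(inverse[ord(ch)] for ch in text)


if __name__ == "__main__":
    for cp, name in zip(BOX_LINE_EDGES + ONE_EIGHTH_BLOCKS,
                        names(BOX_LINE_EDGES + ONE_EIGHTH_BLOCKS)):
        print(f"U+{cp:04X} {name}")
    sample = bytes([0x01, 0x03, 0x04, 0x02])
    print(is_lossless(HYPOTHETICAL_LEGACY_TO_UNICODE))
    print(round_trip(sample, HYPOTHETICAL_LEGACY_TO_UNICODE) == sample)
```

The round trip only tests that one platform's mapping is invertible. It says nothing about whether the same Unicode character renders acceptably on a different emulator, which is the point in dispute above.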