<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">I personally dislike even the attempt

      to define a compatibility *character*. It is not really a

      property that's inherent in the character; witness the fact that

      there's no list.</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">

      <blockquote type="cite">vvvv

        <br>

        There is no formal listing of all compatibility characters in

        the Unicode Standard. This follows from the nature of the

        definition of compatibility characters. It is a judgement call

        as to whether any particular character would have been accepted

        for encoding if it had not been required for interoperability

        with a particular standard. Different participants in character

        encoding often disagree about the appropriateness of encoding

        particular characters, and sometimes there are multiple

        justifications for encoding a given character.

        <br>

        ^^^^

      </blockquote>

    </div>

    <div class="moz-cite-prefix">There's a compatibility principle that

      can be used as an argument for encoding some particular

      characters. But, as others have noted, there's not necessarily a

      single argument for or against encoding any given character, and

      the weighing of these can differ depending on who does it. The

      only thing we know formally is that, in the end, there was a vote

      or consensus behind adding that character.</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">What we don't have is a recorded vote

      on a formal classification of characters as compatibility

      characters, although I'm sure that in some cases the minutes may

      have recorded that the character was to be added for

      compatibility.</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">There are characters that violate or

      appear to violate some of the other encoding principles; and if

      they exist in other character sets, it's an easy presumption that

      the compatibility argument was decisive in the decision to add

      them.</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">And the compatibility mappings are

      equally a muddle, conflating the idea of a fallback with the

      concept of "derived/derivable from" or with the suggestion of a

      "preferred alternate".</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">A./<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">On 2/17/2022 5:52 AM, Giacomo Catenazzi

      via Unicode wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net">Hello

      Monica,

      <br>

      <br>

      On 17.02.2022 13:18, Monica Merchant via Unicode wrote:

      <br>

      <br>

      <blockquote type="cite">However, I'm confused by the second

        example. In particular, I'm not sure if no-break space

        (*U+00A0*) and the fixed-width space characters

        (*U+2000-U+200A*) are compatibility characters or not. They are

        described as "serving essential functions", which I read as

        meaning that they would have been encoded even if it weren't for

        round-tripping, in which case they would not be considered as

        compatibility characters. Is this correct? If so, are they

        essential because they facilitate the typesetting of text-based

        markup like HTML (where formatting must be specified in plain

        text)? No-break space is also essential in that it is used to

        display standalone non-spacing marks (pg 267

        <a class="moz-txt-link-rfc2396E" href="https://www.unicode.org/versions/Unicode14.0.0/ch06.pdf">&lt;https://www.unicode.org/versions/Unicode14.0.0/ch06.pdf&gt;</a>).

        <br>

        <br>

      </blockquote>

      <br>

      I read the section in this manner: the three examples before your

      example 1 and example 2 describe the case of compatibility

      characters that are not compatibility decomposable characters.

      Then the standard describes two examples of characters that have

      a compatibility decomposition without being compatibility

      characters.

      <br>

      <br>

      Note that on page 26 we have:

      <br>

      <br>

      vvvv

      <br>

      There is no formal listing of all compatibility characters in the

      Unicode Standard. This follows from the nature of the definition

      of compatibility characters. It is a judgement call as to whether

      any particular character would have been accepted for encoding if

      it had not been required for interoperability with a particular

      standard. Different participants in character encoding often

      disagree about the appropriateness of encoding particular

      characters, and sometimes there are multiple justifications for

      encoding a given character.

      <br>

      ^^^^

      <br>

      <br>

      So it depends on how you interpret U+00A0. As you write, you

      may consider it an essential distinction in HTML, so it may not be

      a compatibility character. On the other hand, a typesetter may

      interpret U+00A0 as U+0020. Such a person will decide whether or

      not to break at the space according to the context (they know the

      language rules and style, e.g. not to separate a number from its

      unit, or "Ms." from the name, etc.). So the context, not the

      character, makes the distinction.

      <br>

      <br>

      But your extra cases are more interesting.

      <br>

      U+2000 is canonically equivalent to U+2002 (EN QUAD vs EN SPACE).

      These are not just decomposable characters; in my opinion they

      are also simply compatibility characters: they are exactly the

      same character (they were included only because of an error or

      wrong interpretation of existing documents). The same goes for

      U+2001.

      <br>

      <br>

      I would also consider U+2002 to U+200A, except U+2007, to be

      compatibility characters (and the Unicode Character Database

      treats them as compatibility decomposable characters). Unicode

      probably does the same, because their decompositions have the

      type "&lt;compat&gt;".

      <br>

      <br>

      It is only U+2007 that makes me think (and not just because, like

      U+00A0, it has &lt;noBreak&gt; instead of &lt;compat&gt;). For

      me, this is just a decimal digit zero that is not printed, so it

      has its own merits: it is not a separator but a meaningful

      character (context: tables). Different people may have different

      opinions.

      <br>

      <br>

      giacomo

      <br>

      <br>

      <br>

      <blockquote type="cite">

        <br>

        <br>

        Thank you,

        <br>

        <br>

        Monica

        <br>

        <br>

        <br>

      </blockquote>

    </blockquote>
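    <div class="moz-cite-prefix">For anyone who wants to check the

      decomposition types mentioned in the quoted message, here is a

      minimal Python sketch. It assumes the standard library's

      unicodedata module; the helper name decomposition_kind is

      hypothetical, and the results reflect whatever Unicode version

      the Python interpreter bundles. It only reports the decomposition

      mapping, which is not the same thing as deciding whether a

      character is a "compatibility character".</div>

    <pre>
```python
import unicodedata


def decomposition_kind(code_point):
    """Return (kind, targets) for a code point's decomposition mapping.

    kind is "none", "canonical", or the formatting tag without its
    angle brackets (for example "compat" or "noBreak").
    decomposition_kind is a hypothetical helper name for this sketch.
    """
    fields = unicodedata.decomposition(chr(code_point)).split()
    if not fields:
        return ("none", [])
    # A tagged (compatibility) mapping starts with a bracketed tag;
    # an untagged mapping is a canonical one.
    if fields[0].endswith(">"):
        kind, fields = fields[0][1:-1], fields[1:]
    else:
        kind = "canonical"
    return (kind, [int(f, 16) for f in fields])


if __name__ == "__main__":
    # U+00A0 plus the fixed-width spaces U+2000..U+200A from the thread.
    for cp in [0x00A0] + list(range(0x2000, 0x200B)):
        kind, targets = decomposition_kind(cp)
        shown = " ".join(f"U+{t:04X}" for t in targets)
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {kind} {shown}")
```
    </pre>

    <div class="moz-cite-prefix">Under NFKC normalization all of these

      characters end up as U+0020, so the no-break and width

      distinctions that the tags hint at are lost.</div>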

    <p><br>

    </p>

  </body>

</html>