<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">I personally dislike the attempt to
define a compatibility *character* at all. It is not really a
property that's inherent in the character; witness the fact that
there's no list.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">
<blockquote type="cite">vvvv
<br>
There is no formal listing of all compatibility characters in
the Unicode Standard. This follows from the nature of the
definition of compatibility characters. It is a judgement call
as to whether any particular character would have been accepted
for encoding if it had not been required for interoperability
with a particular standard. Different participants in character
encoding often disagree about the appropriateness of encoding
particular characters, and sometimes there are multiple
justifications for encoding a given character.
<br>
^^^^
</blockquote>
</div>
<div class="moz-cite-prefix">There's a compatibility principle that
can be used as an argument for encoding some particular
characters. But, as others have noted, there's not necessarily a
single argument for or against encoding any given character, and
the weighing of these can differ depending on who does it. The
only thing we know formally is that, in the end, there was a vote
or consensus behind adding that character.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">What we don't have is a recorded vote
on a formal classification of characters as compatibility
characters, although I'm sure that in some cases the minutes may
have recorded that a character was to be added for
compatibility.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">There are characters that violate or
appear to violate some of the other encoding principles; and if
they exist in other character sets, it's an easy presumption that
the compatibility argument was decisive in the decision to add
them.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">And the compatibility mappings are
equally a muddle, conflating the idea of a fallback with the
concept of "derived/derivable from" or with the suggestion of a
"preferred alternate".</div>
<div class="moz-cite-prefix"><br>
</div>
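<div class="moz-cite-prefix">To see how little the mapping tags say
about intent, here is a minimal Python sketch. It assumes the
standard library's unicodedata module, whose data version depends on
the Python build; the helper name mapping_kind is made up for
illustration. It only reports whether a character has no mapping, a
canonical mapping, or a tagged (compatibility) mapping, and which
tag.</div>
<pre>
```python
import unicodedata

def mapping_kind(ch):
    """Return 'none', 'canonical', or the formatting tag of the mapping."""
    fields = unicodedata.decomposition(ch).split()
    if not fields:
        return "none"
    first = fields[0]
    # A canonical mapping starts directly with a hexadecimal code point;
    # a tagged mapping starts with a tag enclosed in angle brackets.
    if first[0].isalnum():
        return "canonical"
    return first[1:-1]  # drop the enclosing angle brackets

# A few characters whose mappings serve rather different purposes.
for ch in ["\u00a0", "\u2002", "\u00b2", "\ufb01", "\u2126", "A"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), mapping_kind(ch))
```
</pre>
<div class="moz-cite-prefix">The tag tells you the formal type of a
mapping, not whether it was meant as a fallback, a derivation, or a
preferred alternate.</div>
<div class="moz-cite-prefix"><br>
</div>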
<div class="moz-cite-prefix">A./<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 2/17/2022 5:52 AM, Giacomo Catenazzi
via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net">Hello
Monica,
<br>
<br>
On 17.02.2022 13:18, Monica Merchant via Unicode wrote:
<br>
<br>
<blockquote type="cite">However, I'm confused by the second
example. In particular, I'm not sure if no-break space
(*U+00A0*) and the fixed-width space characters
(*U+2000-U+200A*) are compatibility characters or not. They are
described as "serving essential functions", which I read as
meaning that they would have been encoded even if it weren't for
round-tripping, in which case they would not be considered as
compatibility characters. Is this correct? If so, are they
essential because they facilitate the typesetting of text-based
markup like HTML (where formatting must be specified in plain
text)? No-break space is also essential in that it is used to
display standalone non-spacing marks (pg 267
<a class="moz-txt-link-rfc2396E" href="https://www.unicode.org/versions/Unicode14.0.0/ch06.pdf">&lt;https://www.unicode.org/versions/Unicode14.0.0/ch06.pdf&gt;</a>).
<br>
<br>
</blockquote>
<br>
I read the section this way: the three examples before your
example 1 and example 2 describe compatibility characters that
are not compatibility decomposable characters. Then the standard
describes two examples of characters that have a compatibility
decomposition without being compatibility characters.
<br>
<br>
Note that on page 26 we have:
<br>
<br>
vvvv
<br>
There is no formal listing of all compatibility characters in the
Unicode Standard. This follows from the nature of the definition
of compatibility characters. It is a judgement call as to whether
any particular character would have been accepted for encoding if
it had not been required for interoperability with a particular
standard. Different participants in character encoding often
disagree about the appropriateness of encoding particular
characters, and sometimes there are multiple justifications for
encoding a given character.
<br>
^^^^
<br>
<br>
So it depends on how you interpret U+00A0. As you write, you
may consider it an essential distinction in HTML, so it may not be
a compatibility character. On the other hand, a typesetter may
interpret U+00A0 as U+0020. Such a person will decide whether or
not to break at the space according to the context (they know the
language rules and style, e.g. not to separate a number from its
unit, or "Ms." from the name, etc.). So the context, not the
character, makes the distinction.
<br>
<br>
But your extra cases are more interesting.
<br>
U+2000 is canonically equivalent to U+2002 (EN QUAD vs EN SPACE).
It is not just a decomposable character; in my opinion it is also
simply a compatibility character: the two are exactly the same
character (it was included only because of an error or
misinterpretation of existing documents). The same goes for
U+2001.
<br>
<br>
I would also consider U+2002 to U+200A, except U+2007, to be
compatibility characters (the Unicode Character Database treats
them as compatibility decomposable characters). Unicode probably
does the same, because their decompositions have the type
"&lt;compat&gt;".
<br>
<br>
It is only U+2007 that makes me think (and not just because, like
U+00A0, it has &lt;noBreak&gt; instead of &lt;compat&gt;). For me,
this is just a decimal digit zero that is not printed, so it has
its own merits: it is not a separator but a meaningful character
(context: tables). Different people may have different opinions.
<br>
<br>
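The decomposition fields mentioned above can be checked directly. A
minimal Python sketch, assuming the standard library's unicodedata
module (its data version depends on the Python build; the helper
name space_report is made up for illustration), lists each
character in U+2000 to U+200A with its decomposition field and its
NFD and NFKD forms:
<br>
<pre>
```python
import unicodedata

def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]

def space_report(first=0x2000, last=0x200A):
    """Map each code point to (decomposition field, NFD, NFKD)."""
    report = {}
    for cp in range(first, last + 1):
        ch = chr(cp)
        report[cp] = (
            unicodedata.decomposition(ch),
            code_points(unicodedata.normalize("NFD", ch)),
            code_points(unicodedata.normalize("NFKD", ch)),
        )
    return report

for cp, (decomp, nfd, nfkd) in space_report().items():
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X} {name}: {decomp} NFD={nfd} NFKD={nfkd}")
```
</pre>
Only U+2000 and U+2001 change under NFD, since their mappings are
canonical; the others change only under NFKD.
<br>
<br>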
giacomo
<br>
<br>
<br>
<blockquote type="cite">
<br>
<br>
Thank you,
<br>
<br>
Monica
<br>
<br>
<br>
</blockquote>
</blockquote>
<p><br>
</p>
</body>
</html>