Unicode Properties and Canonical Equivalence

Markus Scherer markus.icu at gmail.com
Mon Aug 15 13:38:24 CDT 2022


On Thu, Aug 11, 2022 at 10:21 PM Richard Wordingham via Unicode <
unicode at corp.unicode.org> wrote:

> May a process conforming to Unicode requirement C6 (TUS Section 3.2),
> "A process shall not assume that the interpretations of two
> canonical-equivalent character sequences are distinct", consider the
> Unicode set
>
> [\p{sc = Greek}&&\p{sc ≠ Greek}]
>
> to be non-empty?
>

Regardless of other considerations, a set and its inverse are disjoint.

> The problem is that the canonically equivalent characters U+00B4 ACUTE
> ACCENT and U+1FFD GREEK OXIA have conflicting script properties, but a
> Unicode-conformant process may freely interchange the two characters
> whenever they appear as part of a string (Conformance Requirement C7).
> This conflict was allowed to stand in Consensus 113-C16 back in 2007,
> pending further study.
>

Would you mind providing the information that you have already collected,
such as the script property values for these characters, what that 2007
consensus says and what it was based on, and which value you think we
should change to which other value?
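
For readers who want to reproduce the situation under discussion, here is a
minimal Python sketch. It uses the standard unicodedata module to check
canonical equivalence. The Script values in the SCRIPT table are copied by
hand from Scripts.txt (Common for U+00B4, Greek for U+1FFD) and are an
assumption of this sketch, since unicodedata does not expose the Script
property. The helper names are hypothetical.

```python
import unicodedata

ACUTE_ACCENT = "\u00B4"  # U+00B4 ACUTE ACCENT
GREEK_OXIA = "\u1FFD"    # U+1FFD GREEK OXIA

# Assumption: Script values transcribed by hand from Scripts.txt.
# Python's unicodedata module has no Script property lookup.
SCRIPT = {ACUTE_ACCENT: "Common", GREEK_OXIA: "Greek"}


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def script_set(chars, script):
    """Per-code-point set of characters whose Script value equals `script`."""
    return {c for c in chars if SCRIPT[c] == script}


universe = {ACUTE_ACCENT, GREEK_OXIA}
greek = script_set(universe, "Greek")   # models \p{sc = Greek}
not_greek = universe - greek            # models \p{sc ≠ Greek}

# A set and its complement are disjoint when membership is per code point.
print("intersection:", greek & not_greek)

# The two characters are nevertheless canonically equivalent.
print("equivalent:", canonically_equivalent(ACUTE_ACCENT, GREEK_OXIA))

# Normalizing replaces U+1FFD with U+00B4, which changes the Script value
# that a per-code-point lookup reports for the text.
normalized = unicodedata.normalize("NFC", GREEK_OXIA)
print("before:", SCRIPT[GREEK_OXIA], "after:", SCRIPT[normalized])
```

Evaluated per code point, the intersection is empty. The conflict shows up
only when a process normalizes the text before looking up the property.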

Thanks,
markus

