String Ranges in Unicode Sets

Doug Ewell doug at ewellic.org
Tue Sep 8 10:19:03 CDT 2015


Mark Davis <mark at macchiato dot com> wrote:

>> TUS 8.0 Chapter 3 C6: "A process shall not assume that the
>> interpretations of two canonical-equivalent character sequences are
>> distinct."
>
> A compiler will take source code containing String x="á"; and compile
> it to a certain binary. If that same source code is NFD'd, the
> compiler will produce a different result.
>
> Do you really think that such compiler is not compliant to Unicode??
> If so, then we should add some more clarifications around C6.

I agree. The word "interpretations" in C6 can't have been intended to
include the interpretation of code points qua code points. That would
make a great many internal processes impossible.
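
To make the code-point-level difference concrete, here is a minimal
Python sketch (my own illustration, not taken from the standard or from
any particular compiler). It assumes only the standard unicodedata
module, and shows that the two forms of "á" are different code point
sequences with different UTF-8 bytes, which is what a compiler embedding
a string literal would see:

```python
import unicodedata

# The same abstract character in its two canonical-equivalent forms.
nfc = unicodedata.normalize("NFC", "\u00e1")  # single code point U+00E1
nfd = unicodedata.normalize("NFD", "\u00e1")  # U+0061 followed by U+0301

# A process that handles code points as code points (such as a compiler
# storing a string literal) sees two different sequences.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print([f"U+{ord(c):04X}" for c in nfc], nfc_bytes)
print([f"U+{ord(c):04X}" for c in nfd], nfd_bytes)
print("identical sequences:", nfc == nfd)
```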

I think of C6 as meaning that spell-checkers, for example, should not
treat José (NFC, four code points, ending in U+00E9) and José (NFD,
five code points, ending in U+0065 U+0301) as separate entries.
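
A spell-checker could get that behavior by normalizing before lookup.
The following is only a sketch under that assumption: spelling_key and
dictionary are hypothetical names invented for illustration, and NFC is
an arbitrary choice (NFD would serve equally well as long as it is
applied consistently).

```python
import unicodedata


def spelling_key(word: str) -> str:
    """Hypothetical helper: map canonical-equivalent spellings to one key."""
    return unicodedata.normalize("NFC", word)


jose_nfc = "Jos\u00e9"    # four code points
jose_nfd = "Jose\u0301"   # five code points

# The word list stores normalized keys, so either input form finds the entry.
dictionary = {spelling_key(jose_nfc)}

print(len(jose_nfc), len(jose_nfd))          # lengths in code points
print(jose_nfc == jose_nfd)                  # raw comparison of code points
print(spelling_key(jose_nfd) in dictionary)  # lookup after normalization
```

The raw comparison still distinguishes the two strings; only the lookup
through the normalized key treats them as the same entry.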

--
Doug Ewell | http://ewellic.org | Thornton, CO



