Can NFKC turn valid UAX 31 identifiers into non-identifiers?

Hans Åberg via Unicode unicode at unicode.org
Fri Jun 8 07:39:09 CDT 2018


> On 8 Jun 2018, at 11:07, Henri Sivonen via Unicode <unicode at unicode.org> wrote:
> 
> My question is:
> 
> When designing a syntax where tokens with the user-chosen characters
> can't occur next to each other without some syntax-reserved characters
> between them, what advantages are there from limiting the user-chosen
> characters according to UAX #31 as opposed to treating any character
> that is not a syntax-reserved character as a character that can occur
> in user-named tokens?

It seems best to stick to the canonical forms and then add only those compatibility sequences one deems useful and safe, since treating inequivalent characters as equal is likely to confuse users. That takes more work, though; the compatibility forms seem aimed at something simple to implement.
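
To make the distinction concrete, here is a minimal Python sketch using the standard unicodedata module. The helper name same_under and the sample strings are my own, chosen only for illustration; this is not a proposal for how a tokenizer should be written. It shows that canonical normalization (NFC) merges only canonically equivalent spellings, while compatibility normalization (NFKC) also merges strings that are visibly distinct characters.

```python
import unicodedata


def same_under(form: str, a: str, b: str) -> bool:
    """Return True if a and b are identical after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonically equivalent spellings: precomposed U+00C5 versus
# "A" followed by U+030A COMBINING RING ABOVE.
precomposed = "\u00C5berg"
decomposed = "A\u030Aberg"

# Only compatibility-equivalent: U+FB01 LATIN SMALL LIGATURE FI versus "fi".
ligature = "\uFB01le"
plain = "file"

canonical_merge = same_under("NFC", precomposed, decomposed)  # merged by NFC
nfc_keeps_ligature = not same_under("NFC", ligature, plain)   # kept distinct by NFC
nfkc_merges_ligature = same_under("NFKC", ligature, plain)    # merged by NFKC
```

With NFC, the two spellings of the name compare equal and the ligature stays distinct from "fi"; with NFKC, the ligature and "fi" become the same token, which is the kind of merging of inequivalent characters I mean above.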




