Can NFKC turn valid UAX 31 identifiers into non-identifiers?

Richard Wordingham via Unicode unicode at unicode.org
Thu Jun 7 08:29:42 CDT 2018


On Thu, 7 Jun 2018 10:42:46 +0200
Mark Davis ☕️ via Unicode <unicode at unicode.org> wrote:

> > The proposal also asks for identifiers to be treated as equivalent
> > under NFKC.
> 
> The guidance in #31 may not be clear. It is not to replace
> identifiers as typed in by the user by their NFKC equivalent. It is
> rather to internally *identify* two identifiers (as typed in by the
> user) as being the same. For example, Pascal had case-insensitive
> identifiers. That means someone could type in
> 
> myIdentifier = 3;
> MyIdentifier = 4;
> 
> And both of those would be references to the same internal entity. So
> cases like SARA AM don't necessarily play into this.

There has been a suggestion to not just restrict identifiers to NFKC
equivalence classes (UAX31-R4), but to actually restrict them to NFKC
form (UAX31-R6).  That is where the issue with SARA AM changes from a
lurking issue to an active problem.  Others have realised that NFC
makes more sense than NFKC for Rust.
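
To make the difference concrete, here is a minimal sketch in Python using
the standard unicodedata module. The helper names same_identifier and
is_in_form are hypothetical, chosen only for illustration, and the sample
word is just one Thai spelling that contains SARA AM (U+0E33). The first
helper treats two spellings as the same identifier by comparing NFKC keys
while leaving the source text alone (the equivalence-class reading); the
second checks whether a spelling is already in a given normalization form
(the restriction reading).

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Equivalence-class reading: compare NFKC keys, keep the spelling as typed."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


def is_in_form(s: str, form: str) -> bool:
    """Restriction reading: accept s only if it is already in the given form."""
    return unicodedata.normalize(form, s) == s


# THAI NO NU, MAI THO (tone mark), SARA AM: an illustrative spelling only.
NAM = "\u0E19\u0E49\u0E33"

# SARA AM has a compatibility decomposition to NIKHAHIT + SARA AA,
# so NFKC rewrites the string while NFC leaves it untouched.
NAM_NFKC = unicodedata.normalize("NFKC", NAM)

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in NAM])
    print([f"U+{ord(c):04X}" for c in NAM_NFKC])
    print("in NFC: ", is_in_form(NAM, "NFC"))
    print("in NFKC:", is_in_form(NAM, "NFKC"))
    print("same identifier under NFKC keys:", same_identifier(NAM, NAM_NFKC))
```

Under the key-comparison approach the user's spelling with SARA AM is
accepted and merely matched against its NFKC key. Under a
must-be-in-NFKC rule the same spelling would be rejected, and the NFKC
spelling places the tone mark before NIKHAHIT.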

Richard.



