Can NFKC turn valid UAX 31 identifiers into non-identifiers?

Mark Davis ☕️ via Unicode unicode at unicode.org
Thu Jun 7 08:47:21 CDT 2018


Got it, thanks.

Mark

On Thu, Jun 7, 2018 at 3:29 PM, Richard Wordingham via Unicode <
unicode at unicode.org> wrote:

> On Thu, 7 Jun 2018 10:42:46 +0200
> Mark Davis ☕️ via Unicode <unicode at unicode.org> wrote:
>
> > > The proposal also asks for identifiers to be treated as equivalent
> > > under NFKC.
> >
> > The guidance in #31 may not be clear. It is not to replace
> > identifiers as typed in by the user by their NFKC equivalent. It is
> > rather to internally *identify* two identifiers (as typed in by the
> > user) as being the same. For example, Pascal had case-insensitive
> > identifiers. That means someone could type in
> >
> > myIdentifier = 3;
> > MyIdentifier = 4;
> >
> > And both of those would be references to the same internal entity. So
> > cases like SARA AM don't necessarily play into this.
>
> There has been a suggestion to not just restrict identifiers to NFKC
> equivalence classes (UAX31-R4), but to actually restrict them to NFKC
> form (UAX31-R6).  That is where the issue with SARA AM changes from a
> lurking issue to an active problem.  Others have realised that NFC
> makes more sense than NFKC for Rust.
>
> Richard.
>
>
>
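
The distinction drawn in the quoted exchange (treating identifiers as the
same when they are NFKC-equivalent, versus requiring identifiers to already
be in NFKC form) can be sketched roughly as follows. This is only an
illustration using Python's standard unicodedata module; the helper names
nfkc_key, same_identifier and is_in_form are hypothetical and come from
neither UAX #31 nor the Rust proposal. The Thai sample word is an assumed
example containing U+0E33 THAI CHARACTER SARA AM.

```python
import unicodedata


def nfkc_key(identifier: str) -> str:
    """Return an internal comparison key; the typed spelling is not altered."""
    return unicodedata.normalize("NFKC", identifier)


def same_identifier(a: str, b: str) -> bool:
    """Equivalence-class comparison: do two spellings name the same entity?"""
    return nfkc_key(a) == nfkc_key(b)


def is_in_form(identifier: str, form: str) -> bool:
    """Stricter check: is the identifier, as typed, already in `form`?"""
    return unicodedata.normalize(form, identifier) == identifier


# U+FB01 LATIN SMALL LIGATURE FI has a compatibility mapping to "fi".
ligature_spelling = "\uFB01le"
plain_spelling = "file"

# Assumed sample: a Thai word ending in U+0E33 SARA AM, whose compatibility
# decomposition is U+0E4D U+0E32.
thai_word = "\u0E19\u0E49\u0E33"

print(same_identifier(ligature_spelling, plain_spelling))
print(is_in_form(thai_word, "NFKC"))
print(is_in_form(thai_word, "NFC"))
```

Under the first approach both spellings of "file" resolve to one entity
while the source text keeps whatever the user typed. Under the second, the
Thai word as typed is not in NFKC form, although it is in NFC form, which
is where SARA AM starts to matter.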