Can NFKC turn valid UAX 31 identifiers into non-identifiers?

Alastair Houghton via Unicode unicode at
Wed Jun 6 04:49:01 CDT 2018

On 5 Jun 2018, at 07:09, Martin J. Dürst via Unicode <unicode at> wrote:
> Hello Rebecca,
> On 2018/06/05 12:43, Rebecca T via Unicode wrote:
>> Something I’d love to see is translated keywords; shouldn’t be hard with a
>> line in the Cargo.toml for a rudimentary lookup. Again, I’m of the opinion
>> that an imperfect implementation is better than no attempt. I remember
>> reading an article about a professor who translated the keywords in...
>> maybe it was Python? And found their students were much more engaged with
>> the material. Anecdotal, of course, but it’s stuck with me.
> It would be good to have a reference for this. I can certainly see the point. But on the other hand, I have also heard that using keywords in a foreign language makes it clear that there may be a difference between the everyday use of the word and the specific formal meaning in the programming language. Then, there's also the problem that just translating keywords may work for languages with the same sentence structure, but not for languages with a completely different sentence structure. On top of that, keywords are just a start; class/function/method names in libraries would have to be translated, too, which would be much more work (especially if one wants to do a good job).

ALGOL68 was apparently localised (the standard explicitly supported that; it wasn’t an extension but rather something explicitly encouraged). AppleScript was also designed to be localised (French and Japanese syntaxes were defined), and I have an inkling that someone once told me that at least one translation had actually shipped, though the translated variants are now deprecated as far as I’m aware.

Translated keywords are in some ways better than allowing non-ASCII identifiers, because they’re typically amenable to machine translation, which means that they don’t suffer from the problem of community fragmentation that non-ASCII identifiers *could* cause. (Indeed, in AppleScript, the scripts are not usually saved in ASCII anyway, but IIRC as a set of Apple Event Descriptors, so the “language” is just a matter for rendering to the user.)
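
To make the machine-translation point concrete, here is a minimal sketch in Python. It assumes a hypothetical English/French keyword table and a toy regex tokenizer; neither is taken from ALGOL68, AppleScript, or any real implementation, and a real tool would work on the language's own token stream so that strings and comments are left alone.

```python
import re

# Hypothetical keyword table (canonical English -> French); illustrative only.
EN_TO_FR = {
    "if": "si",
    "else": "sinon",
    "while": "tantque",
    "return": "retourner",
}
# Inverse table, so source can be rendered back into the canonical form.
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}

# Toy tokenizer: whole identifier-like words only. It does not skip
# string literals or comments, which a real lexer would have to do.
WORD = re.compile(r"\b[^\W\d]\w*\b")


def translate_keywords(source, table):
    """Replace each whole word found in `table`; leave all other text alone."""
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), source)


french = "si x > 0: retourner x\nsinon: retourner -x"
english = translate_keywords(french, FR_TO_EN)
round_trip = translate_keywords(english, EN_TO_FR)
```

Because the keyword set is small and closed, the mapping is a plain table lookup in both directions, and the round trip gives back the original text. User-chosen identifiers have no such table, which is where the fragmentation concern comes from.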

Kind regards,


