UAX 31 for C++ Identifiers

Asmus Freytag (c) asmusf at
Sat Jun 20 23:38:49 CDT 2020

On 6/20/2020 1:36 PM, Tom Honermann wrote:
> On 6/20/20 2:44 AM, Asmus Freytag (c) via Unicode wrote:
>> My meta point had been about possibly different levels of security 
>> issues between compile time and runtime.
>> A./
> When you mentioned "modules", were you referring to C++20 modules?  If 
> so, there may be some confusion; C++20 modules is a compile-time 
> feature with no run-time component.
> Tom.

I had been thinking of interfaces to the various OSs, such as dynamically 
linked libraries, that are usually named with identifiers of some 
sort, although to the language proper these may of course just be strings.
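
To make the "just strings" point concrete, here is a minimal sketch in C++. 
The ExportTable type, the lookup function and the two names are hypothetical, 
invented for illustration; they only stand in for whatever table an OS loader 
consults and are not any real loader API. The assumption is that names are 
compared byte for byte, with no normalization or confusable detection.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for an OS-level export table; names are
// plain byte strings, as they would be to the language proper.
using ExportTable = std::map<std::string, int>;

// Look up a name byte for byte, with no normalization and no
// confusable detection. Returns -1 if the name is absent.
int lookup(const ExportTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? -1 : it->second;
}

// Two names that can render alike: Latin "pay", and a spoof whose
// middle letter is U+0430 CYRILLIC SMALL LETTER A (UTF-8: D0 B0).
const std::string latin_name = "pay";
const std::string spoof_name = "p\xD0\xB0y";
```

If both names are registered in one table, lookup treats them as two 
unrelated entries, which is the kind of run-time spoofing opening meant here.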


>> On 6/19/2020 8:22 PM, Steve Downey wrote:
>>> On Fri, Jun 19, 2020 at 10:44 PM Asmus Freytag via Unicode
>>> <unicode at>  wrote:
>>>> In source code, having ambiguous identifiers may not be worse than 
>>>> C-style obfuscation.
>>> Until recently (the last release, 10.1), gcc rejected much of the allowed
>>> Unicode in UTF-8 input, even in places where it would allow \u
>>> universal-character-names. So this all becomes easier now. As a
>>> Standard, we should have handled this better earlier, but the second
>>> best time is now. The XID_ properties make this a lot more palatable
>>> w.r.t. stability, though, and I'm not going to second guess people 10
>>> or 20 or more years ago, too much. Ambiguity in external identifiers
>>> is already ill-formed no diagnostic required, which means broken but
>>> in ways that compilers can't treat as undefined.
>>>> But with module names, etc. you may run into security issues if 
>>>> naming allows / facilitates spoofing.
>>> I, and other people doing tools, both won and lost this battle
>>> already. Module names in source do not correspond with anything
>>> physical. `import some.module` connects you to whatever exported
>>> `some.module` by magic as far as the standard is concerned. We're
>>> working on the actual mechanics as a Technical Report, and compiler
>>> vendors are participating and aren't, as far as I can tell, more
>>> insane than the average infrastructure engineer. So I have hope.
>>> Mapping anything to file paths is fraught beyond belief, and there are
>>> many experienced engineers providing war stories and parades of
>>> horribles, although I'd personally like to have more stories to tell.
>>> The entire disconnect between logical and physical actually is
>>> hopeful, in a way that `#include <ha/hahahahaha.h>` isn't. Even though
>>> we have a lot of understanding of how that maps to filesystem
>>> searches.
>>> Province of wg21/sg15, which I also participate in.
>>> I suspect that trying to fix up anything with #include is infeasible
>>> since it's currently the wild west, changes will break, and C++
>>> depends in practice on system provided headers that at best conform to
>>> old C standards.
>>> Thanks!
>>> -SMD
