UAX 31 for C++ Identifiers

Tom Honermann tom at honermann.net
Sat Jun 20 15:36:01 CDT 2020


On 6/20/20 2:44 AM, Asmus Freytag (c) via Unicode wrote:
> My meta point had been about possibly different levels of security issues 
> between compile time and runtime.
> A./

When you mentioned "modules", were you referring to C++20 modules?  If 
so, there may be some confusion: C++20 modules are a compile-time feature 
with no run-time component.

Tom.

>
> On 6/19/2020 8:22 PM, Steve Downey wrote:
>> On Fri, Jun 19, 2020 at 10:44 PM Asmus Freytag via Unicode
>> <unicode at unicode.org>  wrote:
>>> In source code, having ambiguous identifiers may not be worse than C-style obfuscation.
>>>
>> Until recently (the latest release, 10.1), GCC rejected much of the
>> allowed Unicode in UTF-8 input, even in places where it would allow \u
>> universal-character-names. So this all becomes easier now. As a
>> Standard, we should have handled this better earlier, but the second
>> best time is now. The XID_ properties make this a lot more palatable
>> w.r.t. stability, though, and I'm not going to second guess people 10
>> or 20 or more years ago, too much. Ambiguity in external identifiers
>> is already ill-formed, no diagnostic required, which means broken, but
>> in ways that compilers can't treat as undefined.
>>
>>> But with module names, etc. you may run into security issues if naming allows / facilitates spoofing.
>>>
>> I, and other people doing tools, both won and lost this battle
>> already. Module names in source do not correspond with anything
>> physical. `import some.module` connects you to whatever exported
>> `some.module` by magic as far as the standard is concerned. We're
>> working on the actual mechanics as a Technical Report, and compiler
>> vendors are participating and aren't, as far as I can tell, more
>> insane than the average infrastructure engineer. So I have hope.
>>
>> Mapping anything to file paths is fraught beyond belief, and there are
>> many experienced engineers providing war stories and parades of
>> horribles, although I'd personally like to have more stories to tell.
>>
>> The entire disconnect between logical and physical actually is
>> hopeful, in a way that `#include <ha/hahahahaha.h>` isn't. Even though
>> we have a lot of understanding of how that maps to filesystem
>> searches.
>>
>> This is the province of WG21/SG15, which I also participate in.
>>
>> I suspect that trying to fix up anything with #include is infeasible
>> since it's currently the wild west, changes will break, and C++
>> depends in practice on system provided headers that at best conform to
>> old C standards.
>>
>> Thanks!
>>
>> -SMD
>
>


