Question mark

Giacomo Catenazzi cate at cateee.net
Tue Jun 11 00:29:25 CDT 2024



On 11.06.2024 03:38, David Starner via Unicode wrote:
> On Mon, Jun 10, 2024, 1:08 PM Piotr Karocki via Unicode 
> <unicode at corp.unicode.org <mailto:unicode at corp.unicode.org>> wrote:
> 
>     In my humble opinion, only "portable filename character set" should
>     be used
>     in filenames.
>     This is the one and only character set that is truly portable between
>     operating systems, works everywhere and every time, without any
>     problems or
>     inconveniences.
> 
> That's just an obsolete relic of Unix-like systems. Virtually all 
> systems will survive with 8.3, upper case letters only and only one 
> period, splitting that 8 and 3. If you're worried about major current 
> operating systems, most ASCII characters are fine, but you need to 
> remember case-insensitivity. I'm not willing to live in that box, but 
> even if you are, there's no reason to treat that list as terribly useful.

The problem is not about operating systems but about filesystems, and if
you follow Stack Overflow and related sites you see many weird failures.
For your personal files it is normally not a problem: do what you want.
But you may run into problems once a file touches a remote filesystem
(network filesystems, including NAS, "the cloud", etc.), and also when
the same filesystem is used on different computers (not necessarily with
different operating systems), as with USB pen drives and other removable
hard disks.
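One concrete way a file breaks when it moves to another filesystem is a
name collision: the FAT/exFAT volumes typical of USB sticks, NTFS under
default Windows settings, and default macOS volumes are case-insensitive,
so two names that coexist on a case-sensitive Linux filesystem can clash.
The sketch below is an approximation only: it assumes NFC normalization
plus Python's str.casefold() as a stand-in for the case rules of the
target filesystem, which in reality each use their own tables.

```python
import unicodedata
from collections import defaultdict


def case_collisions(names):
    """Group names that would probably collide on a case-insensitive filesystem.

    Approximation: NFC-normalize, then apply Unicode case folding.
    Real filesystems (NTFS, exFAT, APFS, ...) use their own case tables,
    so this can report collisions they would not, and vice versa.
    """
    groups = defaultdict(list)
    for name in names:
        key = unicodedata.normalize("NFC", name).casefold()
        groups[key].append(name)
    # Keep only keys shared by more than one original name.
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    files = ["Report.txt", "report.TXT", "notes.md"]
    print(case_collisions(files))
    # NFC and NFD spellings of "café.txt" also collapse to one key.
    print(case_collisions(["caf\u00e9.txt", "cafe\u0301.txt"]))
```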

Unix-like systems usually work on bytes (i.e. encoded strings), so they
are fully "transparent" to encoding, but they also fail completely if
one user uses a different encoding from the others. Because Unix-like
systems (including Apple's) switched to UTF-8 early, we tend to forget
about this, so we sometimes hit weird bugs with Japanese text or with
Unicode normalization.
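To make the byte-level point concrete, here is a small Python sketch (the
file name "café.txt" is just an example). It shows that the NFC and NFD
spellings of the same visible name are different byte strings, that a
name written in Latin-1 is not valid UTF-8, and how Python's
"surrogateescape" error handler (what Python's os module relies on for
undecodable file names on POSIX systems) lets such bytes round-trip.

```python
import unicodedata

# Two Unicode-equivalent spellings of the same visible name.
nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")  # é as one code point, U+00E9
nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")  # e + U+0301 COMBINING ACUTE ACCENT

# A byte-oriented filesystem stores whatever bytes it receives,
# so these are two different file names on disk.
nfc_bytes = nfc.encode("utf-8")  # b'caf\xc3\xa9.txt'
nfd_bytes = nfd.encode("utf-8")  # b'cafe\xcc\x81.txt'

# A name created by a Latin-1 user is not valid UTF-8 for a UTF-8 user.
latin1_bytes = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# "surrogateescape" keeps each undecodable byte as a lone surrogate,
# so the original bytes can be recovered exactly.
escaped = latin1_bytes.decode("utf-8", errors="surrogateescape")  # 'caf\udce9.txt'
round_trip = escaped.encode("utf-8", errors="surrogateescape")

if __name__ == "__main__":
    print(nfc == nfd, nfc_bytes, nfd_bytes)
    print(decoded_ok, repr(escaped), round_trip == latin1_bytes)
```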

Windows works on UTF-16 code units, so if a filename contains an invalid
Unicode string (or simply one that cannot be represented elsewhere), the
file may become difficult to access. (An unpaired surrogate is the most
common problem.)
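A minimal sketch of what "unpaired surrogate" means at the code-unit
level. It assumes a Python str as a stand-in for a Windows filename; the
helper names are hypothetical. The "surrogatepass" handler is used to get
at the raw UTF-16 code units, and a name containing a lone surrogate
cannot be encoded as strict UTF-8, which is why such files are hard to
handle on other systems.

```python
def utf16_code_units(name):
    """Return the UTF-16LE code units of a str, allowing lone surrogates."""
    data = name.encode("utf-16-le", errors="surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def lone_surrogates(units):
    """Return the indices of unpaired surrogate code units."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: must be followed by a low one
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # valid pair, skip both units
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:  # low surrogate with no high surrogate before it
            bad.append(i)
        i += 1
    return bad


def utf8_ok(name):
    """True if the name can be encoded as strict UTF-8."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    good = "a\U0001F600b"        # emoji stored as a proper surrogate pair
    broken = "report\ud83d.txt"  # high surrogate with no low surrogate
    print(lone_surrogates(utf16_code_units(good)), utf8_ok(good))
    print(lone_surrogates(utf16_code_units(broken)), utf8_ok(broken))
```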

Typing, or even selecting, a file whose name is not in your locale may be
difficult (also in a GUI: if all you see are replacement characters,
which file is the text file and which one is the virus?)
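The replacement-character problem can be reproduced directly. In this
sketch the two Cyrillic names and the cp1251 locale are assumptions
chosen for illustration: the names differ on disk, but a UTF-8 viewer
that substitutes U+FFFD for undecodable bytes shows them identically.

```python
# Two different names created by a user whose locale uses cp1251.
text_name = "текст.txt".encode("cp1251")   # "text"
virus_name = "вирус.txt".encode("cp1251")  # "virus"

# A UTF-8 viewer cannot decode these bytes; substituting
# U+FFFD REPLACEMENT CHARACTER makes both names look the same.
shown_text = text_name.decode("utf-8", errors="replace")
shown_virus = virus_name.decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(text_name != virus_name)    # the bytes on disk differ
    print(shown_text == shown_virus)  # but the displayed names are identical
    print(shown_text)
```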

PS: Microsoft has overly restrictive filename rules: reserved device
names such as "CON", "NUL" or "COM1" are not allowed.
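For the Windows restrictions, a simplified checker might look like the
sketch below. It is based on Microsoft's documented naming rules as I
understand them (forbidden characters such as "?", control characters,
reserved device names even when followed by an extension, no trailing
space or period), and it deliberately covers only COM1 to COM9 and LPT1
to LPT9; newer documentation lists further variants. The function name is
hypothetical.

```python
# Reserved device names (compared case-insensitively). Only the classic
# set is listed; newer Windows documentation adds a few more variants.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)
FORBIDDEN = set('<>:"/\\|?*')


def windows_name_problems(name):
    """Return a list of reasons why Windows may reject this file name."""
    problems = []
    bad = sorted({c for c in name if c in FORBIDDEN or ord(c) < 32})
    if bad:
        problems.append(f"forbidden characters: {bad!r}")
    # "con.txt" is treated like "CON": only the part before the first dot counts.
    if name.split(".")[0].upper() in RESERVED:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("trailing space or period")
    return problems


if __name__ == "__main__":
    for n in ["notes.txt", "con.txt", "COM1", "why?.txt", "draft."]:
        print(n, windows_name_problems(n))
```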

And you get surprises if you use characters like space, & or +: they
may work fine until you work "in the cloud" (and the cloud adds many
more restrictions on filenames and paths).
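The post does not say why these characters break in the cloud; one
plausible mechanism (an assumption here) is that file names end up in
URLs. The sketch uses Python's urllib.parse to show the correct
percent-encoding and two common bugs: an unencoded "&" splitting a query
string, and form-style decoding turning a literal "+" into a space.

```python
from urllib.parse import parse_qs, quote, unquote_plus

name = "Q&A notes+draft.txt"

# Correct: percent-encode the name before putting it into a URL.
encoded = quote(name)                      # 'Q%26A%20notes%2Bdraft.txt'

# Bug 1: a name pasted unencoded into a query string is cut at '&'.
truncated = parse_qs("file=Q&A.txt")       # {'file': ['Q']}

# Bug 2: form-style decoding turns a literal '+' into a space.
mangled = unquote_plus("notes+draft.txt")  # 'notes draft.txt'

if __name__ == "__main__":
    print(encoded, truncated, mangled, sep="\n")
```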

And while looking for the official answer to the initial question:
Microsoft can do a sort of escaping for filesystem compatibility, but I
can no longer find the translation list. In the meantime I discovered
problems with control characters in filenames, and other weird things:
we are far from the ideal case where we can use Unicode characters
(perhaps excluding control characters) freely in filenames.
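As a small illustration of the control-character problem, this sketch
flags every character in Unicode general category Cc (control) or Cf
(format, which includes invisible bidi controls such as U+202E). Treating
both categories as suspicious is an assumption of this sketch, not a rule
from any filesystem, and the function name is hypothetical.

```python
import unicodedata


def control_chars(name):
    """Return (index, code point) for each Cc or Cf character in the name."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(name)
        if unicodedata.category(c) in ("Cc", "Cf")
    ]


if __name__ == "__main__":
    print(control_chars("report.txt"))   # nothing suspicious
    print(control_chars("a\nb\u202ec"))  # newline and RIGHT-TO-LEFT OVERRIDE
```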

So the "portable filename character set" is not very useful, but it is
still necessary. Windows is still not fully committed to Unicode, so lazy
programmers produce visible i18n bugs.
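For completeness, a sketch of checking against the POSIX portable
filename character set (A to Z, a to z, 0 to 9, period, underscore,
hyphen-minus). The extra rules (no leading hyphen, not "." or "..") and
the lossy to_portable fallback are assumptions of this sketch; both
function names are hypothetical.

```python
import string

# POSIX portable filename character set.
PORTABLE = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_filename(name):
    """True if the name uses only the portable set and avoids a few traps."""
    return (
        name not in ("", ".", "..")       # empty name and the special entries
        and not name.startswith("-")      # a leading '-' looks like a command option
        and all(c in PORTABLE for c in name)
    )


def to_portable(name, replacement="_"):
    """Lossy fallback: replace every non-portable character."""
    out = "".join(c if c in PORTABLE else replacement for c in name)
    if out.startswith("-"):
        out = replacement + out[1:]
    return out


if __name__ == "__main__":
    for n in ["report_2024-06.txt", "Question?.txt", "caf\u00e9.txt", "-rf"]:
        print(n, is_portable_filename(n), to_portable(n))
```

The fallback is lossy: "a?.txt" and "a*.txt" both become "a_.txt", so a
real tool would still need to detect collisions.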

giacomo
