Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Alastair Houghton via Unicode unicode at
Tue May 16 10:52:00 CDT 2017

On 16 May 2017, at 16:44, Hans Åberg <haberg-1 at> wrote:
> On 16 May 2017, at 17:30, Alastair Houghton via Unicode <unicode at> wrote:
>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. ...
> The filesystem directory is using octet sequences and does not bother passing over an encoding, I am told. Someone could remember one that used UTF-16 directly, but I think it may not be current.

No, that’s not true.  All three of those systems store UTF-16 on the disk (give or take).  On Windows, the “ANSI” APIs convert the filenames to or from the appropriate Windows code page, while the “Wide” API works in UTF-16, which is the native encoding for VFAT long filenames and NTFS filenames.  And, as I said, on Mac OS X and iOS, the kernel expects filenames to be encoded as UTF-8 at the BSD API, regardless of what encoding you might be using in your Terminal (this is different to traditional UNIX behaviour, where how you interpret your filenames is entirely up to you - usually you’d use the same encoding you were using on your tty).
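To make the distinction concrete, here is a small illustrative sketch, assuming Python 3.8 or later. It shows the bytes one filename becomes in the forms discussed above. The filename is made up. The use of little-endian UTF-16 reflects what NTFS and VFAT long names store on disk. `cp1252` is only a stand-in for the Windows "ANSI" code page, which in reality depends on the system locale. The helpers `filename_bytes` and `ansi_bytes` are hypothetical names and are not part of any OS API.

```python
from typing import Dict, Optional


def filename_bytes(name: str) -> Dict[str, bytes]:
    """Return the byte sequences for `name` in two filename encodings."""
    return {
        # UTF-16 little-endian: the on-disk form of NTFS and VFAT long names
        "utf-16-le": name.encode("utf-16-le"),
        # UTF-8: what the macOS/iOS kernel expects at the BSD API
        "utf-8": name.encode("utf-8"),
    }


def ansi_bytes(name: str, codepage: str = "cp1252") -> Optional[bytes]:
    """Encode `name` for an assumed "ANSI" code page, or None if impossible."""
    try:
        return name.encode(codepage)
    except UnicodeEncodeError:
        # The name contains characters the code page cannot represent,
        # so the "ANSI" view of this UTF-16 filename cannot be exact
        return None


if __name__ == "__main__":
    name = "Åberg.txt"  # hypothetical example filename
    for enc, data in filename_bytes(name).items():
        print(f"{enc:10} {data.hex(' ')}")
    print("cp1252    ", ansi_bytes(name))
    print("cp1252    ", ansi_bytes("日本.txt"))
```

The same name produces different byte sequences in each encoding. A name containing characters outside the chosen code page has no exact "ANSI" form, which is one reason the "Wide" (UTF-16) API is the one that matches what is actually on disk.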

Kind regards,


