Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Alastair Houghton via Unicode unicode at
Tue May 16 11:13:33 CDT 2017

On 16 May 2017, at 17:07, Hans Åberg <haberg-1 at> wrote:
>>>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. ...
>>> The filesystem directory uses octet sequences and does not bother passing along an encoding, I am told. Someone could remember one that used UTF-16 directly, but I think it may not be current.
>> No, that’s not true.  All three of those systems store UTF-16 on the disk (give or take).
> I am not speaking about what they store, but how the filesystem identifies files.

Well, quite clearly none of those systems treats the UTF-16 strings as binary either; they’re case-insensitive, so how could they? HFS+ even normalises strings, using a variant of a frozen version of the Unicode normalisation spec.
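
To make the distinction concrete, here is a minimal sketch in Python. It is not the actual comparison used by HFS+, NTFS or VFAT (each has its own case tables, and only HFS+ normalises). The helpers `binary_equal` and `folded_equal` are hypothetical names, and NFD plus `str.casefold()` is only an assumed stand-in for "normalise and ignore case".

```python
import unicodedata


def binary_equal(a: str, b: str) -> bool:
    # Compare the raw UTF-16 code units, as a filesystem that treated
    # names as opaque binary would.
    return a.encode("utf-16-le") == b.encode("utf-16-le")


def folded_equal(a: str, b: str) -> bool:
    # Assumed stand-in for a normalising, case-insensitive filesystem:
    # canonical decomposition (NFD) followed by Unicode case folding.
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", s).casefold()

    return key(a) == key(b)


# Two spellings of the same name: precomposed vs. decomposed, differing in case.
precomposed = "\u00c5berg.txt"   # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030aberg.TXT"   # "A" + U+030A COMBINING RING ABOVE

print(binary_equal(precomposed, decomposed))
print(folded_equal(precomposed, decomposed))
```

The two names differ as code-unit sequences, yet a filesystem that folds case and normalises has to identify them as the same file, which is why the identification cannot be a plain binary match.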

Kind regards,


