Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Hans Åberg via Unicode unicode at
Tue May 16 10:44:54 CDT 2017

> On 16 May 2017, at 17:30, Alastair Houghton via Unicode <unicode at> wrote:
> On 16 May 2017, at 14:23, Hans Åberg via Unicode <unicode at> wrote:
>> You don't. You have a filename, which is an octet sequence of unknown encoding, and want to deal with it. Therefore, valid Unicode transformations of the filename may result in it no longer being reachable.
>> It only matters that the correct octet sequence is handed back to the filesystem. All current filesystems, as far as experts could recall, use octet sequences at the lowest level; whatever encoding is used is built in a layer above.
> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. ...

The filesystem directory uses octet sequences and does not bother passing along an encoding, I am told. Someone could remember one that used UTF-16 directly, but I think it may not be current.
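
To illustrate the point about handing the correct octet sequence back, here is a minimal Python sketch. The filename bytes are a made-up example (a Latin-1 "é" that is ill-formed as UTF-8), and the choice of Python's surrogateescape error handler is my own assumption for illustration, not something proposed in this thread. It only shows that U+FFFD replacement loses the original octets while a lossless escape keeps them.

```python
# Hypothetical filename octets: 0xE9 is Latin-1 "é" and is ill-formed as UTF-8.
raw = b"caf\xe9.txt"

# Decoding with U+FFFD replacement is lossy: the original octet is gone,
# so re-encoding yields a different octet sequence (a different filename).
lossy = raw.decode("utf-8", errors="replace")
lossy_back = lossy.encode("utf-8")

# surrogateescape maps each undecodable byte to a lone surrogate
# (U+DC80..U+DCFF), so the exact octets can be recovered on encode.
kept = raw.decode("utf-8", errors="surrogateescape")
kept_back = kept.encode("utf-8", errors="surrogateescape")

print(lossy_back == raw)  # False: the file would no longer be reachable
print(kept_back == raw)   # True: the same octets go back to the filesystem
```

The escaped string is not well-formed Unicode, so it is suitable only as an opaque handle to give back to the filesystem, not for interchange.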
