Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Alastair Houghton via Unicode unicode at
Tue May 16 11:38:36 CDT 2017

On 16 May 2017, at 17:23, Hans Åberg <haberg-1 at> wrote:
> HFS implements case insensitivity in a layer above the filesystem raw functions. So it is perfectly possible to have files that differ by case only in the same directory by using low level function calls. The Tenon MachTen did that on Mac OS 9 already.

You keep insisting on this, but it’s not true; I’m a disk utility developer, and I can tell you for a fact that HFS+ uses a B+-Tree to hold its directory data (a single one for the entire disk, not one per directory either), and that that tree is sorted by (CNID, filename) pairs.  And since it’s case-preserving *and* case-insensitive, the comparisons it does to order its B+-Tree nodes *cannot* be raw.  I should know - I’ve actually written the code for it!

Even for legacy HFS, which didn’t store UTF-16, but stored a specified Mac legacy encoding (the encoding used is in the volume header), it’s case sensitive, so the encoding matters.

I don’t know what tricks Tenon MachTen pulled on Mac OS 9, but I *do* know how the filesystem works.

Kind regards,



More information about the Unicode mailing list