Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Tue May 16 11:52:03 CDT 2017

> On 16 May 2017, at 18:38, Alastair Houghton <alastair at alastairs-place.net> wrote:
> 
> On 16 May 2017, at 17:23, Hans Åberg <haberg-1 at telia.com> wrote:
>> 
>> HFS implements case insensitivity in a layer above the filesystem raw functions. So it is perfectly possible to have files that differ by case only in the same directory by using low level function calls. The Tenon MachTen did that on Mac OS 9 already.
> 
> You keep insisting on this, but it’s not true; I’m a disk utility developer, and I can tell you for a fact that HFS+ uses a B+-Tree to hold its directory data (a single one for the entire disk, not one per directory either), and that that tree is sorted by (CNID, filename) pairs.  And since it’s case-preserving *and* case-insensitive, the comparisons it does to order its B+-Tree nodes *cannot* be raw.  I should know - I’ve actually written the code for it!
> 
> Even for legacy HFS, which didn’t store UTF-16, but stored a specified Mac legacy encoding (the encoding used is in the volume header), it’s case sensitive, so the encoding matters.
> 
> I don’t know what tricks Tenon MachTen pulled on Mac OS 9, but I *do* know how the filesystem works.

One could create files in the same directory whose names differed only by case, and Mac OS 9 was not bothered by it. Legacy HFS tended to slow down with many files in the same directory, which gave the impression of a tree structure. The BSD filesystem of the time, perhaps the one that Mac OS X once supported, did not store files in a tree, but in a flat layout with redundancy. The rest of the information I got from the Austin Group list a decade ago.
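
For reference, the catalog ordering described in the quoted message (a single tree for the whole volume, sorted by (CNID, filename) pairs with a case-insensitive comparison) can be sketched roughly as follows. This is only an illustration under stated assumptions: a sorted Python list stands in for the B+-tree leaf order, str.casefold() stands in for whatever case-folding rule the filesystem really applies, and the names Catalog and catalog_key are made up for the example.

```python
import bisect


def catalog_key(parent_cnid, name):
    # Hypothetical key: (parent CNID, case-folded name).
    # str.casefold() is an assumption standing in for the real folding rule.
    return (parent_cnid, name.casefold())


class Catalog:
    """Toy stand-in for one volume-wide catalog, kept in key order."""

    def __init__(self):
        self._keys = []   # sorted keys, imitating B+-tree leaf order
        self._names = {}  # key -> name as first stored (case-preserving)

    def insert(self, parent_cnid, name):
        key = catalog_key(parent_cnid, name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return False  # same directory, same name up to case: rejected
        self._keys.insert(i, key)
        self._names[key] = name
        return True

    def listing(self):
        # Entries in key order, with the spelling that was originally stored.
        return [(key[0], self._names[key]) for key in self._keys]


cat = Catalog()
first = cat.insert(2, "ReadMe.txt")
second = cat.insert(2, "README.TXT")       # differs only by case, same directory
elsewhere = cat.insert(16, "README.TXT")   # different parent CNID
entries = cat.listing()
```

With a comparison like this, the second insert collides with the first, while the same name under a different parent CNID is accepted. Whether a lower-level call could bypass such a comparison is exactly the point under dispute above, and the sketch does not settle it.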