Unicode String Models

Daniel Bünzli via Unicode unicode at unicode.org
Sun Sep 9 08:42:19 CDT 2018


Hello, 

I find your notion of "model" and its presentation a bit confusing, since it conflates what I would call the internal representation and the API.

The internal representation defines how the Unicode text is stored and should not really matter to the end user of the string data structure. The API defines how the Unicode text is accessed, as expressed by what an indexing operation on the string returns. The latter is what really matters to the end user and is what I would call the "model".
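To make the distinction concrete, here is a minimal sketch in Python. The class names Utf8Text and Utf32Text are made up for illustration and do not come from your document; the point is only that two different internal representations can sit behind the same API, here one where indexing returns Unicode scalar values as integers.

```python
class Utf8Text:
    """Hypothetical string type: UTF-8 storage, indexing by scalar value."""

    def __init__(self, text: str) -> None:
        # Internal representation: UTF-8 bytes. Encoding fails on lone
        # surrogates, so only scalar values can be stored.
        self._data = text.encode("utf-8")

    def __len__(self) -> int:
        # Every byte that is not a continuation byte (10xxxxxx) starts
        # one scalar value.
        return sum(1 for b in self._data if (b & 0xC0) != 0x80)

    def __getitem__(self, index: int) -> int:
        seen = -1
        for start, b in enumerate(self._data):
            if (b & 0xC0) == 0x80:
                continue  # continuation byte, not the start of a sequence
            seen += 1
            if seen == index:
                end = start + 1
                while end < len(self._data) and (self._data[end] & 0xC0) == 0x80:
                    end += 1
                return ord(self._data[start:end].decode("utf-8"))
        raise IndexError(index)


class Utf32Text:
    """Hypothetical string type: UTF-32 storage, indexing by scalar value."""

    def __init__(self, text: str) -> None:
        # Internal representation: four bytes per scalar value, no BOM.
        self._data = text.encode("utf-32-le")

    def __len__(self) -> int:
        return len(self._data) // 4

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return int.from_bytes(self._data[4 * index:4 * index + 4], "little")


# Same API, different storage: "a", "é" (U+00E9), and U+1D11E.
sample = "a\u00e9\U0001d11e"
a, b = Utf8Text(sample), Utf32Text(sample)
assert [a[i] for i in range(len(a))] == [b[i] for i in range(len(b))]
```

The two classes differ in the cost of indexing (a linear scan versus constant time), but a user who only sees the indexing operation cannot tell them apart, which is why I would keep the two notions separate.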

I think the presentation would benefit from making a clear distinction between the internal representation and the API; you could then easily lay them out in a table, which would make a nice summary of the design space.

I also think you are missing one API, which is the one with ECG I would favour: indexing returns Unicode scalar values, whatever the internal representation, be it UTF-{8,16,32} or a custom encoding. Maybe that is what you intended by the "Code Point Model: Internal 8/16/32", but that is not what it says. The distinction between code point and scalar value is an important one, and I think it would be good to insist on it in such documents to avoid confusion.
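For reference, the difference can be stated in a few lines. This is a sketch in Python, with function names of my own choosing: a code point is any integer in the range U+0000..U+10FFFF, while a scalar value is a code point that is not a surrogate (U+D800..U+DFFF).

```python
def is_code_point(n: int) -> bool:
    """True for any integer in the Unicode codespace U+0000..U+10FFFF."""
    return 0 <= n <= 0x10FFFF


def is_scalar_value(n: int) -> bool:
    """True for code points outside the surrogate range U+D800..U+DFFF."""
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)


# A surrogate is a code point but not a scalar value.
assert is_code_point(0xD800) and not is_scalar_value(0xD800)

# Python's own str follows a code point model: it can hold a lone
# surrogate, but that string cannot be encoded as UTF-8.
lone = "\ud800"
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
assert not encodable
```

An API that indexes by scalar value rules out the second situation by construction: every element it returns can be encoded in any of the UTF encoding forms.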

Best, 

Daniel




