Why Work at Encoding Level?
daniel.buenzli at erratique.ch
Wed Oct 21 08:16:07 CDT 2015
On Wednesday, 21 October 2015 at 04:37, Mark Davis ☕️ wrote:
> If you're not, the question is relevant.
I'm not disputing the question, I'm disputing trying to give it a defined answer. Even if your string is UTF-16 based, these problems can be solved by providing proper abstractions at the library level and asking clients to handle the problem *once*, when they inject the UTF-16 strings into your abstraction, which can then operate in a "clean" world where these questions do not arise.
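To make that concrete, here is a minimal sketch of such a boundary, assuming the abstraction is simply an immutable sequence of Unicode scalar values. The names `scalar_values_of_utf16` and `Text` are hypothetical and made up for illustration; they do not come from any particular library. Lone surrogates are rejected at injection time, so nothing past that point ever sees one.

```python
from typing import Iterable, List


def scalar_values_of_utf16(units: Iterable[int]) -> List[int]:
    """Decode UTF-16 code units into Unicode scalar values.

    Raises ValueError on an unpaired surrogate, so malformed input is
    handled once, at the boundary. (Hypothetical helper.)
    """
    out: List[int] = []
    it = iter(units)
    for u in it:
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit: {u:#x}")
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            lo = next(it, None)
            if lo is None or not 0xDC00 <= lo <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate: {u:#06x}")
            # Combine the pair into a supplementary-plane scalar value.
            out.append(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00))
        elif 0xDC00 <= u <= 0xDFFF:  # low (trail) surrogate with no lead
            raise ValueError(f"unpaired low surrogate: {u:#06x}")
        else:
            out.append(u)
    return out


class Text:
    """Hypothetical 'clean' text type: a sequence of scalar values only."""

    def __init__(self, scalars: Iterable[int]) -> None:
        self._scalars = tuple(scalars)

    @classmethod
    def of_utf16(cls, units: Iterable[int]) -> "Text":
        # The only place where UTF-16 questions have to be answered.
        return cls(scalar_values_of_utf16(units))

    def __len__(self) -> int:
        return len(self._scalars)

    def __getitem__(self, i: int) -> int:
        return self._scalars[i]
```

With this, `Text.of_utf16([0x0061, 0xD83D, 0xDE00])` has length 2, and indexing it can never land in the middle of a surrogate pair. Raising on malformed input is only one possible policy; replacing with U+FFFD at the boundary would serve the same purpose.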
Besides, programming languages do evolve, and one should at least make sure that new languages provide adequate abstractions for handling Unicode text. Looking at the recent batch of new languages, I don't think this is happening. I'm sure language designers are keen on taking off-the-shelf designs for this rather than getting into the details, but I would say that TUS, by defining notions of Unicode strings at the encoding level, is not doing a very good job at providing one.
FWIW, when I got into the standard around 2008 by reading that thick hard copy of TUS 5.0, it took me quite some time to actually understand and uncover the real structure behind Unicode, which is the scalar values.