Concise term for non-ASCII Unicode characters

Sean Leonard lists+unicode at seantek.com
Tue Sep 29 22:40:48 CDT 2015


On 9/29/2015 12:27 PM, Daniel Bünzli wrote:
> On Tuesday, 29 September 2015 at 19:50, Ken Whistler wrote:
>> I agree that "scalar values greater than U+007F" doesn't just trip off the tongue,
>> and while technically accurate, it is bad terminology -- precisely because it
>> begs the question "wtf are 'scalar values'?!" for the average engineer.
> And an average engineer knows how to look up definitions, that one being precise and exceptionally well defined in the Unicode glossary, in stark contrast to the shady (and deceiving for the newbie) notion of "character" that you use subsequently in your message.
>
> This is not "bad terminology", it's *precise* terminology and what I would like to see used in protocols and standards.
>
> Many programmers I talk to are confused by Unicode because their notion of Unicode "character" is a chaotic mix of scalar values, code points and their various *encodings* (i.e. byte level considerations).

+1
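
To make the distinction concrete, here is a minimal Python sketch of the three notions being mixed up above: code points, scalar values, and encoded bytes. The helper names (is_scalar_value, non_ascii_scalar_values) are made up for illustration and come from no standard or library. The sample string assumes the precomposed form U+00FC for the "ü".

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: any code point
    in 0..0x10FFFF except the surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def non_ascii_scalar_values(text: str) -> list:
    """Return the scalar values greater than U+007F found in text."""
    return [ord(ch) for ch in text
            if ord(ch) > 0x7F and is_scalar_value(ord(ch))]


name = "B\u00fcnzli"  # hypothetical sample; precomposed u-umlaut assumed

# A Python str is a sequence of code points, so len() counts code points.
code_point_count = len(name)

# Encoding is a separate, byte-level matter: U+00FC takes two bytes in UTF-8.
utf8_byte_count = len(name.encode("utf-8"))

print(code_point_count, utf8_byte_count)
print([f"U+{cp:04X}" for cp in non_ascii_scalar_values(name)])

# A lone surrogate is a code point but not a scalar value.
print(is_scalar_value(0xD800))
```

The check for "non-ASCII" here is exactly "scalar value greater than U+007F"; it says nothing about how many bytes the text occupies, which depends on the encoding.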

I like the definition of "character" in ASCII:
3.3 Character. A member of a set of elements used for the organization, 
control, or representation of data.

This, by the way, is the exact same definition as in ISO 646, ISO 2022, 
and yes, even ISO 10646 (2003). It was the best of times...

Sean

