Concise term for non-ASCII Unicode characters
Daniel Bünzli
daniel.buenzli at erratique.ch
Sun Sep 20 14:57:10 CDT 2015
On Sunday, 20 September 2015 at 18:59, Steve Swales wrote:
> Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252).
For this reason I usually use the term US-ASCII, which is the IANA name for the 7-bit-ASCII characters [1].
Someone referring to the non-US-ASCII scalar values of Unicode would make precise sense to me. But then, maybe Peter's very last suggestion is actually the most precise you can get.
Also, if you are talking about UTF-8, I would use the term "scalar values" rather than "characters" or "code points", since surrogates can't be encoded in UTF-8.
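To make the distinction concrete, here is a minimal Python sketch. The function names are hypothetical and chosen only for illustration; the ranges are the usual ones (U+0000..U+007F for US-ASCII, U+D800..U+DFFF for surrogates, U+10FFFF as the last code point). It relies on Python's strict UTF-8 encoder, which rejects surrogates.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value, i.e. a code point
    in U+0000..U+10FFFF that is not a surrogate (U+D800..U+DFFF)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_non_us_ascii_scalar_value(cp: int) -> bool:
    """True if cp is a scalar value outside the 7-bit US-ASCII range."""
    return is_scalar_value(cp) and cp > 0x7F


def try_utf8(cp: int):
    """Return the UTF-8 bytes for cp, or None if the strict encoder
    refuses it (which is what happens for surrogate code points)."""
    try:
        return chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # U+0041 is US-ASCII, U+00E9 is not, U+D800 is a surrogate.
    for cp in (0x41, 0xE9, 0xD800):
        print(f"U+{cp:04X}",
              is_scalar_value(cp),
              is_non_us_ascii_scalar_value(cp),
              try_utf8(cp))
```

Every code point is either a scalar value or a surrogate, so on the UTF-8 side only the scalar values matter.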
Best,
Daniel
[1] http://www.iana.org/assignments/character-sets