Concise term for non-ASCII Unicode characters

Daniel Bünzli daniel.buenzli at erratique.ch
Sun Sep 20 14:57:10 CDT 2015


On Sunday, 20 September 2015 at 18:59, Steve Swales wrote:
> Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252).

For this reason I usually use the term US-ASCII, which is the IANA name for the 7-bit ASCII character set [1].

Someone referring to the non-US-ASCII scalar values of Unicode would make precise sense to me. But then maybe Peter's very last suggestion is actually the most precise you can get.
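
To make "non-US-ASCII scalar values" concrete, here is a minimal sketch in Python. The function names are made up for illustration. It assumes the usual definitions: US-ASCII covers U+0000..U+007F, and scalar values are all code points in U+0000..U+10FFFF except the surrogates U+D800..U+DFFF.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: any code point except a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_non_us_ascii_scalar(cp: int) -> bool:
    """True if cp is a scalar value outside the US-ASCII range U+0000..U+007F."""
    return cp > 0x7F and is_scalar_value(cp)
```

Under these assumptions the set is exactly U+0080..U+D7FF together with U+E000..U+10FFFF.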

Also, if you are talking about UTF-8, I would use the term "scalar values" rather than "characters" or "code points", since surrogates can't be encoded in UTF-8.
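
The point about surrogates can be checked with a small Python sketch. The helper name is hypothetical, and it relies on Python 3's strict UTF-8 codec, which refuses to encode surrogate code points.

```python
def utf8_encodable(cp: int) -> bool:
    """True if the code point cp (0..0x10FFFF) can be encoded as UTF-8."""
    try:
        # Python's strict UTF-8 codec rejects surrogates (U+D800..U+DFFF).
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# A surrogate is a code point but not a scalar value, so encoding fails.
print(utf8_encodable(0xD800))
# An ordinary non-US-ASCII scalar value such as U+00E9 encodes fine.
print(utf8_encodable(0x00E9))
```

So "code points" is slightly too wide a term when the subject is UTF-8, while "scalar values" matches what can actually be encoded.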

Best,

Daniel

[1] http://www.iana.org/assignments/character-sets




