Concise term for non-ASCII Unicode characters

Martin J. Dürst duerst at
Sun Sep 20 19:51:32 CDT 2015

Hello Sean,

On 2015/09/20 23:48, Sean Leonard wrote:
> What is the most concise term for characters or code points

So we already have two different things we might need a term for.

> outside of
> the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these
> as "extended characters"

Most of the characters outside the US-ASCII range are perfectly simple 
and basic characters. I don't think the term 'extended' fits well here. 
It gives the impression that everything except US-ASCII is somewhat 
extraordinary, which in this day and age shouldn't be the case anymore.

> or "non-ASCII Unicode" but I do not find those
> terms precise. We are talking about the code points U+0080 - U+10FFFF. I
> suppose that this also refers to code points/scalar values that are not
> formally Unicode characters, such as U+FFFF.

Again we may need different terms depending on whether these are 
included or not.
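Martin's point that the answer depends on what is included can be made concrete. The Python sketch below counts how many code points in U+0080..U+10FFFF survive under three different inclusion rules. The helper names (`is_scalar_value`, `is_noncharacter`, `count_beyond_ascii`) are illustrative only and do not come from any standard API. The definitions follow the Unicode Standard: surrogates (U+D800..U+DFFF) are code points but not scalar values, and U+FFFF is one of the 66 noncharacters, although it is still a scalar value.

```python
def is_scalar_value(cp: int) -> bool:
    """Unicode scalar value: any code point except the surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_noncharacter(cp: int) -> bool:
    """One of the 66 noncharacters: U+FDD0..U+FDEF, or U+xxFFFE / U+xxFFFF in any plane."""
    # cp & 0xFFFE == 0xFFFE holds exactly when the low 16 bits are FFFE or FFFF.
    return 0 <= cp <= 0x10FFFF and (0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE)


# Three plausible readings of "everything from U+0080 up"; each yields a different set.
RULES = {
    "all code points": lambda cp: True,
    "scalar values only": is_scalar_value,
    "scalar values, no noncharacters": lambda cp: is_scalar_value(cp) and not is_noncharacter(cp),
}


def count_beyond_ascii(rule) -> int:
    """Count the code points in U+0080..U+10FFFF that `rule` accepts."""
    return sum(1 for cp in range(0x80, 0x110000) if rule(cp))


if __name__ == "__main__":
    for name, rule in RULES.items():
        print(f"{name}: {count_beyond_ascii(rule)}")
```

The three counts differ by the 2,048 surrogates and then by the 66 noncharacters, so any short term would have to say which of these sets it means.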

> Basically, I am looking for
> a concise term for values that would require multiple UTF-8 octets if
> encoded in UTF-8 (without referring to UTF-8 encoding specifically).
> "Non-ASCII" is not precise enough since character sets like Shift-JIS
> are non-ASCII.

Well, the non-ASCII characters in Shift-JIS are also contained in 
Unicode, so depending on exactly what you want to talk about, non-ASCII 
characters may be good enough.
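Sean's criterion ("would require multiple UTF-8 octets") and the range U+0080..U+10FFFF select exactly the same scalar values. A minimal Python sketch makes this explicit. `utf8_length` is a hypothetical helper, not a standard-library function. It uses the UTF-8 length boundaries from RFC 3629 and is cross-checked against Python's built-in encoder at each boundary.

```python
def utf8_length(cp: int) -> int:
    """Number of octets UTF-8 uses for the scalar value cp (boundaries per RFC 3629)."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp <= 0x7F:
        return 1  # US-ASCII range
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4  # supplementary code points


# Cross-check against Python's own UTF-8 encoder at the range boundaries.
for cp in (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFD, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    # "Needs more than one octet" is exactly "outside U+0000..U+007F".
    assert (utf8_length(cp) > 1) == (cp > 0x7F)
```

Because the multi-octet test reduces to `cp > 0x7F`, the set can be described without mentioning UTF-8 at all, which is what Sean asked for.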

> Also a citation to a relevant standard (whether Unicode or otherwise)
> would be helpful.
> The terms "supplementary character" and "supplementary code point" are
> defined in the Unicode standard, referring to characters or code points
> above U+FFFF. I am looking for something like those, but for characters
> or code points above U+007F.

And then in some cases, you may want to exclude the C0 area 
(U+0000-U+001F), or part of it, or some syntactically significant 
characters (e.g. punctuation) in the remaining part.
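The variations Martin lists are each a different predicate over characters, and they disagree even on a handful of samples. The Python sketch below shows four such predicates side by side. The predicate names are hypothetical labels chosen for illustration, not established terminology. Only `supplementary` corresponds to a term the Unicode Standard actually defines. `string.punctuation` is Python's set of ASCII punctuation characters.

```python
import string

# Illustrative predicates only; each is one plausible reading of
# "the characters we care about". None of the names is standard terminology.


def beyond_ascii(ch: str) -> bool:
    # Sean's set: U+0080..U+10FFFF.
    return ord(ch) > 0x7F


def supplementary(ch: str) -> bool:
    # The Unicode Standard's "supplementary" range: above U+FFFF.
    return ord(ch) > 0xFFFF


def not_c0(ch: str) -> bool:
    # Everything except the C0 control area U+0000..U+001F.
    return ord(ch) > 0x1F


def not_c0_or_ascii_punct(ch: str) -> bool:
    # Additionally drop ASCII punctuation, which is often syntactically significant.
    return not_c0(ch) and ch not in string.punctuation


PREDICATES = {
    "beyond_ascii": beyond_ascii,
    "supplementary": supplementary,
    "not_c0": not_c0,
    "not_c0_or_ascii_punct": not_c0_or_ascii_punct,
}

# TAB, 'A', '<', 'e' with acute, a CJK ideograph, and an emoji.
SAMPLES = ["\t", "A", "<", "\u00e9", "\u65e5", "\U0001F600"]

if __name__ == "__main__":
    for name, pred in PREDICATES.items():
        members = [f"U+{ord(c):04X}" for c in SAMPLES if pred(c)]
        print(f"{name:24} {' '.join(members)}")
```

Each line of output selects a different subset of the same six samples, which is the practical reason a single short term rarely fits every use.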

Anyway, what I wanted to show is that depending on what you need it for, 
there are so many different variations that it doesn't pay off to create 
specific short terms for all of them, and the term you use currently may 
be short enough.

Regards,   Martin.

More information about the Unicode mailing list