Concise term for non-ASCII Unicode characters

Steve Swales steve at swales.us
Sun Sep 20 12:59:52 CDT 2015


Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252).

-steve  
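A minimal Python sketch of the confusion Steve describes. It is not part of the original thread. A single byte above 0x7F can be invalid in ASCII, a C1 control character in ISO 8859-1 (Latin-1), and a printable character in Windows-1252. It assumes Python 3's built-in `ascii`, `latin-1`, and `cp1252` codecs. `decode_byte` is a hypothetical helper written for illustration.

```python
def decode_byte(b: int) -> dict:
    """Decode one byte under three encodings that are often confused.

    A value of None means the byte is undefined in that encoding.
    """
    raw = bytes([b])
    result = {}
    for codec in ("ascii", "latin-1", "cp1252"):
        try:
            result[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            result[codec] = None  # e.g. ASCII defines only 0x00..0x7F
    return result


# 0x41 is 'A' everywhere. 0x80 is rejected by ASCII, maps to the C1 control
# U+0080 in Latin-1, and maps to U+20AC EURO SIGN in Windows-1252.
# 0xE9 is U+00E9 in both Latin-1 and Windows-1252.
for b in (0x41, 0x80, 0xE9):
    print(hex(b), {k: (v if v is None else f"U+{ord(v):04X}")
                   for k, v in decode_byte(b).items()})
```

The range 0x80..0x9F is where Latin-1 and Windows-1252 disagree. That disagreement is one reason "non-ASCII" is easily misread as "the upper half of some 8-bit code page".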






> On Sep 20, 2015, at 10:05 AM, Phillips, Addison <addison at lab126.com> wrote:
> 
> I agree, although I note that sometimes the additional (redundant) specificity of "non-7-bit-ASCII characters" is needed when talking to people unclear on what "ASCII" means.
> 
> Addison
> 
>> -----Original Message-----
>> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Peter Constable
>> Sent: Sunday, September 20, 2015 9:52 AM
>> To: Sean Leonard; unicode at unicode.org
>> Subject: RE: Concise term for non-ASCII Unicode characters
>> 
>> You have already been using "non-ASCII Unicode", which is about as concise
>> and sufficiently accurate as you'll get. There's no term specifically defined in
>> any standard or conventionally used for this.
>> 
>> 
>> Peter
>> 
>> -----Original Message-----
>> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Sean Leonard
>> Sent: Sunday, September 20, 2015 7:48 AM
>> To: unicode at unicode.org
>> Subject: Concise term for non-ASCII Unicode characters
>> 
>> What is the most concise term for characters or code points outside of the
>> US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as
>> "extended characters" or "non-ASCII Unicode", but I do not find those terms
>> precise. We are talking about the code points U+0080 - U+10FFFF. I suppose
>> that this also refers to code points/scalar values that are not formally
>> Unicode characters, such as U+FFFF. Basically, I am looking for a concise term
>> for values that would require multiple UTF-8 octets if encoded in UTF-8
>> (without referring to UTF-8 encoding specifically).
>> "Non-ASCII" is not precise enough since character sets like Shift-JIS are non-
>> ASCII.
>> 
>> Also a citation to a relevant standard (whether Unicode or otherwise) would
>> be helpful.
>> 
>> The terms "supplementary character" and "supplementary code point" are
>> defined in the Unicode standard, referring to characters or code points
>> above U+FFFF. I am looking for something like those, but for characters or
>> code points above U+007F.
>> 
>> Thank you,
>> 
>> Sean
> 
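A minimal Python sketch of the criterion Sean describes: a value falls in the range he wants to name exactly when UTF-8 needs more than one octet for it, i.e. when it is above U+007F. It is not from the thread. `utf8_octets` and `is_non_ascii` are hypothetical names. The sketch assumes the input is a Unicode scalar value, so the surrogate code points U+D800..U+DFFF are rejected, since well-formed UTF-8 never encodes them. Noncharacters such as U+FFFF are scalar values and are accepted.

```python
def utf8_octets(cp: int) -> int:
    """Number of octets UTF-8 uses for the Unicode scalar value cp."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp <= 0x7F:
        return 1   # U+0000..U+007F: the US-ASCII range
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3   # rest of the BMP, including the noncharacter U+FFFF
    return 4       # supplementary code points, U+10000..U+10FFFF


def is_non_ascii(cp: int) -> bool:
    """True for the U+0080..U+10FFFF range (minus surrogates) under discussion."""
    return utf8_octets(cp) > 1


# Cross-check the table above against Python's own UTF-8 encoder.
for cp in (0x41, 0x7F, 0x80, 0xE9, 0x20AC, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_octets(cp) == len(chr(cp).encode("utf-8"))
```

The boundary at U+007F is a fixed property of UTF-8: every octet of a multi-octet sequence has its high bit set, so single-octet sequences cover exactly the ASCII range.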
> 


