Concise term for non-ASCII Unicode characters

Philippe Verdy verdy_p at wanadoo.fr
Tue Sep 22 03:45:28 CDT 2015


I would not use the clumsy "7-bit ASCII" because of the long-standing
confusion it creates: it could refer to any national version of ISO 646,
which reassigns some code positions in the range 0x00 to 0x7F to characters
outside the range U+0000 to U+007F while still remaining a 7-bit encoding.
So instead of "7-bit ASCII" I strongly prefer the term "US-ASCII", to make
sure it refers to the encoding that maps the 7-bit code positions exactly to
U+0000..U+007F.
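
To illustrate why "7-bit" alone is ambiguous, here is a minimal sketch in
Python. The helper decode_7bit and the table ISO646_GB_OVERRIDES are
hypothetical names made up for this example, and the table is deliberately
partial: it lists only the one UK reassignment used here (0x23 as POUND SIGN),
not the full national variant.

```python
# Hypothetical, partial override table for the UK national variant of
# ISO 646: code position 0x23 is POUND SIGN there, whereas US-ASCII
# assigns it NUMBER SIGN. Other reassignments are intentionally omitted.
ISO646_GB_OVERRIDES = {0x23: "\u00a3"}


def decode_7bit(data, overrides=None):
    """Decode 7-bit code positions, applying optional national overrides.

    With no overrides this is plain US-ASCII: position n maps to U+00nn.
    """
    overrides = overrides or {}
    out = []
    for octet in data:
        if octet > 0x7F:
            raise ValueError("octet 0x%02X is not a 7-bit code position" % octet)
        out.append(overrides.get(octet, chr(octet)))
    return "".join(out)


raw = b"#5"
as_us_ascii = decode_7bit(raw)                      # stays within U+0000..U+007F
as_uk_variant = decode_7bit(raw, ISO646_GB_OVERRIDES)  # yields U+00A3 for 0x23
```

The same two 7-bit octets decode to different characters depending on which
variant is assumed, and only the US-ASCII reading stays inside
U+0000..U+007F.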

So I would call code positions outside 0x00..0x7F "not US-ASCII". None of
them is bound to any Unicode "character", "code point" or "scalar value";
they are just "code positions" or, more precisely, "octet values with their
most significant bit set to 1". That is really long, so "not US-ASCII" is
fine as a shorter term.
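
The octet-level test described above can be sketched as follows in Python.
The function names is_us_ascii_octet and non_us_ascii_offsets are
hypothetical, chosen only for this illustration; the check is nothing more
than testing the most significant bit of each octet, without interpreting
the octets as characters in any encoding.

```python
def is_us_ascii_octet(octet):
    """Return True if the octet's most significant bit is 0 (0x00..0x7F)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet value: %r" % (octet,))
    return octet & 0x80 == 0


def non_us_ascii_offsets(data):
    """Return the offsets of all octets in data with the high bit set."""
    return [i for i, octet in enumerate(data) if not is_us_ascii_octet(octet)]


# UTF-8 is used here only as a convenient source of high-bit octets:
# U+00EF is encoded as the two octets 0xC3 0xAF.
sample = "na\u00efve".encode("utf-8")
offsets = non_us_ascii_offsets(sample)
```

For a whole-buffer check, Python 3.7 and later also provide bytes.isascii(),
which answers the same question without reporting positions.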

2015-09-22 9:43 GMT+02:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> On Sun, 20 Sep 2015 16:52:29 +0000
> Peter Constable <petercon at microsoft.com> wrote:
>
> > You already have been using "non-ASCII Unicode", which is about as
> > concise and sufficiently accurate as you'll get. There's no term
> > specifically defined in any standard or conventionally used for this.
>
> As to standards, UTS#18 'Unicode Regular Expression' Requirement
> RL1.2 requires the support of the 'property' it calls 'ASCII', which is
> defined in Section 1.2.1 as the property of being in the range U+0000 to
> U+007F. This implicitly makes 'not ASCII' a derived property held by all
> the other codepoints. If you fear that your audience will think that
> Latin-1 characters are ASCII, you'll just have to go for the clumsy
> 'not 7-bit ASCII'  and accept that there isn't an unambiguous way in
> English of turning that into an adjective or noun.
>
> If a term were invented, you'd generally have to explain it, and you
> would do better just to remind readers what ASCII is.
>
> Richard.
>