Concise term for non-ASCII Unicode characters

Mark Davis ☕️ mark at macchiato.com
Tue Sep 29 11:33:36 CDT 2015


I think the term "non-ASCII Unicode" is just fine, and we don't need
anything beyond that. It is clearly those Unicode characters that aren't
(2) in http://unicode.org/glossary/#ASCII.
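
To make the boundary concrete, here is a minimal Python sketch of that
reading. It assumes "ASCII" means the code points U+0000..U+007F; the names
ASCII_MAX, is_non_ascii and utf8_length are made up for illustration and do
not come from the glossary or the Standard.

```python
ASCII_MAX = 0x7F  # assumed upper bound of ASCII: U+007F


def is_non_ascii(ch: str) -> bool:
    """Return True if the single character lies above U+007F."""
    return ord(ch) > ASCII_MAX


def utf8_length(ch: str) -> int:
    """Return the number of UTF-8 code units (bytes) used to encode ch."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # Sample code points on either side of the two boundaries discussed
    # in the thread: 0x7F/0x80 and 0xFFFF/0x10000.
    for ch in ["\x7f", "\x80", "\uffff", "\U00010000"]:
        print(f"U+{ord(ch):04X}", is_non_ascii(ch), utf8_length(ch))
```

In UTF-8 the step from 0x7F to 0x80 is where a character stops being a
single code unit, which is why that boundary shows up so often in practice.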


Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*

On Tue, Sep 29, 2015 at 6:20 PM, Sean Leonard <lists+unicode at seantek.com>
wrote:

> On 9/21/2015 5:17 PM, Peter Constable wrote:
>
>> If you think it's a serious problem that there isn't one conventional
>> term for "characters outside the ASCII repertoire" or "UTF-8
>> multi-code-unit encoded representations" (since different authors could
>> devise different terminology solutions), then I suggest you submit a
>> document to UTC explaining why it's a problem, documenting inconsistent or
>> unclear terminology that's been used in some standards / public
>> specifications, and requesting that Unicode formally define terminology for
>> these concepts. I can't guarantee that UTC will do it, but I can predict
>> with confidence that it _won't_ do anything of that nature if nobody
>> submits such a document. Peter
>>
>
> I am of the mind to do just that, then. I have seen different documents,
> standards, and standards bodies that have invented terminology around this
> concept, and the terms are not always the same. Since these standards depend on
> Unicode, it would make a lot of sense for Unicode formally to define
> terminology for these concepts. With the proliferation of UTF-8 (among
> other things), the boundary between 0x7F and 0x80 is more significant than
> the boundary between 0xFFFF and 0x10000.
>
> Since this will be my first submission I would appreciate a co-author on
> this topic. Is anyone willing to help? Thanks in advance. Also, it is not
> clear if such a document is destined to become a Unicode Technical Report
> (UTR / PDUTR etc.), or if it should just be an informal write-up. I am
> guessing this is supposed to be somewhat informal but at the same time it
> (or the results of it) ought to appear in the UTC Document Search.
>
> The current terminology that I am considering pursuing is "beyond ASCII",
> in various permutations, such as "beyond the ASCII range", "characters
> beyond ASCII", "code points beyond ASCII", etc. The term "beyond" implies a
> certain directionality, and to that extent, implies the Unicode repertoire
> as well as a Unicode encoding. We have seen on this list the backflips
> required to clarify "non-ASCII", since things that are not ASCII literally
> could be a wide range of things.
>
> I think there is some confusion about whether the term "Basic Latin"
> excludes the C0 control character range. Formally the standard seems clear
> enough to me that it is coterminous with ASCII, but there is still
> confusion if you don't pore through the Standard. My thought is that maybe
> the Blocks.txt data should be modified to say "ASCII (Basic Latin)" instead
> of just "Basic Latin". (If we "go there", I would appreciate the wisdom of
> an experienced Unicode co-author. I am not confident touching that just by
> myself.)
>
> Sean
>