Concise term for non-ASCII Unicode characters

Richard Wordingham richard.wordingham at
Tue Sep 29 16:27:02 CDT 2015

On Tue, 29 Sep 2015 20:27:28 +0100
Daniel Bünzli <daniel.buenzli at> wrote:

> On Tuesday, 29 September 2015 at 19:50, Ken Whistler wrote:
> > I agree that "scalar values greater than U+007F" doesn't just trip
> > off the tongue, and while technically accurate, it is bad
> > terminology -- precisely because it begs the question "wtf are
> > 'scalar values'?!" for the average engineer.
> And an average engineer knows how to look up definitions, that one
> being precise and exceptionally well defined in the Unicode glossary
> — in stark contrast to the shady (and deceiving for the newbie)
> notion of "character" that you use subsequently in your message.

The glossary might fool a 'newbie' (the declared target audience), but
it's riddled enough with errors to dispel confidence.  Just looking at
the entries before 'ASCII':

OK: 'Abstract character sequence' (if one has a usable understanding of
'abstract character'); 'accent mark', 'acrophonic', 'akshara' (though
the spelling with neither an 'h' nor a dot below is weird);
'algorithm', 'alphabet' (though saying that modern Lao and pointed
Hebrew use alphabets is probably not very helpful),
'alphabetic' (though it's not obvious to me why ARABIC SUKUN is
alphabetic but potentially visible viramas are not), 'alphabetic
sorting', 'annotation', 'apparatus criticus', 'Arabic Indic
digits' (though are 'European digits' derived from the digits of the
eastern part of the Arab world?)


Dodgy:

'Abjad' (living abjads also mark vowels, with some vowels having
characters dignified as 'letters').  Does normal Egyptian hieroglyphic
writing constitute an abjad?

'Abstract character' - but then the definition makes no sense.

'Abugida' - needs 'consonants' and 'vowels' to be qualified by 'most',
otherwise it won't even work for Classical Sanskrit in Devanagari.
Vowel letters and visarga are the principal problems. 

'ANSI' - I don't think the Windows code pages for UTF-8 and UTF-16 are

'Arabic digits' - aren't the European digits used in western Arabic as
native as the eastern Arabic digits (U+0660 etc.) used in eastern

11 more-or-less OK versus 5 dodgy does not generate a great deal of
confidence in the glossary.

I appreciate that the difference between abjad, abugida and alphabet is
difficult to capture, as abjads and abugidas can evolve into

