Concise term for non-ASCII Unicode characters

Sean Leonard lists+unicode at seantek.com
Tue Sep 22 04:27:36 CDT 2015


On 9/21/2015 9:24 PM, Janusz S. Bien wrote:
> Quote/Cytat - Sean Leonard <lists+unicode at seantek.com> (Mon 21 Sep 
> 2015 10:51:42 PM CEST):
>
>> Related question as I am researching this:
>>
>> How can I acquire (cheaply or free) the latest and most official copy 
>> of US-ASCII, namely, the version that Unicode references?
>
> [...]

Thanks to all. I was able to locate a copy of ANSI X3.4-1986 (R1997) 
[hereinafter ASCII]. (See my subsequent e-mail about the term "ASCII".)

>
> I've never seen the ASCII standard, but I think it is (almost?) 
> identical to ISO/IEC 646, which in turn is identical to the freely 
> available ECMA-6:
>
> http://www.ecma-international.org/publications/standards/Ecma-006.htm

Having just read both standards documents in some detail, I can attest 
that they are not the same. However, the practical effect for purposes 
of Unicode is the same.

ECMA-6 (1991) is indeed identical to ISO/IEC 646 (as far as I can tell; 
hereinafter ECMA-6). ECMA-6 "specifies a 7-bit coded character set with 
a number of options" (Clause 1.2). Specifically, the following positions 
are ambiguous or subject to national assignment:
2/3 NUMBER SIGN or POUND SIGN
2/4 DOLLAR SIGN or CURRENCY SIGN
4/0
5/11
5/12
5/13
5/14
6/0
7/11
7/12
7/13
7/14
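
The column/row notation in the list above maps to a code value as 
16 * column + row. A minimal Python sketch of that conversion follows, 
showing what ASCII puts at each of those positions. The names 
position_to_code and VARIABLE_POSITIONS are illustrative only and come 
from neither standard.

```python
# Positions that ECMA-6 leaves to options or national assignment,
# copied from the list above (column/row notation).
VARIABLE_POSITIONS = [
    "2/3", "2/4", "4/0", "5/11", "5/12", "5/13", "5/14",
    "6/0", "7/11", "7/12", "7/13", "7/14",
]


def position_to_code(position: str) -> int:
    """Convert 'column/row' notation to a 7-bit code value."""
    column, row = (int(part) for part in position.split("/"))
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError(f"not a 7-bit position: {position!r}")
    return 16 * column + row


# The characters that ASCII assigns to those positions.
ASCII_AT_VARIABLE_POSITIONS = "".join(
    chr(position_to_code(p)) for p in VARIABLE_POSITIONS
)

if __name__ == "__main__":
    for p in VARIABLE_POSITIONS:
        code = position_to_code(p)
        print(f"{p:>4}  0x{code:02X}  {chr(code)}")
```

For example, 2/3 is 0x23 ("#" in ASCII) and 7/14 is 0x7E ("~" in ASCII).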

ECMA-6 specifies an International Reference Version (IRV), which 
exercises the "options". The IRV fills in the graphic characters 
consistent with ASCII. However, ECMA-6 sort of leaves the C0 region 
blank, and the IRV (in Annex A, normative) says only that "if the C0 
set [...] is used, it shall be the C0 set of Standard ECMA-48." That is 
sort of fudging. Anyway, the IRV C0 set / ECMA-48 set is the same as ASCII.

Overall, the takeaway is that specifying ISO/IEC 646 / ECMA-6 is not 
sufficient; you need to include "IRV" as well, or ISO IR No. 6 for the 
G0 set and ISO IR No. 1 for the C0 set.

In contrast, if you say ASCII (ANSI X3.4-1986), all positions are fully 
defined.
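
As a rough illustration of the practical difference: a 7-bit text reads 
the same under every version of ISO/IEC 646 / ECMA-6 only if it avoids 
the twelve open positions. The Python sketch below flags bytes that fall 
on those positions. ambiguous_positions is a hypothetical helper, not 
part of any standard or library.

```python
from typing import List, Tuple

# Code values of the twelve positions that ECMA-6 leaves open
# (2/3, 2/4, 4/0, 5/11 to 5/14, 6/0, 7/11 to 7/14).
VARIABLE_CODES = frozenset(
    [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]
)


def ambiguous_positions(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, 'column/row') for each byte on an open position."""
    hits = []
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            raise ValueError(f"byte at offset {offset} is not 7-bit")
        if byte in VARIABLE_CODES:
            # High 3 bits give the column, low 4 bits give the row.
            hits.append((offset, f"{byte >> 4}/{byte & 0x0F}"))
    return hits
```

For instance, ambiguous_positions(b"a[0]") returns 
[(1, '5/11'), (3, '5/13')], because the brackets sit on nationally 
assignable positions, while b"Hello, world" yields an empty list.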

Regards,

Sean

