Why Nothing Ever Goes Away

Sean Leonard lists+unicode at seantek.com
Tue Oct 6 07:24:06 CDT 2015


> 2. The Unicode code charts are (deliberately) vague about U+0080, U+0081,
> and U+0099. All other C1 control codes have aliases to the ISO 6429
> set of control functions, but in ISO 6429, those three control codes
> don't have any assigned functions (or names).

On 10/5/2015 3:57 PM, Philippe Verdy wrote:
> Also the aliases for C1 controls were formally registered in 1983 only 
> for the two ranges U+0084..U+0097 and U+009B..U+009F for ISO 6429.

If I may, I would appreciate another history lesson:
In ISO 2022 / 6429 land, it is apparent that the C1 controls are mainly 
aliases for ESC 4/0 - 5/15 (@ through _). This might vary depending on 
what is loaded into the C1 register, but overall, it just seems like 
a way to save one byte.

Why was C1 invented in the first place?

And, why did Unicode deem it necessary to replicate the C1 block at 
0x80-0x9F, when all of the control characters (codes) were equally 
reachable via ESC 4/0 - 5/15? I understand why it is desirable to align 
U+0000 - U+007F with ASCII, and maybe even U+0000 - U+00FF with Latin-1 
(ISO-8859-1). But maybe Windows-1252, MacRoman, and all the other 
non-ISO-standardized 8-bit encodings got this much right: duplicating 
control codes is basically a waste of very precious character code real 
estate.

Sean

PS I was not able to turn up ISO 6429:1983, but I did find ECMA-48, 4th 
Ed., December 1986, which has the following text:
***
5.4 Elements of the C1 Set
These control functions are represented:
- In a 7-bit code by 2-character escape sequences of the form ESC Fe, 
where ESC is represented by bit combination 01/11 and Fe is represented 
by a bit combination from 04/00 to 05/15.
- In an 8-bit code by bit combinations from 08/00 to 09/15.
***

This text seems to be repeated in many analogous standards from roughly 
1974 to 1992.
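
As an illustration only, the correspondence in 5.4 can be sketched in 
Python. The helper names (pos, c1_to_esc_fe, esc_fe_to_c1) are 
hypothetical and come from no standard; the sketch assumes the fixed 
offset of 0x40 implied by the two ranges quoted above (04/00-05/15 
versus 08/00-09/15).

```python
def pos(column: int, row: int) -> int:
    """Convert column/row notation (e.g. 01/11) to a byte value."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in 0..15")
    return column * 16 + row


ESC = pos(1, 11)  # 01/11 == 0x1B


def c1_to_esc_fe(byte: int) -> bytes:
    """Fold an 8-bit C1 control (08/00..09/15) into the 7-bit form ESC Fe."""
    if not pos(8, 0) <= byte <= pos(9, 15):
        raise ValueError("not a C1 control: 0x%02X" % byte)
    # Fe lies in 04/00..05/15, i.e. exactly 0x40 below the 8-bit code.
    return bytes([ESC, byte - 0x40])


def esc_fe_to_c1(seq: bytes) -> int:
    """Map a 2-character ESC Fe sequence back to its 8-bit C1 code."""
    if len(seq) != 2 or seq[0] != ESC or not pos(4, 0) <= seq[1] <= pos(5, 15):
        raise ValueError("not an ESC Fe sequence")
    return seq[1] + 0x40
```

For example, c1_to_esc_fe(0x9B) yields ESC followed by "[", and 
esc_fe_to_c1(b"\x1b_") yields 0x9F, which matches the "@ through _" 
range of final bytes.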

PPS I happen to have a copy of ANSI X3.41-1974 "American National 
Standard Code Extension Techniques for Use with the 7-Bit Coded 
Character Set of [ASCII]". The invention/existence of C1 goes back to 
this time, as does the use of ESC Fe to invoke C1 characters in a 7-bit 
code, and 0x80-0x9F to invoke C1 characters in an 8-bit code. (See, in 
particular, Clauses 5.3.3.1 and 5.3.6.) Clause 7.3.1.2 
says: "The use of ESC Fe sequence in an 8-bit environment is contrary to 
the intention of this standard but, should they occur, their meaning is 
the same as in the 7-bit environment."

I can appreciate why it was desirable to "fold" C1 in an 8-bit 
environment into a 7-bit environment with ESC Fe. (If, in fact, that was 
the direction of standardization: invent a new thing and then devise a 
coding to express the new thing in the old thing.) It is less obvious 
why Unicode adopted C1, however, when the trend was to jettison the 
94-character Tetris block assignments in favor of a wide-open field for 
character assignment. Except for the trend in Unicode to "avoid 
assigning characters when explicitly asked, unless someone implements 
them without asking, and the implementation catches on, and then just 
assign the whole lot of them, even when they overlap with existing 
assignments, and then invent composite characters, which further 
compound the possible overlapping combinations".

