Why Nothing Ever Goes Away
verdy_p at wanadoo.fr
Tue Oct 6 08:57:37 CDT 2015
2015-10-06 14:24 GMT+02:00 Sean Leonard <lists+unicode at seantek.com>:
>> 2. The Unicode code charts are (deliberately) vague about U+0080, U+0081,
>> and U+0099. All other C1 control codes have aliases to the ISO 6429
>> set of control functions, but in ISO 6429, those three control codes don't
>> have any assigned functions (or names).
> On 10/5/2015 3:57 PM, Philippe Verdy wrote:
>> Also the aliases for C1 controls were formally registered in 1983 only
>> for the two ranges U+0084..U+0097 and U+009B..U+009F for ISO 6429.
> If I may, I would appreciate another history lesson:
> In ISO 2022 / 6429 land, it is apparent that the C1 controls are mainly
> aliases for ESC 4/0 - 5/15. ( @ through _ ) This might vary depending on
> what is loaded into the C1 register, but overall, it just seems like saving
> one byte.
> Why was C1 invented in the first place?
Look for the history of EBCDIC and its adaptation/conversion to ASCII-
compatible encodings: round-trip conversion was needed (using only a
simple reordering of byte values, with no duplicates). EBCDIC used many
controls that were not part of C0, and these were kept in the C1 set.
Ignore the 7-bit compatibility encoding using pairs: it was only needed for
ISO 2022, and ISO 6429 defines a profile where those longer sequences are
not needed, and are even forbidden in 8-bit contexts or in contexts where
aliases are undesirable and invalidated, such as security environments.
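As a rough illustration of that round-trip requirement, the sketch below
uses Python's built-in cp037 codec (IBM EBCDIC code page 037, US/Canada) as
a stand-in. It assumes that codec's table is representative of the
historical EBCDIC conversions discussed here, which is an assumption and
not something stated above. It checks that the 256 byte values map one to
one onto U+0000..U+00FF, and counts how many EBCDIC bytes land in C1.

```python
# Sketch: assumes Python's cp037 codec (IBM EBCDIC code page 037) is a fair
# stand-in for the historical EBCDIC <-> ASCII-compatible conversions.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("cp037")  # one character per EBCDIC byte

# A pure reordering: 256 distinct code points, all within U+0000..U+00FF.
is_permutation = len(set(decoded)) == 256 and all(ord(ch) <= 0xFF for ch in decoded)

# Round trip: re-encoding gives back exactly the original bytes.
round_trips = decoded.encode("cp037") == all_bytes

# EBCDIC bytes whose counterpart falls in the C1 range U+0080..U+009F.
c1_sources = [b for b, ch in zip(all_bytes, decoded) if 0x80 <= ord(ch) <= 0x9F]

print(is_permutation, round_trips, len(c1_sources))  # True True 32
```

Because the table is a permutation, all 32 C1 code points are used: the
EBCDIC controls that have no C0 counterpart end up there.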
Following your reasoning, I would conclude that assigning characters in the
G1 set was also a duplication, because they are reachable with a C0
"shifting" control plus a position in the area normally occupied by the G0
set. In that case ISO 8859-1 or Windows-1252 would also be an unneeded
duplication! And we would live today in a 7-bit-only world.
C1 controls have their own identity. The 7-bit encoding using ESC is just a
hack to make them fit in 7-bit channels, and it only works where the ESC
control is assumed to play this role according to ISO 2022, ISO 6429, or
other similar old 7-bit protocols such as Videotex (which was widely used
in France with the free "Minitel" terminal, long before the Internet was
introduced to the general public around 1992-1995).
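To make both 7-bit workarounds concrete, here is a minimal sketch, with
hypothetical helper names to_7bit and from_7bit, of folding an 8-bit byte
string into 7 bits. Each C1 control 0x80..0x9F becomes ESC Fe with
Fe = byte - 0x40 (the ESC 4/0..5/15 aliases), and each G1 graphic in
0xA1..0xFE becomes SO, byte - 0x80, ..., SI. It assumes G1 sits in GR in
the 8-bit form, and it simply rejects ESC, SO, SI, 0xA0 and 0xFF rather
than implementing the full ISO 2022 designation and shift rules.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

def to_7bit(data: bytes) -> bytes:
    """Hypothetical helper: fold 8-bit bytes into a 7-bit stream (sketch)."""
    out = bytearray()
    shifted = False  # True while SO (G1 invoked into GL) is in effect
    for b in data:
        if b in (ESC, SO, SI, 0xA0, 0xFF):
            raise ValueError(f"byte 0x{b:02X} is not handled by this sketch")
        if 0x80 <= b <= 0x9F:
            # C1 control -> ESC Fe, e.g. 0x85 (NEL) -> ESC 0x45 ("ESC E")
            out += bytes((ESC, b - 0x40))
        elif b >= 0xA1:
            # G1 graphic 0xA1..0xFE -> same position in GL after SO
            if not shifted:
                out.append(SO)
                shifted = True
            out.append(b - 0x80)
        else:
            # 7-bit byte: graphics need G0 back in GL; C0 controls do not
            if shifted and b >= 0x20:
                out.append(SI)
                shifted = False
            out.append(b)
    if shifted:
        out.append(SI)  # leave the stream in the default (G0) state
    return bytes(out)

def from_7bit(data: bytes) -> bytes:
    """Hypothetical helper: undo to_7bit (sketch, same restrictions)."""
    out = bytearray()
    shifted = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            if i + 1 >= len(data) or not 0x40 <= data[i + 1] <= 0x5F:
                raise ValueError("only ESC Fe sequences are supported")
            out.append(data[i + 1] + 0x40)  # ESC Fe -> C1 control
            i += 2
            continue
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and 0x21 <= b <= 0x7E:
            out.append(b + 0x80)  # GL position -> G1 graphic in GR
        else:
            out.append(b)
        i += 1
    return bytes(out)

print(to_7bit(b"caf\xe9\x85"))  # b'caf\x0ei\x1bE\x0f'
```

Every byte of the 7-bit form now depends on shift state or on a preceding
ESC, which is exactly what makes it a protocol convention rather than a
self-describing encoding.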
Today Videotex is dead for good: the old call numbers for this slow service
have been shut down, and the Minitel terminals are no longer distributed
and have been recycled as waste. They were first replaced by PC
applications connected to the Internet, and now all the old services run
directly on the Internet; none of them uses 7-bit encodings for its HTML
pages or its mobile applications. France has also abandoned for good its
old French version of ISO 646: printers no longer support any version of
ISO 646 other than ASCII, although they still support various 8-bit
encodings.
7-bit encodings are things of the past. They were only justified when
communication links were slow and produced many transmission errors, and
the only mechanism implemented to detect them was a single parity bit per
character. Today we transmit long datagrams and prefer check codes computed
over the whole datagram (such as CRCs, or error-correcting codes). 8-bit
encodings are much easier and faster to process when transmitting not just
text but also binary data.
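A small sketch of the difference, assuming even parity stored in bit 7
(the helpers with_even_parity and parity_ok are hypothetical) and using
zlib.crc32 from the Python standard library as the whole-frame check: a
single flipped bit is caught by per-character parity, but two flipped bits
in the same character are not, while the CRC catches both.

```python
import zlib

def with_even_parity(ch: int) -> int:
    """Hypothetical helper: store an even-parity bit in bit 7 of a 7-bit code."""
    if ch > 0x7F:
        raise ValueError("parity framing only works for 7-bit characters")
    return ch | 0x80 if bin(ch).count("1") % 2 else ch

def parity_ok(byte: int) -> bool:
    """True if the byte has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0

framed = bytes(with_even_parity(c) for c in b"MINITEL")
crc = zlib.crc32(framed)  # one check code over the whole frame

one_err = bytearray(framed)
one_err[0] ^= 0x01  # one flipped bit in the first character
two_err = bytearray(framed)
two_err[0] ^= 0x03  # two flipped bits in the same character

print(all(parity_ok(b) for b in one_err))  # False: parity detects it
print(all(parity_ok(b) for b in two_err))  # True: parity misses it
print(zlib.crc32(bytes(one_err)) != crc)   # True: CRC detects it
print(zlib.crc32(bytes(two_err)) != crc)   # True: CRC detects it
```

Note also that the parity bit occupies the eighth bit of every byte, so
such a link cannot carry 8-bit text or binary data at all.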
Let's leave the 7-bit world behind for good. We have also abandoned the old
UTF-7 in Unicode! I have not seen it used anywhere except in a few old
emails sent at the end of the 90's, because many mail servers were still
not 8-bit clean: they silently transformed non-ASCII bytes in unpredictable
ways or into unspecified encodings, or just silently dropped the high bit,
assuming it was a parity bit. At that time, many emails were not sent with
SMTP but with the old UUCP protocol, and could take weeks to reach the
final recipient, as there was still no global routing infrastructure and
many hops were needed over non-permanent modem links. In my opinion, UTF-7
was just a temporary and experimental solution to help system admins and
developers adopt the new UCS, including for their old
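The "not 8-bit clean" problem that UTF-7 worked around can be modelled in a
few lines. The sketch assumes a hypothetical relay, strip_high_bit, that
clears bit 7 of every byte as if it were a parity bit, and compares UTF-8
with Python's built-in utf-7 codec.

```python
text = "café"

def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical relay that is not 8-bit clean: clears bit 7 of each byte."""
    return bytes(b & 0x7F for b in data)

as_utf8 = text.encode("utf-8")
as_utf7 = text.encode("utf-7")

# UTF-7 output is pure 7-bit ASCII, so the lossy relay leaves it intact.
print(all(b < 0x80 for b in as_utf7))           # True
print(strip_high_bit(as_utf7).decode("utf-7"))  # café

# UTF-8 encodes "é" as 0xC3 0xA9; clearing bit 7 turns them into "C)".
print(strip_high_bit(as_utf8).decode("utf-8"))  # cafC)
```

Once relays became 8-bit clean, this protection was no longer worth
UTF-7's extra size and its ambiguous "+...-" escaping.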