Encoding of old compatibility characters

Frédéric Grosshans frederic.grosshans at gmail.com
Mon Mar 27 16:46:34 CDT 2017


An example of a legacy character that was successfully encoded fairly 
recently is ⏨ U+23E8 DECIMAL EXPONENT SYMBOL, encoded in Unicode 5.2.
It came from the Soviet standard GOST 10859-64 and the German standard 
ALCOR, and was proposed by Leo Broukhis in 
http://www.unicode.org/L2/L2008/08030r-subscript10.pdf . The proposal 
followed a discussion on this mailing list, 
http://www.unicode.org/mail-arch/unicode-ml/y2008-m01/0123.html , where 
Ken Whistler was already sceptical about the usefulness of this encoding.
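
For anyone who wants to check that the character is available on their 
system, here is a minimal Python sketch. The helper names describe() and 
parse_decimal_exponent() are made up for this illustration, and the 
parser only assumes the ALGOL-style use of ⏨ as a power-of-ten marker 
(so 1.5⏨3 is read as 1.5e3); it is not taken from the proposal.

```python
import unicodedata

DECIMAL_EXPONENT = "\u23E8"  # ⏨ DECIMAL EXPONENT SYMBOL


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    # unicodedata.name() falls back to the given default when the
    # character has no name in the bundled Unicode database.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def parse_decimal_exponent(text: str) -> float:
    """Read an ALGOL-style literal such as '1.5⏨3' as a float.

    Hypothetical helper: it rewrites ⏨ to 'e' and lets float() do the rest.
    """
    return float(text.replace(DECIMAL_EXPONENT, "e"))


if __name__ == "__main__":
    print(describe(DECIMAL_EXPONENT))
    print(parse_decimal_exponent("1.5\u23E83"))
```

If the Unicode database shipped with the interpreter predates Unicode 
5.2, describe() reports the character as unnamed instead of raising.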


On 27/03/2017 at 16:44, Charlotte Buff wrote:
> I’ve recently developed an interest in old legacy text encodings and 
> noticed that there are various characters in several sets that don’t 
> have a Unicode equivalent. I had already started research into these 
> encodings to eventually prepare a proposal until I realised I should 
> probably ask on the mailing list first whether it is likely the UTC 
> will be interested in those characters before I waste my time on a 
> project that won’t achieve anything in the end.
>
> The character sets in question are ATASCII, PETSCII, the ZX80 set, the 
> Atari ST set, and the TI calculator sets. So far I’ve only analyzed 
> the ZX80 set in great detail, revealing 32 characters not in the UCS. 
> Most characters are pseudo-graphics, simple pictographs or inverted 
> variants of other characters.
>
> Now, one of Unicode’s declared goals is to enable round-trip 
> compatibility with legacy encodings. We’ve accumulated a lot of weird 
> stuff over the years in the pursuit of this goal. So it would be 
> natural to assume that the unencoded characters from the mentioned 
> sets would also be eligible for inclusion in the UCS. On the other 
> hand, those encodings are for the most part older than Unicode and so 
> far there seems to have been little interest in them from the UTC or 
> WG2, or any of their contributors. Something tells me that if these 
> character sets were important enough to consider for inclusion, they 
> would have been encoded a long time ago along with all the other stuff 
> in Block Elements, Box Drawings, Miscellaneous Symbols etc.
>
> Obviously the character sets in question don’t receive much use 
> nowadays (and some weren’t even that relevant in their time, either), 
> which leads me to wonder whether putting further work into this 
> proposal would be worth it.



