Encoding of old compatibility characters

Mon Mar 27 17:05:28 CDT 2017

Another example, about to be encoded, is the GROUP MARK, used on old IBM 
computers (proposal: ML threads: 
http://www.unicode.org/mail-arch/unicode-ml/y2015-m01/0040.html , and 
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0367.html )

On 27/03/2017 at 23:46, Frédéric Grosshans wrote:
> An example of a legacy character successfully encoded recently is ⏨ 
> U+23E8 DECIMAL EXPONENT SYMBOL, encoded in Unicode 5.2.
> It came from the Soviet standard GOST 10859-64 and the German standard 
> ALCOR, and was proposed by Leo Broukhis in 
> http://www.unicode.org/L2/L2008/08030r-subscript10.pdf . The proposal 
> followed a discussion on this mailing list, 
> http://www.unicode.org/mail-arch/unicode-ml/y2008-m01/0123.html , where 
> Ken Whistler was already sceptical about the usefulness of this encoding.
>
>
> On 27/03/2017 at 16:44, Charlotte Buff wrote:
>> I’ve recently developed an interest in old legacy text encodings and 
>> noticed that there are various characters in several sets that don’t 
>> have a Unicode equivalent. I had already started research into these 
>> encodings to eventually prepare a proposal until I realised I should 
>> probably ask on the mailing list first whether it is likely the UTC 
>> will be interested in those characters before I waste my time on a 
>> project that won’t achieve anything in the end.
>>
>> The character sets in question are ATASCII, PETSCII, the ZX80 set, 
>> the Atari ST set, and the TI calculator sets. So far I’ve only 
>> analyzed the ZX80 set in great detail, revealing 32 characters not in 
>> the UCS. Most characters are pseudo-graphics, simple pictographs or 
>> inverted variants of other characters.
>>
>> Now, one of Unicode’s declared goals is to enable round-trip 
>> compatibility with legacy encodings. We’ve accumulated a lot of weird 
>> stuff over the years in the pursuit of this goal. So it would be 
>> natural to assume that the unencoded characters from the mentioned 
>> sets would also be eligible for inclusion in the UCS. On the other 
>> hand, those encodings are for the most part older than Unicode and so 
>> far there seems to have been little interest in them from the UTC or 
>> WG2, or any of their contributors. Something tells me that if these 
>> character sets were important enough to consider for inclusion, they 
>> would have been encoded a long time ago along with all the other 
>> stuff in Block Elements, Box Drawings, Miscellaneous Symbols etc.
>>
>> Obviously the character sets in question don’t receive much use 
>> nowadays (and some weren’t even that relevant in their time, either), 
>> which leads me to wonder whether putting further work into this 
>> proposal would be worth it.
>
>