Superscript and Subscript Characters in General Use

Frédéric Grosshans frederic.grosshans at gmail.com
Tue Jan 10 08:40:39 CST 2017


On 10/01/2017 at 12:03, Alastair Houghton wrote:
> That’s part of it, but I think also that the thread is increasingly verbose and hard to follow.
>
> I still think that the idea of adding U+???? SUPERSCRIPT and U+???? SUBSCRIPT might be worth contemplating; it would seem to provide a good answer to both Marcel’s and the French standards body’s concerns (wrt their proposed new ordinal indicator) while only using up two code points, and it’d be much easier to explain to people that superscripts and subscripts were a presentational matter and that they needed to talk to their font supplier.  Plus it would work with existing platform rendering engines provided a font with an appropriate OpenType GSUB table was available.
>
> Does anyone besides Marcel have any input on that idea?  Is it worth writing a proposal to add SUPERSCRIPT and SUBSCRIPT?

No! Long story short: encoding the {super,sub}script characters one by 
one in Unicode is a choice that was made more than two decades ago, and 
it is much too late to change it, even if it were a good idea.

One of the major problems with such a proposal is that it would be 
incompatible (or ambiguous) with earlier versions of Unicode, since the 
same character, let’s say “³”, could be encoded in two different 
ways: SUPERSCRIPT + U+0033 DIGIT THREE vs. the current U+00B3 
SUPERSCRIPT THREE, and such things are a big no-no. This was already 
problematic with accented characters and led to the definition of 
NFC / NFD normalization, with strict stability policies enforced since 
the 1990s.
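
To illustrate with characters that already exist, here is a minimal 
sketch using Python’s standard unicodedata module (the helper name 
`forms` is mine, purely for illustration). It shows how the two 
spellings of “é” are reconciled by NFC / NFD, while “³” is left alone 
by both and only folds to a plain “3” under the compatibility forms 
NFKC / NFKD:

```python
import unicodedata


def forms(text):
    """Return the four standard normalization forms of `text` (illustrative helper)."""
    return {name: unicodedata.normalize(name, text)
            for name in ("NFC", "NFD", "NFKC", "NFKD")}


# Accented letters: two canonically equivalent spellings.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# The raw strings differ, but NFC and NFD each map both to a single spelling.
same_before = precomposed == decomposed
same_after_nfc = forms(precomposed)["NFC"] == forms(decomposed)["NFC"]
same_after_nfd = forms(precomposed)["NFD"] == forms(decomposed)["NFD"]

# Superscript three: only a compatibility decomposition, tagged <super>.
sup3 = "\u00b3"
sup3_name = unicodedata.name(sup3)
sup3_decomposition = unicodedata.decomposition(sup3)
sup3_forms = forms(sup3)

print(same_before, same_after_nfc, same_after_nfd)
print(sup3_name, "|", sup3_decomposition)
print({k: [f"U+{ord(c):04X}" for c in v] for k, v in sup3_forms.items()})
```

A hypothetical SUPERSCRIPT + U+0033 sequence would have no defined 
relationship to U+00B3 in any of these forms, so the two spellings 
would simply compare as different strings.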

If you managed to convince the Unicode committee that such an encoding 
fits the plain-text model (good luck with that), without removing 
all the previously encoded superscript/modifier letters (that is 
forbidden), you would need to define what happens in the various 
normalization forms NFC / NFD, and probably introduce a new one (NFE? 
E for exponent), which would be, to say the least, a huge architectural 
change to the Unicode model, for a modest gain, if any.




