Superscript and Subscript Characters in General Use

Marcel Schneider charupdate at orange.fr
Wed Jan 4 15:48:29 CST 2017


On Wed, 4 Jan 2017 15:13:36 +0000, Alastair Houghton wrote:
> 
> > Given that the WG of the French standard keyboard is actually interested in getting
> > encoded a new ordinal indicator (kind of 'ᵉ'), I feel the more urged to stay tuned,
> > and to comment on subsequent e-mails, too.
> 
> I can understand the desire to encode the new ordinal indicator.
> 
> Perhaps another option worth contemplating might be to standardise some control 
> code points, to provide a mechanism for “plain text” to include the necessary 
> minimum of formatting information without additional markup. The advantage of 
> this approach is that it would make it explicitly obvious that Unicode wasn’t 
> going to include further super or subscript forms, while providing everyone that 
> wants them with access to a full set of super or subscripts subject to system 
> (or font) support.
> 
> A simple form of this might be to encode the new zero-width modifier code points 
> SUBSCRIPT and SUPERSCRIPT that work somewhat like the variation selectors, so e.g.
> 
> U+0032 DIGIT TWO
> U+???? SUPERSCRIPT
> U+0033 DIGIT THREE
> U+???? SUBSCRIPT
> 
> would display as ²₃ on fonts that supported the new modifiers. The advantage of 
> taking this very simplistic approach is that it can be dealt with in the OpenType 
> (or AAT) tables in modern fonts, rather than necessitating changes to rendering 
> code. It is also obviously not an attempt to replace markup, but will cope with 
> most common “plain text” uses. 

This would indeed make for stable plain text representations that convey the 
necessary vertical alignment. However, encoding it would imply that the design 
principle of “not attempt[ing] to describe the positioning of a character 
above or below the baseline in typographical layout” is superseded in this 
particular case, since it provides a universal mechanism for a basic formatting 
parameter. For consistency, this would call for further extensions catering for 
other formatting parameters. The cost in code points would be very low, the 
scheme would meet user expectations, and the Standard would become even more 
capable, and thus even more attractive, by enhancing the plain text environment. 
Eventually, the display in text editors, which is currently driven internally 
(for syntax highlighting), would become text-guided. This is not far from 
rich text.
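
To make the quoted mechanism concrete, here is a minimal Python sketch of how 
a base character followed by such a modifier might be resolved. It is only an 
emulation in software: the proposal expects fonts to do this in their OpenType 
(or AAT) tables. The code points are left as U+???? in the proposal, so the 
two private-use code points below are hypothetical placeholders, and the 
function name render is likewise made up for illustration.

```python
# Hypothetical placeholders: the proposal gives no code points (U+????),
# so two private-use code points stand in for the modifiers here.
SUPERSCRIPT = "\uE000"
SUBSCRIPT = "\uE001"

# Existing precomposed superscript and subscript digits.
SUPER_DIGITS = dict(zip(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
))
SUB_DIGITS = {d: chr(0x2080 + int(d)) for d in "0123456789"}


def render(text: str) -> str:
    """Replace <digit, modifier> pairs with existing precomposed forms.

    Pairs with no precomposed form are left untouched, which mimics a
    font that lacks support for the modifier.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == SUPERSCRIPT and ch in SUPER_DIGITS:
            out.append(SUPER_DIGITS[ch])
            i += 2
        elif nxt == SUBSCRIPT and ch in SUB_DIGITS:
            out.append(SUB_DIGITS[ch])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The sequence from the quoted example: 2 SUPERSCRIPT 3 SUBSCRIPT
print(render("2" + SUPERSCRIPT + "3" + SUBSCRIPT))
```

The sketch only covers digits because precomposed forms exist for them; with 
real modifiers, a font could offer raised or lowered glyphs for any base 
character.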

All this points to the premise behind the French request: modifier letters 
that are superscript forms are not real superscripts, and they do not meet 
people’s expectations regarding superscripts and abbreviations. 
I have already expressed my point of view in this discussion. But the real 
concern may be to emulate the Spanish ordinal indicators, on the argument that 
their being part of Unicode justifies similar facilities for other languages. 
Here the Unicode position is that the Spanish ordinal indicators are 
backward-compatibility code points, kept for round-trip compatibility with 
ISO/IEC 8859-1. This is clear from the Code Charts at U+00AA and U+00BA. There 
was a deadline, and diligence got those characters in ahead of it. Besides, a 
complete set of ordinal indicators for French requires four letters, which 
probably exceeds what 8-bit charsets shared by several countries could 
accommodate. 

As the discussion stands so far, I feel that French must live with 
the existing infrastructure. Hence the idea of re-using four modifier letters 
for that purpose.
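
For reference, the character properties at stake can be inspected with 
Python’s standard unicodedata module. This is only an illustrative sketch: 
U+00AA and U+00BA are the code points cited above, while U+1D49 is assumed 
here to be the 'ᵉ' mentioned earlier in the thread, and the helper name fold 
is made up.

```python
import unicodedata

# U+00AA and U+00BA are the Spanish ordinal indicators cited in the post.
# U+1D49 (MODIFIER LETTER SMALL E) is an assumption about which 'ᵉ' is meant.
for ch in ("\u00aa", "\u00ba", "\u1d49"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "|",
        unicodedata.decomposition(ch),
    )


def fold(text: str) -> str:
    """Apply compatibility normalization (NFKC) to the text."""
    return unicodedata.normalize("NFKC", text)


# Under NFKC, both kinds of character fall back to their base letter.
print(fold("1\u00ba"), fold("2\u1d49"))
```

All three characters carry a <super> compatibility decomposition, so a 
process that applies NFKC turns either kind of ordinal into a plain letter.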

If Iʼm wrong about this idea, that could be good or bad news. Good news if the 
generic SUPERSCRIPT and SUBSCRIPT modifiers (or, alternatively, new ordinal 
indicators) are actually encoded. Bad news if those, as well as the re-use of 
modifier letters, are all rejected. In between, I see the out-of-the-box 
modifier letter solution as a kind of second-best choice: better than 
nothing at all and, in certain circumstances, better than markup and formatting.

Kind regards,

Marcel


