Superscript and Subscript Characters in General Use (was: Re: a character for an unknown character)

Marcel Schneider charupdate at
Tue Jan 3 18:24:52 CST 2017

On Tue, 3 Jan 2017 09:31:42 +0100, Christoph Päper wrote:

> > Among the possibilities, you include Unicode subscripts. 
> Just for the sake of completeness. 

This suggests that preformatted subscripts really are an option here. 
The TUS snippets [1][2] and common practice show that whatever characters are 
on the keyboard get used or reused as superscripts, such as the degree sign 
for superscript o and the feminine ordinal indicator for superscript a. Layouts 
are bafflingly inconsistent across countries: the Belgian AZERTY layout has 
SUPERSCRIPT THREE where its French (France) counterpart has an empty shift state, 
while SUPERSCRIPT ONE is missing from both, even though the AltGr shift state is 
only partially used and all three superscript digits are part of Latin-1. Thus, 
awareness of the usefulness of a given character does not always bear a close 
relation to its presence on the keyboard.
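
The Latin-1 claim above can be checked with a few lines of Python. This is 
only a sketch: the helper name in_latin1 is made up for illustration, and the 
list of characters is my own selection of those mentioned in the paragraph.

```python
import unicodedata

# Characters mentioned above: superscript one/two/three, the feminine
# ordinal indicator and the degree sign (selection is illustrative).
chars = ["\u00B9", "\u00B2", "\u00B3", "\u00AA", "\u00B0"]


def in_latin1(ch):
    """Return True if the character can be encoded in ISO 8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in chars:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} latin-1={in_latin1(ch)}")
```

By contrast, SUPERSCRIPT FOUR (U+2074) and the other superscript digits live in 
the Superscripts and Subscripts block and are not encodable in Latin-1.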

In the Unicode era, this may grow into the insight that the availability of 
an almost complete range of superscripts, and of a set of subscripts, including 
Latin letters, calls for adding them to national keyboard layouts to meet 
the demand of increasingly important user groups and communities. Supporting 
this does not necessarily require the Unicode Standard to be reworded, because 
TUS mainly reflects encoding principles and usage recommendations, without being 
a typography manual. 

TUS 9.0, §22.4, p. 786, explains that the recommendation not to use preformatted 
characters outside phonetics is a mere application of a design principle, 
regardless of the practical usefulness of the scheme. I note that in the snippet 
quoted below [1], the example “DC0016” has already been flattened by copy-pasting 
it to plain text. By contrast, copying it from Adobe Reader to Microsoft Word 
preserves the font size difference but not the vertical alignment, presumably 
because the original specifies a custom subscript style that carries no generic 
subscripting information and is not cross-platform compatible. This example 
highlights a serious downside of the markup-based representation scheme.

As demonstrated with the apostrophe, a recommendation may be changed according 
to common practice, and reconsidered in the light of differently weighted rules 
and principles, in keeping with what Asmus Freytag pointed out on December 28ᵗʰ, 
2016, in reply to Richard Wordingham:

> > > > Ideal solutions can also be defeated by limited keyboard layouts. As a 
> > > > result, I have no idea whether the singular of "fithp" (one of Larry 
> > > > Niven's alien species) should be spelt with U+02BC or U+2019, though in 
> > > > ASCII I can just write "fi'". 
> > > 
> > > The only place where "uni" doesn't apply in Unicode is that there's never 
> > > just a single principle that applies, but always multiple ones that are 
> > > in tension --- and in the edge cases, the tension can be felt keenly. 
> > > 

As another example, seen in a 2015 thread on plain-text custom fractions, 
the English Microsoft Community website hosts recommendations on how to 
insert fractions made of superscripts, subscripts and the fraction slash U+2044, 
using a list of autocorrect entries in Word. As a test, Iʼve added four items to 
the autocorrect list, converting '.s.' to 'ˢᵗ', '.n.' to 'ⁿᵈ', '.r.' to 'ʳᵈ', and 
'.t.' to 'ᵗʰ'. The result looks fine in Cambria and bad in incomplete fonts mixed 
with a fallback font, while Arial renders the superscript 'n' in a non-standard 
way, as a legacy remnant, despite TUS specifying that all those characters 
should be harmonized. 
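
For readers without Word, the same idea can be sketched in Python. This is 
not how Word's autocorrect works internally; it is a plain string replacement 
using the four entries listed above, plus a hypothetical helper 
(plain_text_fraction) that builds a fraction from superscript digits, U+2044 
and subscript digits. All names in the block are my own.

```python
# The four autocorrect-style entries from the message, spelled out as code points.
ORDINAL_SUFFIXES = {
    ".s.": "\u02E2\u1D57",  # MODIFIER LETTER SMALL S + SMALL T
    ".n.": "\u207F\u1D48",  # SUPERSCRIPT LATIN SMALL LETTER N + MODIFIER LETTER SMALL D
    ".r.": "\u02B3\u1D48",  # MODIFIER LETTER SMALL R + SMALL D
    ".t.": "\u1D57\u02B0",  # MODIFIER LETTER SMALL T + SMALL H
}

# Superscript 1, 2, 3 come from Latin-1; the other digits from U+2070..U+2079.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079",
)
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)
FRACTION_SLASH = "\u2044"


def autocorrect(text):
    """Replace each trigger sequence with its preformatted ordinal suffix."""
    for trigger, replacement in ORDINAL_SUFFIXES.items():
        text = text.replace(trigger, replacement)
    return text


def plain_text_fraction(numerator, denominator):
    """Build a fraction from superscript digits, U+2044 and subscript digits."""
    return (
        str(numerator).translate(SUPERSCRIPT_DIGITS)
        + FRACTION_SLASH
        + str(denominator).translate(SUBSCRIPT_DIGITS)
    )


print(autocorrect("December 28.t., 2016"))
print(plain_text_fraction(3, 16))
```

Whether the output looks acceptable still depends on the font, as noted above; 
the code only guarantees which code points end up in the text.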

Itʼs up to the user to choose the best-fitting option depending on usage 
and environment. As already discussed, formatting is a workable solution 
on the condition that plain text will never be a requirement.

I hope that this lengthy contribution may help to clear the way for 
users to feel free to use superscript and subscript characters the way 
they prefer.


[1] TUS 9.0, §22.4, p. 786:
| In general, the Unicode Standard does not attempt to describe the positioning 
| of a character above or below the baseline in typographical layout. 
| Therefore, the preferred means to encode superscripted letters or digits, 
| such as “1st” or “DC0016”, is by style or markup in rich text. […]
| In addition, superscript digits are used to indicate tone in transliteration 
| of many languages. The use of superscript two and superscript three is common 
| legacy practice when referring to units of area and volume in general texts.

[2] TUS 9.0, §7.8, p. 327:
| The superscript forms of the i and n letters can be found in the
| Superscripts and Subscripts block (U+2070..U+209F). The fact that the latter 
| two letters contain the word “superscript” in their names instead of “modifier 
| letter” is an historical artifact of original sources for the characters, and 
| is not intended to convey a functional distinction in the use of these 
| characters in the Unicode Standard.
| Superscript modifier letters are intended for cases where the letters carry 
| a specific meaning, as in phonetic transcription systems, and are not 
| a substitute for generic styling mechanisms for superscripting of text, 
| as for footnotes, mathematical and chemical expressions, and the like.
