Superscript and Subscript Characters in General Use

Marcel Schneider charupdate at orange.fr
Mon Jan 9 15:39:40 CST 2017


Iʼm sorry to see this thread turn into a monologue. So Iʼll quickly recap 
the arguments raised so far, to check whether Iʼm missing any point:

• English ordinals with baseline endings are incorrect too, so they need 
formatting just as French ones do:
➔ English is in the same situation as French and a few other languages, which 
cannot be correctly spelt in plain text without superscript ordinal indicators. 
Here the modifier small letters can be a ready-to-use and often good-looking fallback.

• Those modifier letters have poor font support, so the text gets messed up:
➔ Most text fonts do support them. For incomplete (ornamental) fonts, conversion 
tools can replace the modifier letters with regular letters in superscript formatting.

• The modifier letters donʼt currently match superscript styling, nor do 
superscript and subscript digits match fraction styling in most fonts:
➔ For high-end processing, the text can be converted to the legacy representation. 
Fraction styling is missing from current software anyway, and superscript and 
subscript formatting doesnʼt match true vulgar fractions either, though no 
implementer seems to care.

• Implementers work hard to provide fraction styling, so they shouldnʼt be 
burdened with supporting alternate characters such as superscripts and subscripts:
➔ This additional support is very easy to implement, as it typically needs no 
more than a set of equivalence classes.

• Character pickers add other problems with font bindings when people use them 
instead of looking for an appropriate keyboard layout:
➔ If the goal is to spell all natural languages correctly in plain text, character 
availability should ideally be complemented by updated keyboard layouts for input.
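To make the replies about equivalence classes and conversion tools more concrete, here is a minimal Python sketch, not a definitive implementation. It assumes that the `<super>` and `<sub>` compatibility decompositions in the Unicode Character Database (exposed by the standard `unicodedata` module) are a usable set of equivalence classes; the function names `split_script` and `to_legacy_html` are hypothetical, and HTML `<sup>`/`<sub>` merely stands in for whatever legacy formatting a real tool would emit.

```python
import unicodedata


def split_script(ch):
    """Map one character to (base text, kind), kind being "sup", "sub" or None.

    Assumption: the <super>/<sub> compatibility decompositions of the
    Unicode Character Database serve as the equivalence classes.
    """
    decomposition = unicodedata.decomposition(ch)
    for tag, kind in (("<super>", "sup"), ("<sub>", "sub")):
        if decomposition.startswith(tag + " "):
            code_points = decomposition.split()[1:]
            return "".join(chr(int(cp, 16)) for cp in code_points), kind
    return ch, None


def to_legacy_html(text):
    """Hypothetical converter: replace superscript/subscript characters
    with their base characters wrapped in HTML <sup>/<sub> formatting."""
    out = []
    current = None  # kind of the run currently open, if any
    for ch in text:
        base, kind = split_script(ch)
        if unicodedata.combining(ch):
            # A combining accent stays inside the run of the letter it follows.
            kind = current
        if kind != current:
            if current:
                out.append(f"</{current}>")
            if kind:
                out.append(f"<{kind}>")
            current = kind
        out.append(base)
    if current:
        out.append(f"</{current}>")
    return "".join(out)


print(to_legacy_html("1\u1d49\u02b3"))  # 1ᵉʳ -> 1<sup>er</sup>
print(to_legacy_html("2\u207f\u1d48"))  # 2ⁿᵈ -> 2<sup>nd</sup>
print(to_legacy_html("H\u2082O"))       # H₂O -> H<sub>2</sub>O
```

Applying `unicodedata.normalize("NFKC", ...)` to the whole string would also map these characters to their base letters, but it would lose the superscript/subscript distinction; that is why the sketch inspects each characterʼs decomposition tag instead.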

So much from memory. Now going through the archives:

• Plain text is often unable to express stress or other aspects of the information 
that are part of the content:
➔ The issue is only about spelling all languages correctly in plain text. 
Superscripting of abbreviation endings (and of numerators) belongs to another level 
of correctness than arbitrary stress and other add-on embellishments.

• Unicode considers superscripting for the representation of natural languages 
to be out of scope:
➔ Wherever superscripting is required for the correct spelling and unambiguous 
representation of natural languages, this restriction should be relaxed, as it is 
for a set of technical notations.

• “As long as French is ordinary text, the abbreviations require styled (rich) text.”
➔ No human language can be relegated to rich text for its orthographically correct 
representation without infringing the design principles of Unicode.

• Baseline fallbacks are unambiguous for all French abbreviations, at least in 
context:
➔ True. Some other scripts convey much less written information and leave more 
ambiguities. But this is intrinsic to those scripts, not due to a lack of encoded 
characters. Wherever baseline fallbacks are considered incorrect, or not “pure”, 
superscripts must be available in plain text, at least as an unambiguous fallback.

• Other means are available to represent abbreviations unambiguously, and the 
words can also be written out in full:
➔ Every traditional spelling must be supported in Unicode.

• Some French and Spanish abbreviation endings include accented letters, which 
arenʼt part of the limited set of modifier letters:
➔ Following the Unicode design principles, a complete alphabet of base letters 
suffices, since combining diacritics can be added. In practice, these diacritics 
sometimes already seem to render well even with superscript base letters. Where 
they donʼt yet, itʼs a matter of updating the fonts, or alternatively of falling 
back to legacy processing (after converting the text to the legacy representation 
with a macro, a plugin, or another tool).

• Adding *MODIFIER LETTER SMALL Q for use as a superscript in natural languages 
would create the need to provide the same facilities for all other scripts, for 
equalityʼs sake:
➔ Latin script seems to be the only one that uses superscripts in running text. 
If some languages using other scripts still cannot be spelt orthographically in 
plain text, corresponding proposals should be worked out to fill the gaps.

• Superscript abbreviations in natural languages must use generic formatting 
features, just as these are used for “footnotes, mathematical and chemical 
expressions, and the like”:
➔ These three domains are of another level of complexity and thus cannot be 
compared with ordinary text. By contrast, the use of formatting for orthographic 
superscripts in ordinary text should be considered a legacy fallback, not a 
standard way of writing natural languages.

• Vulgar fractions made of superscripts and subscripts are not machine-readable and 
cannot be parsed correctly without some convention that is not yet available, 
much like arbitrary emoji or ASCII art:
➔ Superscript and subscript digits have compatibility mappings to the ASCII digits, 
and Unicode has taken care to prevent misparsing.

• Using superscripts and subscripts in abbreviations and fractions is bad practice, 
a sort of random suggestion:
➔ It may be labeled bad (though hardly “random”) practice because TUS does not 
specify it (while not discouraging it either). To make it good practice, it would 
suffice for the Standard to mention it as an alternate representation. (Cf. again 
the point above about implementers and fraction styling: this additional support 
typically needs no more than a set of equivalence classes.)

• Using those modifier letters, superscripts and subscripts in these contexts is an 
undisciplined hack:
➔ The idea that Unicode characters may only be used with one specific, 
conventional meaning is considered a misperception of the Standard. Flagging 
character reuse as a hack severely limits the principle of character polysemy. 
This is the starting point from which we need to investigate who is enforcing this 
kind of discipline, and who has an interest in restricting the use of these 
characters so as to keep (or even draw) people away from them even where they 
display well; and from which to establish new, practice-proof usage protocols, 
including gateways to legacy protocols.
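Regarding the objection that vulgar fractions built from superscripts and subscripts cannot be parsed, a minimal Python sketch can show how the compatibility mappings apply. It assumes a fraction is written as superscript digits, U+2044 FRACTION SLASH (or an ASCII slash), and subscript digits; NFKC normalization maps the digits to ASCII, after which the parts parse as ordinary integers. The name `parse_vulgar_fraction` is hypothetical, and reading “prevent misparsing” as “these digits are not decimal digits” is an assumption.

```python
import unicodedata
from fractions import Fraction


def parse_vulgar_fraction(text):
    """Hypothetical parser for fractions such as "¹⁄₂" or "³/₄".

    NFKC maps superscript and subscript digits (and precomposed
    fractions such as "½") to ASCII digits; U+2044 FRACTION SLASH
    has no decomposition and is kept as the separator.
    """
    normalized = unicodedata.normalize("NFKC", text)
    for slash in ("\u2044", "/"):
        if slash in normalized:
            numerator, denominator = normalized.split(slash, 1)
            return Fraction(int(numerator), int(denominator))
    raise ValueError(f"no fraction slash in {text!r}")


print(parse_vulgar_fraction("\u00b9\u2044\u2082"))  # ¹⁄₂ -> 1/2
print(parse_vulgar_fraction("\u00b3/\u2084"))       # ³/₄ -> 3/4
print(parse_vulgar_fraction("\u00bd"))              # ½  -> 1/2

# Superscript digits have General_Category No, not Nd, so int()
# refuses them instead of silently reading "²³" as the number 23.
try:
    int("\u00b2\u00b3")
except ValueError:
    print("'\u00b2\u00b3' is not parsed as 23")
```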

Hopefully,

Marcel


