Superscript and Subscript Characters in General Use
Marcel Schneider
charupdate at orange.fr
Mon Jan 9 15:39:40 CST 2017
Iʼm sorry to find myself in a monologue, so Iʼll quickly review the objections
raised so far, to check whether Iʼm missing any point:
• English ordinals with baseline endings are incorrect too, so they need
formatting, just as French ones do:
➔ English is then in the same position as French and a few other languages, which
cannot be correctly spelt in plain text without superscript ordinal indicators.
Here the small modifier letters can serve as a ready-to-use fallback that often
looks good.
• Those modifier letters have poor font support, so the text gets garbled:
➔ Most everyday fonts do support them. For fonts with incomplete coverage (such
as ornamental fonts), conversion tools can replace the modifier letters with
ordinary letters formatted as superscript.
• In most fonts, the modifier letters donʼt currently match superscript
formatting, nor do superscript and subscript digits match fraction styling:
➔ For high-end typesetting, the text is converted to the legacy representation.
Fraction styling is missing from current software anyway, and superscript and
subscript formatting doesnʼt match true vulgar fractions either, though no
implementer seems to care.
• Implementers are working hard to provide fraction styling, so they shouldnʼt
be burdened with supporting alternative characters such as superscripts and
subscripts:
➔ This additional support is very easy to implement, as it typically requires
no more than a set of equivalence classes mapping each superscript or subscript
character to its baseline counterpart.
• Character pickers introduce further problems with font binding when people
use them instead of looking for an appropriate keyboard layout:
➔ If the goal is to spell all natural languages correctly in plain text, the
availability of characters should ideally be complemented by updated keyboard
layouts for input.
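The replies about conversion tools and about equivalence classes can be sketched in Python with the standard unicodedata module. This is a minimal illustration, not any implementerʼs actual code: it assumes that the <super> and <sub> compatibility decompositions in the Unicode Character Database define the equivalence classes, and the names fold_key and to_legacy_html are hypothetical.

```python
import unicodedata
from itertools import groupby

# Compatibility tags (from the Unicode Character Database) marking a
# character as a superscript or subscript variant of a baseline character.
TAGS = {"<super>": "sup", "<sub>": "sub"}


def fold_key(text: str) -> str:
    """Equivalence-class key for searching and matching (hypothetical name).

    NFKC maps superscript/subscript digits and modifier letters such as
    U+1D49 MODIFIER LETTER SMALL E to their baseline counterparts, so
    "1ᵉʳ" and "1er" land in the same class; casefold() adds caseless matching.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def _script_kind(ch: str):
    """Return "sup", "sub", or None depending on ch's decomposition tag."""
    decomposition = unicodedata.decomposition(ch)  # "" if ch has none
    tag = decomposition.split(" ", 1)[0]
    return TAGS.get(tag)


def to_legacy_html(text: str) -> str:
    """Hypothetical conversion tool: replace each run of superscript or
    subscript characters with baseline characters wrapped in <sup>/<sub>.

    Combining diacritics following a superscript letter are left outside
    the markup; this sketch does not handle them.
    """
    parts = []
    for kind, run in groupby(text, key=_script_kind):
        chunk = "".join(run)
        if kind is None:
            parts.append(chunk)  # ordinary text passes through unchanged
        else:
            baseline = unicodedata.normalize("NFKC", chunk)
            parts.append(f"<{kind}>{baseline}</{kind}>")
    return "".join(parts)


print(to_legacy_html("le 1ᵉʳ janvier"))
print(to_legacy_html("the 2ⁿᵈ of May"))
print(fold_key("1ᵉʳ") == fold_key("1er"))
```

Note that full NFKC also folds other compatibility characters (ligatures, full-width forms and the like), so a search index wanting only the superscript/subscript classes might restrict folding to the two tags in TAGS.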
So much from memory. Now going through the archives:
• Plain text is often unable to express emphasis or other aspects of the
information that are part of the content:
➔ The issue is only about spelling all languages correctly in plain text.
Superscripting abbreviation endings (and numerators) belongs to a different
level of correctness than arbitrary emphasis and other add-on embellishments.
• Unicode considers superscripting for the representation of natural languages
to be out of scope:
➔ Wherever superscripting is required for the correct spelling and unambiguous
representation of natural languages, this restriction should be relaxed, as it
already is for a set of technical notations.
• “As long as French is ordinary text, the abbreviations require styled (rich) text.”
➔ No human language can be relegated to rich text for its orthographically
correct representation without violating the design principles of Unicode.
• Baseline fallbacks are unambiguous for all French abbreviations, at least in
context:
➔ True. Some other scripts convey much less written information and leave more
ambiguity, but this is intrinsic to them, not due to a lack of encoded
characters. Wherever baseline fallbacks are considered incorrect, or not “pure”,
superscripts must be available in plain text, at least as an unambiguous
fallback.
• Other means are available to represent abbreviations unambiguously, or they
can be written out in full:
➔ Every traditional spelling must be supported in Unicode.
• Some French and Spanish abbreviation endings include accented letters, which
arenʼt part of the limited set of modifier letters:
➔ Under the Unicode design principles, a complete alphabet of superscript base
letters suffices, since combining diacritics can be added. In practice, these
diacritics already seem to combine well with superscript base letters in some
cases. Where they donʼt yet, itʼs a matter of updating the fonts, or alternatively
of falling back to legacy processing (after using a macro, a plugin, or another
tool to convert the text to the legacy representation).
• Adding *MODIFIER LETTER SMALL Q for use as a superscript in natural languages
would create the need to provide the same facilities for all other scripts, for
equalityʼs sake:
➔ The Latin script seems to be the only one that uses superscripts in everyday
text. If some languages written in other scripts still cannot be spelt correctly
in plain text, corresponding proposals should be worked out to fill those gaps.
• Superscript abbreviations in natural languages must use generic formatting
features, just as those features are used for “footnotes, mathematical and
chemical expressions, and the like”:
➔ These three domains are of a different order of complexity, and thus cannot be
compared with ordinary text. On the other hand, using formatting for
orthographic superscripts in ordinary text should be considered a legacy
fallback, not a standard way of writing natural languages.
• Vulgar fractions made of superscripts and subscripts are not machine-readable
and cannot be parsed correctly without some convention that is not yet
available, somewhat like arbitrary emoji or ASCII art:
➔ Superscript and subscript digits have compatibility mappings to ASCII digits,
and Unicode has taken care to prevent misparsing.
• Using superscripts and subscripts in abbreviations and fractions is bad
practice, a sort of random suggestion:
➔ It can be labeled bad (though hardly “random”) practice because The Unicode
Standard (TUS) does not specify it, though it doesnʼt discourage it either. To
make it good practice, documenting it in the standard as an alternative
representation would suffice.
(Cf. above, again:
• Implementers are working hard to provide fraction styling, so they shouldnʼt
be burdened with supporting alternative characters such as superscripts and
subscripts:
➔ This additional support is very easy to implement, as it typically requires
no more than a set of equivalence classes mapping each superscript or subscript
character to its baseline counterpart.)
• Using these modifier letters and superscripts/subscripts in such contexts is
an undisciplined hack:
➔ The idea that Unicode characters may only be used with one specific,
conventional meaning is considered a misperception of the Standard. Flagging
character reuse as a hack severely limits the principle of character polysemy.
This is the starting point from which we need to investigate who is enforcing
this kind of discipline, and who is interested in restricting the use of the
characters under discussion to keep (or even steer) people away from them even
where they display well; and from which we need to establish new usage protocols
that hold up in practice, including gateways to legacy protocols.
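The reply on machine readability can be checked with Pythonʼs standard unicodedata module: superscript and subscript digits carry <super>/<sub> compatibility decompositions to ASCII digits, so NFKC normalization folds a super/subscript fraction and the precomposed vulgar fraction to the same string. The parser below is a hypothetical sketch (parse_vulgar_fraction is an illustrative name), assuming a fraction is written as digits around either U+2044 FRACTION SLASH or an ASCII solidus; it does not cover mixed numbers or any other convention.

```python
import unicodedata
from fractions import Fraction

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH has no decomposition


def parse_vulgar_fraction(text: str) -> Fraction:
    """Parse a simple fraction written with super/subscript digits,
    as a precomposed vulgar fraction, or with plain ASCII digits.

    NFKC folds "¹⁄₂" and "½" alike to "1⁄2" (ASCII digits around
    U+2044), so one parser covers all three spellings.
    """
    folded = unicodedata.normalize("NFKC", text.strip())
    for slash in (FRACTION_SLASH, "/"):
        numerator, sep, denominator = folded.partition(slash)
        if sep:
            break
    else:
        raise ValueError(f"no fraction slash in {text!r}")
    # After folding, both parts must be plain ASCII digit strings.
    if not (numerator.isascii() and numerator.isdigit()
            and denominator.isascii() and denominator.isdigit()):
        raise ValueError(f"not a simple fraction: {text!r}")
    return Fraction(int(numerator), int(denominator))


for sample in ("¹⁄₂", "½", "1/2", "³⁄₄"):
    print(sample, "->", parse_vulgar_fraction(sample))
```

Because the folding relies only on decompositions already in the Unicode Character Database, no extra convention is needed beyond choosing which slash characters to accept.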
Hopefully,
Marcel