Superscript and Subscript Characters in General Use / Re: French Superscript Abbreviations Fit Plain Text Requirements
charupdate at orange.fr
Mon Jan 23 03:30:17 CST 2017
I am glad that this thread has now reached a far better and very useful result.
It has been shown that Microsoft already promotes a set of Unicode superscripts and
subscripts in a fully validated way. From there, we can go further and promote the use
of a set of Latin superscript letters. Relatedly, Microsoftʼs choice not to support the
OpenType rendering properties of U+2044 FRACTION SLASH (at least in a Latin script
context in Edge) turns out to be a fairly user-friendly, practice-oriented option.
It also helps avoid holding peopleʼs feet to the fire about U+2044.
On Wed, 28 Dec 2016 13:47:00 -0800, Asmus Freytag wrote:
> Mathematical notation is a good example of such a mixed case: while
> ordinary variables can be expressed in plain text with the help of
> mathematical alphabets, the proper display of formulas requires markup.
> Even Murray Sargent's plain text math is markup, albeit a very clever one
> that re-uses conventions used for the inline presentation of mathematical
> expression. (Where that is insufficient, it introduces additional
> conventions, clearly extraneous to the content, and hence markup).
Murray Sargentʼs Nearly Plain-Text Encoding of Mathematics (UnicodeMath) is, in my
opinion, a key gateway to understanding Unicode, and has thus become a key point
in my communication about Unicode-supporting keyboard layouts. See version 3.1:
Thanks to Asmus Freytag for drawing our attention to it!
What makes this notation so important to this threadʼs issue is that it uses
Unicode superscripts and subscripts as a valid and parseable alternative to the
[La]TeX-style notation that uses markup ('^' and '_'), “since Unicode has a full set
of decimal subscripts and superscripts. As a practical matter, numeric subscripts
are typically entered using an underscore and the number followed by a space or
an operator” (p. 7).
These Unicode superscript and subscript characters are parseable and are converted
to formatted digits at build-up. Hence they are unambiguous, not random characters as
sometimes alleged. They “should be rendered the same way that scripts of the
corresponding script nesting level would be rendered.” (p. 18)
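To illustrate why these characters are parseable rather than random, here is a minimal Python sketch. It is not Wordʼs build-up code; the function names `classify` and `to_tex` are hypothetical. It reads each characterʼs compatibility decomposition from the standard `unicodedata` module (U+00B2 SUPERSCRIPT TWO, for instance, decomposes as `<super> 0032`) and rewrites runs of superscripts and subscripts into the TeX-style `^{...}` and `_{...}` markup mentioned above, purely for comparison.

```python
import unicodedata

def classify(ch: str) -> tuple[str, str]:
    """Return (level, base text) for one character.

    level is "sup", "sub" or "base", taken from the <super>/<sub> tag of the
    character's compatibility decomposition in the Unicode Character Database.
    """
    decomp = unicodedata.decomposition(ch)  # e.g. "<super> 0032" for U+00B2
    for tag, level in (("<super> ", "sup"), ("<sub> ", "sub")):
        if decomp.startswith(tag):
            # The rest is one or more hex code points, e.g. "0032".
            base = "".join(chr(int(h, 16)) for h in decomp[len(tag):].split())
            return level, base
    return "base", ch

def to_tex(text: str) -> str:
    """Rewrite runs of Unicode superscripts/subscripts as ^{...} / _{...}."""
    out = []
    run = []          # base text of the current script run
    run_level = None  # "sup" or "sub" while a run is open

    def close_run() -> None:
        if run:
            marker = "^" if run_level == "sup" else "_"
            out.append(marker + "{" + "".join(run) + "}")
            run.clear()

    for ch in text:
        level, base = classify(ch)
        if level != run_level:
            close_run()  # a change of level ends the current run
        if level == "base":
            out.append(ch)
            run_level = None
        else:
            run.append(base)
            run_level = level
    close_run()
    return "".join(out)

print(to_tex("x²+y₁₀"))  # x^{2}+y_{10}
print(to_tex("E=mc²"))   # E=mc^{2}
```

Each script level gets its own run, so `x²₁` comes out as `x^{2}_{1}`.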
Although fractions are ordinarily written with ASCII digits and a slash, U+2044 can
be used to get skewed fractions (p. 5) built up in Microsoft Word (where fractions
can also be formatted using the math features). Combining both schemes, the user
may feel free to write fractions using superscripts and subscripts around U+2044, as
suggested in the already cited wiki that proposes adding a huge autocorrect list for quick input:
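Independently of that list, here is a minimal Python sketch of what one such autocorrect entry might produce. The function name `vulgar_fraction` is hypothetical and not taken from the wiki. The numerator is mapped to superscript digits and the denominator to subscript digits, with U+2044 FRACTION SLASH in between, so the result does not depend on the OpenType fraction features, only on the font having glyphs for the superscript and subscript digits.

```python
# Superscript digits 0-9: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079
SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
# Subscript digits 0-9: U+2080..U+2089
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + d) for d in range(10))
FRACTION_SLASH = "\u2044"

TO_SUP = str.maketrans("0123456789", SUPERSCRIPT_DIGITS)
TO_SUB = str.maketrans("0123456789", SUBSCRIPT_DIGITS)

def vulgar_fraction(numerator: int, denominator: int) -> str:
    """Spell a fraction as superscript digits + U+2044 + subscript digits."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("expects a non-negative numerator and a positive denominator")
    return (str(numerator).translate(TO_SUP)
            + FRACTION_SLASH
            + str(denominator).translate(TO_SUB))

print(vulgar_fraction(3, 4))   # ³⁄₄
print(vulgar_fraction(22, 7))  # ²²⁄₇
```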
This is practice-oriented and user-friendly, because relying only on the OpenType font
feature specified for U+2044 would dramatically restrict the number of usable fonts,
which for the Latin script traditionally runs to several thousand. Complex scripts,
for which HarfBuzz is primarily intended, have a much smaller number of available
typefaces, so that full conversion to OpenType is feasible. So I think that the
correct rendering of U+2044 in HarfBuzz mainly targets these complex scripts. In
other scripts like Latin, the feature would then be a nice side benefit, one that
potentially raises user expectations of professional (typographic) ligature rendering.
At the other end, for drafts and even “for simple documentation purposes”,
“plain-text linearly formatted mathematical expressions can be used ‘as is’” (p. 29).
That can be extended to vulgar fractions in running text, and to abbreviations.
This helps one understand that any font with inconsistent glyphs for Unicode subscript
and superscript digits is not Unicode conformant.
The same applies to superscript i and n (as mentioned in:
). These inconsistent fonts do not conform to the Unicode Standard, which specifies that
there is no functional difference between the characters that have the word
SUPERSCRIPT in their names and those that do not:
TUS 9.0, §7.8, p. 327:
| The superscript forms of the i and n letters can be found in the
| Superscripts and Subscripts block (U+2070..U+209F). The fact that the latter
| two letters contain the word “superscript” in their names instead of “modifier
| letter” is an historical artifact of original sources for the characters, and
| is not intended to convey a functional distinction in the use of these
| characters in the Unicode Standard.
Moreover, the Code Charts contain annotations for these two characters, connecting
them to the set of Unicode superscript Latin letters named “MODIFIER LETTER”:
2071 SUPERSCRIPT LATIN SMALL LETTER I
* functions as a modifier letter
# <super> 0069
207F SUPERSCRIPT LATIN SMALL LETTER N
* functions as a modifier letter
# <super> 006E
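The data behind these chart entries can be checked with Pythonʼs standard `unicodedata` module, which reflects the Unicode Character Database of the Unicode version Python was built with. This minimal sketch prints the name, general category and decomposition of the two characters, next to U+1D43 MODIFIER LETTER SMALL A for comparison; in current versions of the database, all three carry the general category Lm (modifier letter).

```python
import unicodedata

for ch in ("\u2071", "\u207f", "\u1d43"):
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          unicodedata.category(ch),       # general category, e.g. "Lm"
          unicodedata.decomposition(ch),  # e.g. "<super> 0069"
          sep=" | ")
```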
Accordingly, the user can count on a whole small alphabet (except q, which the UTC
rejected on what I consider invented, imaginary grounds) displaying in
a consistent way in all complete, conformant fonts, with a layout like running text
as far as the fonts have proportional advance widths. To run a test, see the example in:
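Separately from that test page, the encoded letters themselves can be listed by scanning the character database instead of hard-coding them. In this minimal Python sketch (the function name `superscript_latin_small` is hypothetical), a character is kept when its decomposition is `<super>` followed by a single ASCII small letter and its name begins with MODIFIER LETTER SMALL or SUPERSCRIPT LATIN SMALL LETTER; the name filter drops the ordinal indicators U+00AA and U+00BA, which also decompose to superscript a and o. The result depends on the Unicode version bundled with Python, so newer versions may report letters that were missing in 2017.

```python
import string
import sys
import unicodedata

PREFIXES = ("MODIFIER LETTER SMALL ", "SUPERSCRIPT LATIN SMALL LETTER ")

def superscript_latin_small() -> dict[str, str]:
    """Map each ASCII small letter to the first superscript form found for it."""
    found: dict[str, str] = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp.startswith("<super> "):
            continue
        parts = decomp.split()[1:]  # hex code points after the tag
        if len(parts) != 1:
            continue
        base = chr(int(parts[0], 16))
        if base in string.ascii_lowercase and unicodedata.name(ch, "").startswith(PREFIXES):
            found.setdefault(base, ch)  # keep the lowest code point
    return found

table = superscript_latin_small()
print(unicodedata.unidata_version)
print("".join(table[c] for c in string.ascii_lowercase if c in table))
print("missing:", [c for c in string.ascii_lowercase if c not in table])
```

Running it prints the Unicode version in use, the superscript letters found, and the letters for which that version has no such character.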
To conclude for now (please feel free to correct me), I now believe, and will
spread the word, that following Microsoft (a user-friendly corporation eager to help
everybody make the most of Unicode), the users of any word processor and text editor
are welcome to use the Unicode repertoire as they need and like, while on the other
hand, the recommendations in TUS may be considered mere official discourse for
encoding process management purposes, with little to no real impact on
actual practice. Hence, National Bodies and user communities as well as developers
may issue usage recommendations of their own, to meet user expectations and propose
working methods in addition to, or instead of, those provided by the Standard.