Superscript and Subscript Characters in General Use / Re: French Superscript Abbreviations Fit Plain Text Requirements

Marcel Schneider charupdate at orange.fr
Mon Jan 23 03:30:17 CST 2017


Happily, this thread is now reaching a far better and very useful result.
A set of Unicode superscripts and subscripts turns out to be already promoted by 
Microsoft in a fully validated way. From this we can go on to promote the use of a 
set of Latin superscript letters. Relatedly, Microsoftʼs choice not to support the 
OpenType rendering properties of U+2044 FRACTION SLASH (at least in a Latin script 
context in Edge) turns out to be a fairly user-friendly, practice-oriented option. 
That also helps to avoid holding peopleʼs feet to the fire about U+2044.

On Wed, 28 Dec 2016 13:47:00 -0800, Asmus Freytag wrote:
[…]
> 
> Mathematical notation is a good example of such a mixed case: while 
> ordinary variables can be expressed in plain text with the help of 
> mathematical alphabets, the proper display of formulas requires markup. 
> Even Murray Sargent's plain text math is markup, albeit a very clever one 
> that re-uses conventions used for the inline presentation of mathematical 
> expression. (Where that is insufficient, it introduces additional 
> conventions, clearly extraneous to the content, and hence markup).
> 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m12/0119.html

Murray Sargentʼs Nearly Plain-Text Encoding of Mathematics (UnicodeMath) is in my 
opinion a key gateway to the understanding of Unicode, and thus becomes a key point 
in my communication about Unicode-supporting keyboard layouts. See version 3.1:
http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v3.1.pdf

Thanks to Asmus Freytag for drawing our attention to it!

What makes this notation so important to this threadʼs issue is that it uses 
Unicode superscripts and subscripts as a valid and parseable alternative to the 
[La]TeX-style notation that uses markup ('^' and '_'), “since Unicode has a full set 
of decimal subscripts and superscripts. As a practical matter, numeric subscripts 
are typically entered using an underscore and the number followed by a space or 
an operator” (p. 7).

These Unicode superscript and subscript characters are parseable and are converted 
to formatted digits when the expression is built up. Hence they are unambiguous, not 
random characters as is sometimes alleged. They “should be rendered the same way that 
scripts of the 
corresponding script nesting level would be rendered.” (p. 18)
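
As a rough illustration of why these characters are parseable (this is not the UnicodeMath build-up algorithm from UTN #28, only a sketch), the compatibility decompositions in the Unicode Character Database already tell a program which base character each superscript or subscript stands for. The function names `script_level` and `to_linear` below are hypothetical, and the '^' and '_' output simply reuses the [La]TeX-style markers mentioned above:

```python
import unicodedata


def script_level(ch):
    """Return (kind, base) where kind is 'super', 'sub' or None.

    Relies only on the <super>/<sub> compatibility decomposition tags
    that the Unicode Character Database records for these characters.
    """
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0] in ("<super>", "<sub>"):
        base = "".join(chr(int(cp, 16)) for cp in parts[1:])
        return parts[0].strip("<>"), base
    return None, ch


def to_linear(text):
    """Rewrite runs of Unicode super/subscripts using '^' and '_' markers."""
    out = []
    previous = None
    for ch in text:
        kind, base = script_level(ch)
        # Emit a marker only when a new superscript or subscript run starts.
        if kind is not None and kind != previous:
            out.append("^" if kind == "super" else "_")
        out.append(base)
        previous = kind
    return "".join(out)


if __name__ == "__main__":
    print(to_linear("E=mc²"))  # E=mc^2
    print(to_linear("a₁₂"))    # a_12
```

The sketch treats a run of adjacent superscripts (or subscripts) as one script, which is an assumption made here for simplicity.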

Although fractions are ordinarily written with ASCII digits and slash, U+2044 can 
be used to get skewed fractions (p. 5) built up in Microsoft Word (where fractions 
can also be formatted using the math features). Combining both schemes, the user 
may feel free to write fractions using super/sub scripts around U+2044, as suggested 
in the already cited wiki proposing to add a huge autocorrect list for quick input:
https://answers.microsoft.com/en-us/msoffice/wiki/msoffice_word-mso_other/styled-fractions-in-windows/4a07d5fa-2484-4e39-b1f3-70bb3eb0c332
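
To make the combined scheme concrete, here is a minimal sketch that builds such a fraction from superscript digits, U+2044 and subscript digits. The helper name `styled_fraction` is hypothetical and is not taken from the wiki or from Word; it only shows which code points end up in the text:

```python
# Digits 0-9 mapped to their Unicode superscript and subscript counterparts.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)
FRACTION_SLASH = "\u2044"


def styled_fraction(numerator, denominator):
    """Return e.g. 3/4 as superscript 3, U+2044, subscript 4."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("this sketch only handles non-negative fractions")
    return (
        str(numerator).translate(SUPERSCRIPT_DIGITS)
        + FRACTION_SLASH
        + str(denominator).translate(SUBSCRIPT_DIGITS)
    )


if __name__ == "__main__":
    print(styled_fraction(3, 4))
    print(styled_fraction(11, 16))
```

Because the digits are already superscript and subscript characters, the result reads as a fraction even in a font or application that ignores the special behavior of U+2044.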

This is practice-oriented and user-friendly because relying only on the OpenType font 
feature specified for U+2044 would dramatically restrict the number of usable fonts. 
In Latin script these traditionally number several thousand, as opposed to complex 
scripts, for which HarfBuzz is primarily intended, where the number of available 
typefaces is much smaller, so that full conversion to OpenType is feasible. So I think 
that the correct rendering of U+2044 in HarfBuzz mainly targets these complex scripts. 
In other scripts like Latin, the feature would then be a nice by-product, one that 
potentially raises user expectations about professional (typographical) ligature 
rendering.

At the other end of the spectrum, for drafts and even “for simple documentation 
purposes”, “plain-text linearly formatted mathematical expressions can be used 
‘as is’” (p. 29). That can be extended to vulgar fractions in running text, and to 
abbreviations.

This helps to understand that any font with inconsistent glyphs for Unicode subscript 
and superscript digits is not Unicode conformant.
The same applies to superscript i and n, as mentioned in:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0093.html
These inconsistent fonts donʼt conform to the Unicode Standard, which specifies that 
there is no functional difference between those characters that have the word 
SUPERSCRIPT in their name, and those that donʼt:

TUS 9.0, §7.8, p. 327:
| The superscript forms of the i and n letters can be found in the
| Superscripts and Subscripts block (U+2070..U+209F). The fact that the latter 
| two letters contain the word “superscript” in their names instead of “modifier 
| letter” is an historical artifact of original sources for the characters, and 
| is not intended to convey a functional distinction in the use of these 
| characters in the Unicode Standard.
http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G24762

Moreover, the Code Charts contain annotation lines for these two characters, connecting 
them to the set of Unicode superscript Latin letters named “MODIFIER LETTER”:

2071 SUPERSCRIPT LATIN SMALL LETTER I
* functions as a modifier letter
# <super> 0069
[…]
207F SUPERSCRIPT LATIN SMALL LETTER N
* functions as a modifier letter
# <super> 006E
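
These entries can be checked against the character database shipped with a programming language. The following is only a sketch using Pythonʼs standard unicodedata module; the helper name `describe` is hypothetical, and the exact values depend on the Unicode version bundled with the interpreter. U+02B0 MODIFIER LETTER SMALL H is included for comparison:

```python
import unicodedata


def describe(code_point):
    """Return (name, decomposition, general category) for a code point."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.category(ch),
    )


if __name__ == "__main__":
    # Two SUPERSCRIPT-named letters and one MODIFIER LETTER for comparison.
    for cp in (0x2071, 0x207F, 0x02B0):
        name, decomposition, category = describe(cp)
        print(f"{cp:04X} {name}")
        print(f"     decomposition: {decomposition}")
        print(f"     general category: {category}")
```

With a recent character database, all three report a `<super>` decomposition and the same general category (Lm, modifier letter), which matches the statement that the difference in naming conveys no functional distinction.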

Accordingly, the user can count on a whole small alphabet (except q, which was 
rejected on the basis of what I consider invented, imaginary allegations made on 
behalf of the UTC) displaying in a consistent way in all complete, conformant fonts, 
with a layout like that of running text so far as the fonts have proportional advance 
widths. To run a test, see the example in:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0093.html (again).

Trying to conclude so far (please feel free to correct me), I now believe, and will 
spread the word, that following Microsoft (a user-friendly corporation eager to help 
everybody make the most of Unicode), the users of any word processor or text editor 
are welcome to use the Unicode repertoire as they need and like, while on the other 
hand, the recommendations in TUS may be considered a mere official discourse for 
the purposes of managing the encoding process, with little to no real impact on 
actual practice. Hence, National Bodies and user communities as well as developers 
may issue usage recommendations of their own, to meet user expectations and propose 
working methods in addition, or as alternatives, to those provided by the Standard.

Regards,
Marcel


