Superscript and Subscript Characters in General Use

Marcel Schneider charupdate at orange.fr
Wed Jan 4 23:56:58 CST 2017


On Wed, 04 Jan 2017 12:20:14 -0700, Doug Ewell wrote:
> 
> Marcel Schneider wrote:
> 
> >> I don't understand the relevance to vulgar fractions.
> >
> > Vulgar fractions represented using super- and subscript digits around
> > the FRACTION SLASH U+2044
> 
> Don't do that.
> 
> The fact that someone, even a Microsoft MVP, posted an article about
> this glyph hack does not make it a good idea.

I found it a good idea long before I found and read the article [1]. It is very 
coherent, and it seemed to me the best way to make sense of the fraction slash in 
a character encoding standard that takes things seriously. Having read the 
article, Iʼm glad that a Microsoft MVP worked out solutions to help people who 
have incomplete keyboard layouts. Several readers were kind enough to comment on 
the usefulness of the article and the shared data.

> It's kind of like making a
> grinning frog or caterpillar out of Telugu letters.

I donʼt think that Telugu art and ASCII art can be compared to writing 
numbers with fractions made of superscripts and subscripts. Telugu art and 
ASCII art may differ in that ASCII is more common, but the availability of 
super-/subscript Western Arabic digits should not be compared to the 
availability of a rather uncommon script.

> 
> > What I complain of as not mentioned in the Standard, is that U+2044
> > can be used with superscript and subscript digits, rather than ASCII
> > digits.
> 
> Almost any character(s) in Unicode "can be" used with almost any other.
> You can surround U+2044 with emoji if you like. That doesn't mean you
> should.

Not to represent vulgar fractions in a legible way. Superscript and subscript 
digits are special in that they have compatibility mappings to ASCII digits, 
so that they are not only human-readable but also machine-readable. See TUS §22.4 [2].
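
As a minimal sketch of the machine-readability point, assuming Python 3 and its 
standard unicodedata module (the helper name describe is made up for this 
illustration), one can look up the general category, the digit value and the 
compatibility mapping of a superscript or subscript digit:

```python
import unicodedata

def describe(ch):
    """Return (general category, digit value, NFKC mapping) for one character."""
    return (
        unicodedata.category(ch),           # "No" for these digits, not "Nd"
        unicodedata.digit(ch),              # digit value recorded in the UCD
        unicodedata.normalize("NFKC", ch),  # compatibility mapping to ASCII
    )

# U+00B2 SUPERSCRIPT TWO and U+2083 SUBSCRIPT THREE, chosen only as examples.
for ch in "\u00b2\u2083":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))
```

The category stays No, as the passage quoted in [2] explains, while the digit 
value and the mapping to an ASCII digit remain available to a parser that asks 
for them.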

As for “readability for the human reader” (NamesList, header), vulgar fractions 
represented as a superscripts + FRACTION SLASH + subscripts sequence also have 
the advantage of being stable across environments, unless some characters are 
not supported, in which case they can be parsed and replaced with formatted 
ASCII-based fractions, e.g. before the text is pasted into an ANSI-encoded form 
(which replaces unsupported characters with '?').
And they meet user expectations. Preformatted fractions are in such demand that the 
most frequent of them were encoded in early standards and included in national 
keyboard layouts. They entered Unicode for round-trip compatibility [3]. That means 
they are not the specifically Unicode way of representing fractions, obviously because 
only a limited number of such fractions are encoded. Now, the common denominator of 
the Unicode scheme and user expectations is to represent vulgar fractions using 
preformatted super-/subscripts along with the FRACTION SLASH, which kerns accurately.
That is (again) why this has been implemented in fonts like Arial Unicode MS.
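
The parse-and-replace fallback mentioned above can be sketched as follows. This 
is only an illustration under stated assumptions: Python 3, plain ASCII output 
rather than formatted fractions, and hypothetical names (STYLED_FRACTION, 
to_ascii_fraction) that do not come from any standard or library.

```python
import re
import unicodedata

SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
SUBSCRIPTS = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
FRACTION_SLASH = "\u2044"

# One or more superscript digits, FRACTION SLASH, one or more subscript digits.
STYLED_FRACTION = re.compile(
    f"([{SUPERSCRIPTS}]+){FRACTION_SLASH}([{SUBSCRIPTS}]+)"
)

def to_ascii_fraction(text):
    """Replace styled fractions with ASCII digits around U+002F."""
    def repl(match):
        # The compatibility mappings turn the styled digits into ASCII digits.
        numerator = unicodedata.normalize("NFKC", match.group(1))
        denominator = unicodedata.normalize("NFKC", match.group(2))
        # Keep a preceding integer part apart from the numerator.
        start = match.start()
        glued = start > 0 and match.string[start - 1] in "0123456789"
        return f"{' ' if glued else ''}{numerator}/{denominator}"
    return STYLED_FRACTION.sub(repl, text)

print(to_ascii_fraction("add 1\u00b3\u2044\u2084 cups"))
```

Inserting a space after an integer part is a design choice of this sketch, so 
that the integer and the numerator do not run together once the styling is gone.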

The stability of this representation scheme prevents content corruption (see 
the counter-examples in the TUS excerpts quoted below, where the PDF tool used 
arbitrary characters mapped to special fonts; that is another issue, already 
discussed [3]).

I suggest that the specification of the fraction slash in TUS [4] be updated. 
It has remained roughly unchanged since version 2.0 (the other version that Iʼve 
checked). First, U+2044 should be used where applicable (currently U+002F still 
appears there).
There should be *two* “standard form[s] of a fraction built using the fraction 
slash”. Further, we read that “the displaying software is […] mapping the fraction 
to a unit”. Does that mean that the preformatted fraction is substituted if 
available? Or should it read ‘_formatting_ the fraction _as_ a unit’?
I note, too, that software typically waits for the digit-slash-digit sequence 
to be selected and for fraction formatting to be applied on request; this 
could perhaps be mentioned, given that the fraction slash is even less 
common on keyboards than the complete range of super- and subscript digits.
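
To make the two forms concrete, here is a hedged sketch in Python 3. The first 
pattern follows the definition quoted in [4] (decimal digits with gc=Nd on both 
sides of U+2044); the second is the superscript/subscript form proposed here, 
which TUS does not define. All names (PLAIN_FORM, STYLED_FORM, classify, 
style_fraction) are hypothetical.

```python
import re

SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
SUBSCRIPTS = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
FRACTION_SLASH = "\u2044"

# Form 1, as defined in TUS §6.2. For str patterns, Python's \d matches
# characters of general category Nd, so it excludes superscript digits.
PLAIN_FORM = re.compile(rf"\d+{FRACTION_SLASH}\d+")

# Form 2, the proposed addition: superscripts, FRACTION SLASH, subscripts.
STYLED_FORM = re.compile(f"[{SUPERSCRIPTS}]+{FRACTION_SLASH}[{SUBSCRIPTS}]+")

def classify(fraction):
    """Return 'plain', 'styled', or None for a candidate fraction string."""
    if PLAIN_FORM.fullmatch(fraction):
        return "plain"
    if STYLED_FORM.fullmatch(fraction):
        return "styled"
    return None

def style_fraction(numerator, denominator):
    """Build the styled form from two non-negative integers."""
    sup = str.maketrans("0123456789", SUPERSCRIPTS)
    sub = str.maketrans("0123456789", SUBSCRIPTS)
    return (str(numerator).translate(sup) + FRACTION_SLASH
            + str(denominator).translate(sub))

print(classify("3\u20444"), classify(style_fraction(3, 4)), classify("3/4"))
```

A sequence with U+002F between the digits matches neither pattern, which 
mirrors the point that only U+2044 marks a fraction unambiguously.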

Regards,
Marcel

[1] Styled Fractions in Windows, Created by Jeeped, July 18, 2013, MVP, Wiki Author:
https://answers.microsoft.com/en-us/msoffice/wiki/msoffice_word-mso_other/styled-fractions-in-windows/4a07d5fa-2484-4e39-b1f3-70bb3eb0c332

[2] TUS 9.0, §22.4, p. 786:
| 
| Parsing of Superscript and Subscript Digits. In the Unicode Character Database, superscript
| and subscript digits have not been given the General_Category property value
| Decimal_Number (gc=Nd), so as to prevent expressions like 2³ from being interpreted like
| 23 by simplistic parsers. This should not be construed as preventing more sophisticated
| numeric parsers, such as general mathematical expression parsers, from correctly identifying
| these compatibility superscript and subscript characters as digits and interpreting them
| appropriately. See also the discussion of digits in Section 22.3, Numerals.
| 
http://www.unicode.org/versions/Unicode9.0.0/ch22.pdf#G46374

[3] TUS 9.0, §22.3, p. 784:
| 
| Fractions
| 
| The Number Forms block (U+2150..U+218F) contains a series of vulgar fraction characters,
| encoded for compatibility with legacy character encoding standards. These characters
| are intended to represent both of the common forms of vulgar fractions: forms with a
| right-slanted division slash, such as G, as shown in the code charts, and forms with a horizontal
| division line, such as H, which are considered to be alternative glyphs for the same
| fractions, as shown in Figure 22-8. A few other vulgar fraction characters are located in the
| Latin-1 block in the range U+00BC..U+00BE.
| 
| Figure 22-8. Alternate Forms of Vulgar Fractions
| 
| G H
| 
| The unusual fraction character, U+2189 vulgar fraction zero thirds, […]
| 
| The vulgar fraction characters are given compatibility decompositions using U+2044 “/”
| fraction slash. Use of the fraction slash is the more generic way to represent fractions in
| text; it can be used to construct fractional number forms that are not included in the collections
| of vulgar fraction characters. For more information on the fraction slash, see “Other
| Punctuation” in Section 6.2, General Punctuation.
| 
http://www.unicode.org/versions/Unicode9.0.0/ch22.pdf#G46039
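
The compatibility decompositions mentioned in this excerpt can be inspected with 
a short sketch, assuming Python 3 and its standard unicodedata module (the 
helper name decompose_fraction is made up for illustration):

```python
import unicodedata

def decompose_fraction(ch):
    """Return the NFKD form of a precomposed vulgar fraction character."""
    return unicodedata.normalize("NFKD", ch)

# U+00BC, U+00BE (Latin-1) and U+2189 (Number Forms), chosen as examples.
for ch in "\u00bc\u00be\u2189":
    parts = [f"U+{ord(c):04X}" for c in decompose_fraction(ch)]
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), parts)
```

Each result is an ASCII digit, U+2044 FRACTION SLASH, and another ASCII digit, 
that is, the plain form described in [4].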


[4] TUS 9.0, §6.2, p. 277:
| 
| Fraction Slash. U+2044 fraction slash is used between digits to form numeric fractions,
| such as 2/3 and 3/9. The standard form of a fraction built using the fraction slash is defined
| as follows: any sequence of one or more decimal digits (General Category = Nd), followed
| by the fraction slash, followed by any sequence of one or more decimal digits. Such a fraction
| should be displayed as a unit, such as ¾ or !. The precise choice of display can depend
| on additional formatting information.
| 
| If the displaying software is incapable of mapping the fraction to a unit, then it can also be
| displayed as a simple linear sequence as a fallback (for example, 3/4). If the fraction is to be
| separated from a previous number, then a space can be used, choosing the appropriate
| width (normal, thin, zero width, and so on). For example, 1 + thin space + 3 + fraction
| slash + 4 is displayed as 1¾.
| 
http://www.unicode.org/versions/Unicode9.0.0/ch06.pdf#G2000


