Superscript and Subscript Characters in General Use

Wed Jan 4 08:20:40 CST 2017

On Wed, 4 Jan 2017 00:36:38 -0500, Asmus Freytag wrote:
> 
> On 1/3/2017 4:24 PM, Marcel Schneider wrote:
> > On Tue, 3 Jan 2017 09:31:42 +0100, Christoph Päper wrote:
> >
> >>> Among the possibilities, you include Unicode subscripts.
> >> Just for the sake of completeness.
> > This tends to conclude that preformatted subscripts are really an option here.
> 
> Not so. You yourself quote this statement:
> 
> | Superscript modifier letters are intended for cases where the letters carry
> | a specific meaning, as in phonetic transcription systems, and are not
> | a substitute for generic styling mechanisms for superscripting of text,
> | as for footnotes, mathematical and chemical expressions, and the like.
> 
> It is clear that the uses that you advocate go against this intent.

This is because, even when complemented with UAXes and TRs, the Core
Specification cannot cover the whole of practice. It seems that, to stay within
reasonable limits, a significant number of use cases have been left out. For
example, the previously mentioned use of plain text for styled custom vulgar
fractions is a recognized practice, yet it remains persistently excluded from
TUS. However, since including it would take no more than about three lines of
text, there must be more to it. For technical as well as ethical reasons,
Unicode is unable to promote the usages under discussion, yet it does not
strongly discourage them either. The snippet above [1] would read less harshly,
at the expense of some redundancy:

| Superscript modifier letters are intended for cases where the letters carry
| a specific meaning, as in phonetic transcription systems, and are not INTENDED 
| AS a substitute for generic styling mechanisms for superscripting of text,
| as for footnotes, mathematical and chemical expressions, and the like.

This amounts to saying that super-/subscripting in more or less ordinary text
is outside the design principles of the Unicode Standard, because the boundary
between the feasible and the unfeasible would be hard to draw, as shown by the
recent example of the plain text database of chemical formulas. So, to protect
itself against the temptation to draw that boundary (and the risk of
subsequently being compelled to move it further), Unicode *declares* those
characters to be *intended for* special contexts, in keeping with their
encoding history.

Trying to understand how far this principle applies, I note that the three
cited examples involve much more formatting than mere superscripting. This is
true of structural formulae in _chemistry_, complex _mathematical_ expressions,
and _footnote_ management and layout. By contrast, when it is only a matter of
super- or subscripting a few digits or Latin letters, markup and rich text may
be considered overkill. And in content that the reader may wish to copy and
paste, things like the “16” affix of hex numbers should remain distinct from
the number itself. Hence, styling is only “the preferred means”, not the
mandatory way to represent superscript letters or digits.[2] Moreover, this is
tied to a /design/ principle of the Standard. I believe that /usage/ principles
may diverge.
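
The copy-paste point can be illustrated with a minimal Python sketch. It is
only an illustration under stated assumptions, not anything prescribed by TUS:
the sample string and the helper names `describe` and `fold_compat` are
hypothetical, and only the standard `unicodedata` module is used. It writes the
hex number FF with a preformatted subscript "16" and shows what compatibility
normalization (NFKC) does to it.

```python
import unicodedata

# Hypothetical sample: "FF" followed by SUBSCRIPT ONE (U+2081)
# and SUBSCRIPT SIX (U+2086), i.e. a hex number with a "16" affix.
hex_plain = "FF\u2081\u2086"


def describe(text):
    """Return (character, name, decomposition) for each character in text."""
    return [
        (ch, unicodedata.name(ch), unicodedata.decomposition(ch))
        for ch in text
    ]


def fold_compat(text):
    """Apply NFKC, which replaces compatibility characters by their mappings."""
    return unicodedata.normalize("NFKC", text)


for row in describe(hex_plain):
    print(row)

# After NFKC the subscript digits become ordinary ASCII digits.
print(fold_compat(hex_plain))
```

As copied plain text, the preformatted string keeps the affix distinct from the
number. Any process that applies compatibility normalization, however, folds it
into "FF16", which is one practical caveat of this usage.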

> 
> Therefore, your conclusion that this is "an option" is nothing more than 
> a very personal opinion on your part (and one that many people here would 
> consider misguided if presented as general recommendation).

Presenting this as a general recommendation was indeed what I intended when
starting the first thread of this discussion. Thanks to your and other
subscribersʼ replies, I have come to see that this cannot be recommended across
the board. However, why this should not be "an option" at all is still very
unclear to me. From prior discussions, we know that other list participants do
use, e.g., superscript characters more extensively.

I think there are two levels of action: 

(1) to encode new preformatted characters;
(2) to encourage re-use of already existing ones. 

I understand that Unicode is consistently reluctant to do either, while ISO/IEC
is able to do more on (1), given that they sometimes add characters to (or
remove them from) the new repertoire, and National Bodies are in a position to
do (2) through usage recommendations of their own. Not to mention all the other
people who may or may not use available preformatted characters for any
purpose, possibly sharing the tip and, in the best case, the means to input
them efficiently.

Or am I missing something?

Given that the working group on the French standard keyboard is currently
interested in getting a new ordinal indicator encoded (a kind of 'ᵉ'), I feel
all the more compelled to stay tuned, and to comment on subsequent e-mails, too.

Marcel

[1] TUS 9.0, §7.8, p. 327.
http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G24762

[2] TUS 9.0, §22.4, p. 786.
http://www.unicode.org/versions/Unicode9.0.0/ch22.pdf#G42931