Superscript and Subscript Characters in General Use

Marcel Schneider charupdate at orange.fr
Fri Jan 6 13:02:25 CST 2017


Another important point for the modifier letter fallbacks to work (if supported) 
is that fonts would need to support diacritics combined with modifier small letters. 
In 2014 I requested the superscript small 'è' (not noticing that the intended 
abbreviation is incorrect), but encoding new characters like this one would be 
useless because it is decomposable, and out of date since the deadline is long past.
But the superscript 'é' that Iʼve recently mentioned is still used (in 'S^{té}' for 
'Société' [Corporation], different from 'Sᵗᵉ' which is the abbreviation of 'Sainte' 
[Saint, feminine]); and in Spanish, superscript 'í' is used, as Denis Jacquerye noted 
while pointing out the need to work with higher level protocols and to enhance 
support for them. [4]
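
To illustrate what "decomposable" means here, the following is a minimal Python 
sketch. It assumes that the superscript 'é' is spelled as MODIFIER LETTER SMALL E 
followed by COMBINING ACUTE ACCENT; the variable and function names are made up 
for the illustration, and whether the result displays well depends on the font.

```python
import unicodedata

MOD_SMALL_T = "\u1D57"      # MODIFIER LETTER SMALL T
MOD_SMALL_E = "\u1D49"      # MODIFIER LETTER SMALL E
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Assumed plain-text spelling of the abbreviation of 'Société'.
societe_abbr = "S" + MOD_SMALL_T + MOD_SMALL_E + COMBINING_ACUTE
# Plain-text spelling of the abbreviation of 'Sainte'.
sainte_abbr = "S" + MOD_SMALL_T + MOD_SMALL_E

for code_point, name in describe(societe_abbr):
    print(code_point, name)

# Canonical composition leaves the sequence unchanged, since no precomposed
# superscript 'é' exists; compatibility composition falls back to baseline letters.
nfc_form = unicodedata.normalize("NFC", societe_abbr)
nfkc_form = unicodedata.normalize("NFKC", societe_abbr)
```

Here `nfkc_form` is the baseline spelling 'Sté', which is exactly the sloppy 
fallback that the modifier letters are meant to avoid.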

Higher level protocols will remain the recommended standard high-end solution, 
while the use of modifier letters could gain the status of an alternate fallback. 
Once it has that status, modifier letter small q could be encoded and the whole set 
updated at font level to support combining diacritics, while software could add two 
commands for round-trip conversion between modifier letters and superscripted baseline 
letters, and probably between preformatted fractions and formatted fractions; Iʼm 
quite sure that all this is possible right now in VBA.
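
The two round-trip commands could look roughly like the following Python sketch 
(Python rather than VBA, purely for illustration). It assumes that the rich text 
side is represented by HTML-style <sup>…</sup> markup, which is only one possible 
higher level protocol; the function names are hypothetical. The table of modifier 
letters is derived from the <super> compatibility mappings in the Unicode character 
database, restricted to the basic letters a to z in a limited code point range.

```python
import re
import unicodedata


def build_modifier_table():
    """Collect superscript/modifier forms of a-z from <super> compatibility mappings."""
    prefixes = ("MODIFIER LETTER SMALL ", "SUPERSCRIPT LATIN SMALL LETTER ")
    table = {}
    for cp in range(0x00A0, 0x2100):  # scan deliberately limited to this range
        ch = chr(cp)
        if not unicodedata.name(ch, "").startswith(prefixes):
            continue
        parts = unicodedata.decomposition(ch).split()
        if len(parts) == 2 and parts[0] == "<super>":
            base = chr(int(parts[1], 16))
            if "a" <= base <= "z":
                table[base] = ch
    return table


TO_MODIFIER = build_modifier_table()
TO_BASE = {mod: base for base, mod in TO_MODIFIER.items()}


def _wrap(run):
    """Wrap a run of baseline letters (plus diacritics) in superscript markup."""
    return "<sup>" + unicodedata.normalize("NFC", "".join(run)) + "</sup>"


def modifiers_to_markup(text):
    """Replace runs of modifier letters with superscripted baseline letters."""
    out, run = [], []
    for ch in text:
        if ch in TO_BASE:
            run.append(TO_BASE[ch])
        elif run and unicodedata.combining(ch):
            run.append(ch)  # keep the diacritic attached to its base letter
        else:
            if run:
                out.append(_wrap(run))
                run = []
            out.append(ch)
    if run:
        out.append(_wrap(run))
    return "".join(out)


def markup_to_modifiers(text):
    """Replace <sup>…</sup> spans with modifier letters where all letters have one."""
    def replace(match):
        converted = []
        for ch in unicodedata.normalize("NFD", match.group(1)):
            if ch in TO_MODIFIER:
                converted.append(TO_MODIFIER[ch])
            elif converted and unicodedata.combining(ch):
                converted.append(ch)
            else:
                return match.group(0)  # no modifier letter found: keep the markup
        return "".join(converted)
    return re.sub(r"<sup>(.*?)</sup>", replace, text)


plain = markup_to_modifiers("S<sup>t\u00e9</sup>")
rich = modifiers_to_markup(plain)
```

With this table, a span such as 'Biblio<sup>que</sup>' is left as markup, because 
no modifier letter small q is found in the scanned range.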

Iʼve added some more references to my previous mail with respect to last yearʼs 
discussion of formatting variation selectors. As there was a typo and there were 
missing line breaks (symptomatic of not using any spell checker and of editing the 
layout by hand in a text editor), I feel the need to append the corrected version 
below. 

Best regards,

Marcel

On Fri, 6 Jan 2017 00:21:29 -0800, Asmus Freytag wrote:
> 
> On 1/5/2017 9:42 PM, Marcel Schneider wrote:
> > 
> > Nevertheless, 
> > the user might prioritize the stability of the document when it comes to plain text, 
> > and he could be interested in a better-looking display of letters that elsewhere 
> > should be superscripted. Here, the modifier letters could be a ready-to-use fallback
> 
> The use of such hacks is destabilizing to any efforts to systematically format superscripts 
> across a document.

That presupposes a rich text environment. The orthographical correctness of some 
languages, French among them, traditionally requires either a rich text environment 
or some in-line markup like TeX (at the expense of direct usability, i.e. without 
a LaTeX converter). That borders on non-conformance with the design principles of Unicode. 
As I understand them, Unicode provides all characters that are needed to correctly 
spell any language. This goal remains unreached as long as the orthography of some 
languages cannot be entirely achieved without relying on formatting markup. (Iʼm 
aware that complex scripts require hinted fonts for glyph reordering and glyph 
substitution, but this still is plain text.) 

The superscripting of abbreviation endings belongs to another level of correctness 
than the arbitrary stress as expressed with italics, bold, underline (obsolete in 
this use), extra letter spacing (German, rather old-style), capitalization, or 
extra acute accents as in Dutch. 

This is why Karl Pentzlin [1] cited ‘Biblio^{que}’ vs “Biblioque”, where the latter 
is “no valid French word.” 

From this it now becomes clear that Alastair Houghtonʼs [2] suggestion of encoding 
a superscript variant selector would meet this requirement and is therefore not 
to be confused with a first step towards making Unicode support rich text. That 
was indeed the traditional argument raised against previous similar suggestions. [3]

Under the current scheme, French and a few other languages cannot be written 
in a correct orthography when the environment is plain text. That seems to me 
hard to accept.

> Text fonts may not support them, because for "ordinary" text, by Unicode's
> recommendation, one would use ordinary letters / digits with superscript markup.

A text font that does not support all modifier letters is less a text font than 
a title font. Ornamental fonts are produced in such a variety that completing 
them is/was economically unfeasible. I tend to put this statement in the 
past tense, because diacriticized letters are already (on request) automatically 
generated and added to the font at creation. If automatic superscripting is not 
already implemented, it will be soon, I suppose. So more and more (new and 
updated) fonts will support them. But wherever they are not supported, a _Convert 
modifier letters to superscript_ feature (or an equivalent macro command) ought to 
be able to make the text conformant to legacy handling.

> So, by using these hacks, anytime a document is re-formatted with a different font style,
> you are in danger of either losing these to boxes, or to be faced with random font styles.

Yes, people should always be aware that the use of modifier letters has its downside, 
as has the use of superscripted baseline letters. I currently write e-mails (like 
this one) in a text editor (Notepad++). Several features I use here are IMO missing 
in all e-mail clients, such as column editing, line reordering, and so on. So I 
appreciate being able to spell correctly in plain text, without sloppy fallbacks (i.e. 
baseline fallbacks for superscript). Itʼs a matter of making the most of the existing 
charset. I believe that modifier letter fallbacks are very functional. When I paste 
them into an HTML mail form, the display is always correct and I donʼt need to add 
superscript by hand throughout the mail. Furthermore, I can even use superscript in 
the subject.

> If you don't think that is a real problem: some (many) character pickers will insert font+code point into
> an application. These font bindings often survive and suddenly your text, when read on a different
> computer looks like a ransom note, just because the new machine has a new "default" font, and 
> that is applied to all letters that don't have a specific font binding.

Basically this is a good scheme, because character pickers typically are used for 
symbols. There are also two kinds: local, and online. I sometimes pick from the 
full-size PDF of the Code Charts. Theyʼre the best character picker IMO.

> Some font pickers are "stupid" enough to do this for simple accented code points that would have
> been in the currently selected font anyway.

Thatʼs really bad. I know that some people write documents by picking accented 
letters in the special characters dialog. I can imagine that some other people 
use an online picker instead, partly because the word processor theyʼre using 
may be a web app. Either way, this is very inefficient. The reason may be that people 
often think either that a keyboard cannot be completed, or that completing a 
keyboard would make it unusable, or hard to use, or full of stickers. This is one 
of the main challenges of keyboard layout development.

> Your suggestions will just add to these problems.
> If editing in a rich text environment, work in rich text. And then lean on implementers to get
> export correct to other rich text formats....

I actually used to work nearly all the time in a rich text environment, and I added 
plenty of autocorrections to speed up writing. Today, I work most of the time in plain 
text. I donʼt use LaTeX, but I know that it is easily exported to many other 
formats. PDF is a main target format. Most of the drawbacks start when the reader 
wishes to copy-paste some lines of a (basically searchable) PDF either to rich text 
or to plain text… but that is not the issue here.

I hope that my future recommendations will solve more problems than theyʼll create!

Marcel

[1] Karl Pentzlinʼs MODIFIER LETTER SMALL Q proposal: 
http://www.unicode.org/L2/L2010/10230-modifier-q.pdf

[2] Alastair Houghtonʼs SUPERSCRIPT/SUBSCRIPT variant selectors suggestion:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0016.html

[3] Re: Why incomplete subscript/superscript alphabet ? a.lukyanov
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0001.html
Re: Why incomplete subscript/superscript alphabet ? Leonardo Boiko 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0013.html
Re: Why incomplete subscript/superscript alphabet ? Jukka K. Korpela 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0014.html
Re: Why incomplete subscript/superscript alphabet ? Steve Swales 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0015.html
Re: Why incomplete subscript/superscript alphabet ? Neil Harris 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0017.html

[4] Re: Why incomplete subscript/superscript alphabet ? Denis Jacquerye 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m10/0037.html


