Superscript and Subscript Characters in General Use

Marcel Schneider charupdate at
Fri Jan 6 08:30:19 CST 2017

On Fri, 6 Jan 2017 00:21:29 -0800, Asmus Freytag wrote:
> On 1/5/2017 9:42 PM, Marcel Schneider wrote:
> > 
> > Nevertheless, 
> > the user might prioritize the stability of the document when it comes to plain text, 
> > and he could be interested in a better-looking display of letters that elsewhere 
> > should be superscripted. Here, the modifier letters could be a ready-to-use fallback
> The use of such hacks is destabilizing to any efforts to systematically format superscripts 
> across a document.

That presupposes a rich text environment. The orthographical correctness of some 
languages, French among them, traditionally requires either a rich text environment 
or some in-line markup like TeX (at the expense of direct usability, i.e. without 
a LaTeX converter). That is borderline non-conformant to the design principles of Unicode. 
As I understand them, Unicode provides all characters that are needed to correctly 
spell any language. This goal remains unreached as long as the orthography of some 
languages cannot be fully achieved without relying on formatting markup. (Iʼm 
aware that complex scripts require hinted fonts for glyph reordering and glyph 
substitution, but this still is plain text.) 

The superscripting of abbreviation endings belongs to another level of correctness than the arbitrary stress as expressed with italics, bold, underline 
(obsolete in this use), extra letter spacing (German, rather old-style), capitalization, or extra acute accents as in Dutch. 

This is why Karl Pentzlin [1] cited ‘Biblio^{que}’ vs “Biblioque”, where the latter 
is “no valid French word.” 

From this it now becomes clear that Alastair Houghtonʼs suggestion [2] of encoding 
a superscript variant selector would meet this requirement and is therefore not 
to be confused with the first step towards making Unicode support rich text.

Saying it loud: The fact that French and a few other languages cannot be written 
in a correct orthography when the environment is plain text, seems to me hard to 

> Text fonts may not support them, because for "ordinary" text, by Unicode's
> recommendation, one would use ordinary letters / digits with superscript markup.

A text font that does not support all modifier letters is less a text font than 
a title font. Ornamental fonts are produced in such a variety that completing 
them is/was economically unfeasible. I consider this statement rather in the 
past tense, because diacriticized letters are already (on request) automatically 
generated and added to the font at creation. If automatic superscripting is not 
already implemented, it will be soon, I suppose. So more and more (new and 
updated) fonts will support them. But wherever they donʼt, a _Convert modifier 
letters to superscript_ feature (or an equivalent macro command) ought to be able 
to make the text conformant to legacy handling.
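
As a rough illustration of what such a conversion feature could do, here is a 
minimal Python sketch. It is only an assumption of how one might go about it, 
not an existing tool: the function name modifiers_to_sup and the choice of HTML 
<sup> markup as the target are hypothetical. It relies on the <super> 
compatibility decompositions recorded in the Unicode Character Database, so it 
also catches superscript digits and the ordinal indicators, and it does no HTML 
escaping.

```python
import unicodedata


def modifiers_to_sup(text: str) -> str:
    """Replace runs of characters that have a <super> compatibility
    decomposition with their base characters wrapped in <sup>...</sup>."""
    out = []   # finished pieces of the result
    run = []   # base characters of the superscript run being collected

    def flush():
        # Close the current superscript run, if any.
        if run:
            out.append("<sup>" + "".join(run) + "</sup>")
            run.clear()

    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<super> "):
            # e.g. "<super> 0065" for MODIFIER LETTER SMALL E
            base = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
            run.append(base)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)


# French "1er" written with modifier letters e (U+1D49) and r (U+02B3)
print(modifiers_to_sup("1\u1d49\u02b3"))
```

Which characters are recognized depends on the Unicode version shipped with 
the Python interpreter, so recently encoded modifier letters may pass through 
unchanged on older installations.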

> So, by using these hacks, anytime a document is re-formatted with a different font style,
> you are in danger of either losing these to boxes, or to be faced with random font styles.

Yes, people should always be aware that the use of modifier letters has its downside, 
as has the use of superscripted baseline letters. I currently write e-mails (like 
this one) in a text editor (Notepad++). Several features I use here are IMO missing 
in all e-mail clients, such as column editing, line reordering, and so on. So I appreciate 
being able to spell correctly in plain text, without sloppy fallbacks (i.e. baseline 
fallbacks for superscript). Itʼs a matter of making the most of the existing charset.
I believe that modifier letter fallbacks are very functional. When I paste them into 
an HTML mail form, the display is always correct and I donʼt need to add superscript 
by hand throughout the mail. Furthermore, I can even use superscript in the subject.

> If you don't think that is a real problem: some (many) character pickers will insert font+code point into
> an application. These font bindings often survive and suddenly your text, when read on a different
> computer looks like a ransom note, just because the new machine has a new "default" font, and 
> that is applied to all letters that don't have a specific font binding.

Basically this is a good scheme, because character pickers typically are used for 
symbols. There are also two kinds: local, and online. I sometimes pick in the 
full-size PDF of the Code Charts. Theyʼre the best character picker IMO.

> Some font pickers are "stupid" enough to do this for simple accented code points that would have
> been in the currently selected font anyway.

Thatʼs really bad. I know that some people write documents by picking accented 
letters in the special characters dialog. I can imagine that some other people 
may use an online picker instead, partly because the word processor theyʼre using 
may be a web-app. Anyhow, this is very inefficient. The reason may be that one 
often thinks either that a keyboard cannot be completed, or that completing a 
keyboard would make it unusable, or hard to use, or full of stickers. Here is one 
main challenge of keyboard layout development.

> Your suggestions will just add to these problems.
> If editing in a rich text environment, work in rich text. And then lean on implementers to get
> export correct to other rich text formats....

I used to work nearly all the time in a rich text environment, and I added plenty 
of autocorrections to speed up writing. Today, I work most of the time in plain 
text. I donʼt use LaTeX, but I know that it is easily exported to many other 
formats. PDF is a main target format. Most of the drawbacks start when the reader 
wishes to copy-paste some lines of a (basically searchable) PDF either to rich text 
or to plain text… but that is not the issue here.

I hope that my future recommendations will solve more problems than theyʼll create!


[1] Karl Pentzlinʼs MODIFIER LETTER SMALL Q proposal:

[2] Alastair Houghtonʼs SUPERSCRIPT/SUBSCRIPT variant selectors suggestion:
