Superscript and Subscript Characters in General Use

Mon Jan 9 06:42:51 CST 2017

On Fri, 6 Jan 2017 06:42:14 +0100 (CET), I wrote:
> […] Here, the modifier letters could be a ready-to-use fallback. 
> Converting them to formatted baseline letters could be achieved with a macro in VBA. 
> 
> Couldnʼt this be included in the next Office version as an out-of-the-box feature? 
http://www.unicode.org/mail-arch/unicode-ml/y2017-m01/0036.html

A conversion macro for Notepad++ is now ready; it uses regexes and adds TeX markup.
To get around security issues this time, it is attached below. Thanks to Unicode for 
forwarding. The XML file has some explanations in the header and can be manually 
added to the user storage file of the software. Macros for LibreOffice and for 
Office (in VBA) are planned, but I cannot write them at present.
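
For readers without Notepad++, the general idea of such a conversion can be sketched 
in Python. This is not the attached macro: instead of regexes it relies on the <super> 
compatibility decompositions of the Unicode Character Database, read through the 
standard unicodedata module. The function names super_base and to_tex, and the choice 
of \textsuperscript as the markup, are assumptions made for illustration only.

```python
import unicodedata


def super_base(ch):
    """Return the baseline text of a character whose compatibility
    decomposition is tagged <super>, or None for any other character."""
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0] == "<super>":
        return "".join(chr(int(cp, 16)) for cp in parts[1:])
    return None


def to_tex(text):
    """Wrap each run of <super> characters in \\textsuperscript{...}
    (hypothetical markup choice), using their baseline equivalents."""
    out, run = [], []
    for ch in text:
        base = super_base(ch)
        if base is not None:
            run.append(base)          # collect the baseline letters of the run
            continue
        if run:                       # a run just ended: emit it as markup
            out.append("\\textsuperscript{" + "".join(run) + "}")
            run = []
        out.append(ch)
    if run:                           # run at the very end of the text
        out.append("\\textsuperscript{" + "".join(run) + "}")
    return "".join(out)


# U+02B3 and U+02E2 are MODIFIER LETTER SMALL R and SMALL S.
print(to_tex("M\u02B3\u02E2 Smith"))
```

Note that this test also catches every other character with a <super> decomposition, 
such as the superscript digits, the ordinal indicators and the trade mark sign, so a 
real converter would probably restrict the set of characters it touches.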

Along with this, I feel compelled to submit three detail issues around the topic:

(1) Interpretation of sequences: The fraction slash is specified to be interpreted 
as a sort of format control, and the software is supposed to either format the 
fraction, or to generate a linear fallback that in TUS (9.0, §6.2, p. 277) is shown 
with a character substitution, possibly to emulate a glyph substitution for 
U+2044 by a glyph similar to U+002F. But the software isnʼt meant to perform a 
glyph substitution as a fallback for another glyph substitution. 

Does that process conform to this requirement: “A process shall not assume that it 
is required to interpret any particular coded character sequence” (TUS 9.0, §3.2, p. 80)? 
Probably Iʼm missing something here.
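
The two behaviours discussed in (1) can be made concrete with a small Python sketch 
that works at the character level only. It assumes that a fraction is a run of 
decimal digits, U+2044, and another run of decimal digits; the names FRACTION, 
linear_fallback and tex_fraction are hypothetical, and the TeX markup merely stands 
in for whatever formatting a renderer would apply.

```python
import re

# Digits, FRACTION SLASH (U+2044), digits. In Python 3, \d matches any
# Unicode decimal digit in a str pattern.
FRACTION = re.compile(r"(\d+)\u2044(\d+)")


def linear_fallback(text):
    """Character-level fallback: show the fraction linearly with U+002F."""
    return FRACTION.sub(lambda m: m.group(1) + "/" + m.group(2), text)


def tex_fraction(text):
    """Formatted variant: mark up numerator and denominator
    (hypothetical choice of \\textsuperscript and \\textsubscript)."""
    return FRACTION.sub(
        lambda m: "\\textsuperscript{%s}/\\textsubscript{%s}"
        % (m.group(1), m.group(2)),
        text,
    )


print(linear_fallback("3\u20444 cup"))
print(tex_fraction("3\u20444 cup"))
```

Whether a process is obliged to do either of these is exactly the conformance 
question asked above; the sketch only shows what each option amounts to.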

(2) Font conformance: Most fonts seem to lack the necessary layout features, so they 
cannot substitute numerator and denominator glyphs, and the digits remain normal size. Nevertheless, 
U+2044 FRACTION SLASH kerns so much that it overlaps many of the adjacent digits, 
typically 3, 6, 8, 9, 0. Therefore, to get neat fractions, the user may work in 
rich text and use the generic super-/subscript formatting. The Unicode Core 
Specification gives hints for implementers to automate this process in another way, 
while leaving the door open to an unformatted fallback display. Some proportional 
fonts, however, have a non-kerning fraction slash. These inconsistencies in support 
and display baffle me.

Is there any place in the Unicode Standard where the kerning is specified? I 
believe that there isnʼt. So which design decision should be preferred? I think it 
would be the kerning option.

(3) Variation selectors: Today, many characters are given variation selector 
sequences, so that I believe that the idea could be maintained that letters and 
digits deserve some information about whether they are a part of abbreviations, 
or of a vulgar fraction (and then, whether they are numerators or denominators). 
While the latter can be catered for by the glyph substitution mechanism triggered 
by the presence of U+2044, the former would require an *ABBREVIATION INDICATOR, an 
invisible formatting control, as has already been suggested. This however should 
have been proposed twenty years ago by the communities mainly concerned. Adding and 
implementing it today would perhaps be inefficient, all the more so as the sequences 
concerned are mainly found in Latin script, where thanks to phonetics, superscript 
forms are already available.

After having completed this, I canʼt help wondering about the dynamics that show up 
in this and other related threads over the years, and particularly these past days. 
While hardly anybody takes offence at the misuse of the DEGREE SIGN as a kind of 
superscript 'o', many objections are raised whenever people dare to grab the small 
modifier letters on the keyboard and type them in their text editors, e-mail clients 
and webmail forms. What is the matter with this practice? Proper handling of such 
text files turns out to be quite easy and straightforward, and round-trip conversion 
is within reach. For once, here is a draft format that in some circumstances can even 
display with a near-final look. 
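
As an illustration of the return direction of such a round trip, here is a minimal 
Python sketch that turns TeX-style markup back into modifier letters. The markup 
\textsuperscript{...}, the name from_tex and the deliberately small table TO_MODIFIER 
are assumptions for illustration; they do not describe the attached macro.

```python
import re

# Hypothetical, deliberately incomplete table: baseline letter -> modifier letter.
TO_MODIFIER = {
    "d": "\u1D48", "e": "\u1D49", "h": "\u02B0", "l": "\u02E1",
    "m": "\u1D50", "n": "\u207F", "o": "\u1D52", "r": "\u02B3",
    "s": "\u02E2", "t": "\u1D57",
}

MARKUP = re.compile(r"\\textsuperscript\{([^{}]*)\}")


def from_tex(text):
    """Replace \\textsuperscript{...} by modifier letters when every
    letter inside has an entry in the table; otherwise keep the markup."""
    def restore(match):
        letters = match.group(1)
        if letters and all(ch in TO_MODIFIER for ch in letters):
            return "".join(TO_MODIFIER[ch] for ch in letters)
        return match.group(0)         # leave unsupported content untouched
    return MARKUP.sub(restore, text)


print(from_tex("M\\textsuperscript{rs} Smith"))
```

The conversion is only lossless if each baseline letter is given a single superscript 
counterpart. For example, both U+00BA and U+1D52 decompose to 'o', so one of them has 
to be chosen for the return direction.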

For what reason should that be strongly discouraged and prohibited? This is all the 
more surprising as it would restore the missing equality of the worldʼs languages 
with respect to plain text. Is this still overstating that principle?

Regards,

Marcel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shortcuts.xml
Type: text/xml
Size: 22885 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20170109/dca4318f/attachment.xml>