Superscript and Subscript Characters in General Use
charupdate at orange.fr
Mon Jan 9 06:42:51 CST 2017
On Fri, 6 Jan 2017 06:42:14 +0100 (CET), I wrote:
> […] Here, the modifier letters could be a ready-to-use fallback.
> Converting them to formatted baseline letters could be achieved with a macro in VBA.
> Couldnʼt this be included in the next Office version as an out-of-the-box feature?
A conversion macro for Notepad++ is now ready; it uses regexes and adds TeX markup.
To get around the security issues this time, it is attached below. Thanks to Unicode for
forwarding it. The XML file has some explanations in its header and can be added manually
to the user storage file of the software. Macros for LibreOffice and for Office (in VBA)
are planned, but I cannot write them at present.
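
For readers without Notepad++, the general idea of such a conversion can be sketched in
Python. This is not the attached macro, only a minimal illustration under assumptions:
the table is derived from the <super> compatibility decompositions in the Unicode
Character Database, only ASCII letters and digits are handled, and the LaTeX command
\textsuperscript is chosen here as the markup. The names superscript_base,
SUPER_TO_BASE, SUPER_RUN and to_tex are hypothetical.

```python
import re
import unicodedata


def superscript_base(ch):
    """Return the baseline character of a <super> compatibility character, else None."""
    parts = unicodedata.decomposition(ch).split()
    # Keep only single-character decompositions such as "<super> 0065".
    if len(parts) == 2 and parts[0] == "<super>":
        return chr(int(parts[1], 16))
    return None


# Assumption: restrict the table to BMP characters whose base is an ASCII letter or digit.
SUPER_TO_BASE = {}
for cp in range(0x00A0, 0x10000):
    base = superscript_base(chr(cp))
    if base is not None and ord(base) < 128 and base.isalnum():
        SUPER_TO_BASE[chr(cp)] = base

# One regex matching any run of the collected superscript characters.
SUPER_RUN = re.compile("[" + re.escape("".join(SUPER_TO_BASE)) + "]+")


def to_tex(text):
    """Replace each run of superscript characters with \\textsuperscript{...} markup."""
    def repl(match):
        letters = "".join(SUPER_TO_BASE[c] for c in match.group())
        return "\\textsuperscript{" + letters + "}"
    return SUPER_RUN.sub(repl, text)


# U+1D49 and U+02B3 are the modifier letters small e and small r.
print(to_tex("1\u1d49\u02b3"))  # 1\textsuperscript{er}
```

Since the table maps each superscript character to exactly one base character, the
reverse direction can be built from the same data, which is what makes a round trip
plausible for this limited repertoire.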
Along with this, I feel compelled to raise three points of detail on the topic:
(1) Interpretation of sequences: The fraction slash is specified to be interpreted
as a sort of format control, and the software is supposed to either format the
fraction, or to generate a linear fallback that in TUS (9.0, §6.2, p. 277) shows
up with a character substitution, possibly to emulate a glyph substitution for
U+2044 by a glyph similar to U+002F. But the software isnʼt meant to perform a
glyph substitution as a fallback for another glyph substitution.
Does that process conform to this requirement: “A process shall not assume that it
is required to interpret any particular coded character sequence” (TUS 9.0, §3.2, p. 80)?
I am probably missing something here.
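
The two behaviours discussed in (1) can be made concrete with a small Python sketch. It
assumes the simple pattern of one or more decimal digits, U+2044, one or more decimal
digits, and it stands in for real rich-text formatting with HTML-like <sup>/<sub> tags.
Both the function names and the choice of tags are hypothetical and only illustrate the
difference between formatting a fraction and falling back to a linear form.

```python
import re

FRACTION_SLASH = "\u2044"
SOLIDUS = "\u002F"

# In str patterns, \d matches any Unicode decimal digit (General Category Nd).
FRACTION = re.compile(r"(\d+)" + FRACTION_SLASH + r"(\d+)")


def linear_fallback(text):
    """Unformatted fallback: swap U+2044 for U+002F inside digit sequences only."""
    return FRACTION.sub(lambda m: m.group(1) + SOLIDUS + m.group(2), text)


def rich_text_markup(text):
    """Formatted variant: mark numerator and denominator for super-/subscript display."""
    def repl(m):
        return ("<sup>" + m.group(1) + "</sup>" + FRACTION_SLASH
                + "<sub>" + m.group(2) + "</sub>")
    return FRACTION.sub(repl, text)


print(linear_fallback("3\u20444 cup"))  # 3/4 cup
```

A fraction slash that is not surrounded by digits is left untouched by both functions,
so the sketch interprets only the one sequence it was written for.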
(2) Font conformance: Most fonts seem unhinted so that they cannot substitute
numerator and denominator glyphs, and the digits remain normal size. Nevertheless,
U+2044 FRACTION SLASH kerns so much that it overlaps many of the adjacent digits,
typically 3, 6, 8, 9, 0. Therefore, to get neat fractions, the user may work in
rich text and use the generic super-/subscript formatting. The Unicode Core
Specification gives hints for implementers to automate this process in another way,
while leaving the door open to an unformatted fallback. Some proportional
fonts, however, have a non-kerning fraction slash. These inconsistencies in support
and display baffle me.
Is there any place in the Unicode Standard where the kerning is specified? I
believe that there isnʼt. So which design decision should be preferred? I think it
would be the kerning option.
(3) Variation selectors: Today, many characters are given variation selector
sequences, so that I believe that the idea could be maintained that letters and
digits deserve some information about whether they are a part of abbreviations,
or of a vulgar fraction (and then, whether they are numerators or denominators).
While the latter can be catered for by the glyph substitution mechanism triggered
by the presence of U+2044, the former would require an *ABBREVIATION INDICATOR as
it has already been suggested, an invisible formatting control. This, however, should
have been proposed twenty years ago by the communities mainly concerned. Adding and
implementing it today would perhaps be inefficient, all the more so as the sequences
concerned are mainly found in Latin script, where, thanks to phonetics, superscript
forms are already available.
After having completed this, I canʼt help wondering about the dynamics that show up
in this and other related threads over the years, and particularly these past days.
While hardly anybody takes offence at the misuse of the DEGREE SIGN as a kind of
superscript 'o', many objections are raised whenever people dare to grab the small
modifier letters on the keyboard and type them in their text editors, e-mail clients
and webmail forms. What is wrong with this practice? Proper handling of such
text files turns out to be quite easy and straightforward, and round-trip conversion
is within reach. For once, here is a draft format that in some circumstances can even
display with a finished look.
For what reason should that be strongly discouraged and prohibited? This is all the
more surprising as it would restore the missing equality of the worldʼs languages
with respect to plain text. Is this still overstating that principle?
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 22885 bytes
Desc: not available