Re: Request for Inclusion of Subscript for the English Letter “y”

Wed Nov 6 00:40:51 CST 2024

On 11/5/2024 12:31 PM, Phil Smith III via Unicode wrote:
>
> I assume you’ve seen 
> https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts, 
> which discusses what is and isn’t available as super/subscripts 
> (henceforth “ss”) in Unicode. That surprised me—I would have thought 
> that ss were markup, not characters, so there’s more of it implemented 
> already than I’d expected.
>
The consensus that emerged over the first several decades of encoding 
Unicode treats these forms somewhat ambiguously.

In mathematical notation, any character can be a superscript or 
subscript, so you find super/subscripts drawn from multiple scripts and 
symbol sets, with no limit, in principle, on what additional characters 
some specialty may adopt and super/subscript for its own purposes. And 
you have things like subscripts on subscripts and similarly complex 
layouts. In that context it is definitely appropriate to treat 
subscripting as a generic operation and not to try to encode some subset 
of the possible results of that operation. You could never encode all 
the forms that are ever used (or available for use) in mathematical 
notation, so for that purpose, encoding any further explicit subscript 
forms doesn't help.

There is generic use of (mostly) superscript numbers in text, for things 
like footnotes. These are also best done as generic operations (via 
styles), particularly as they relate to document structure that already 
implies the use of rich text.

There are other notations, mainly phonetic, that have super/subscript 
forms but do not need recursive subscripting or all the other 
interesting features of mathematical layout and formatting. In many of 
them, the super or subscript form acts pretty much like any other letter 
in the notation, except for its shape. Common to these notations is that 
there's a fixed set of such shapes; they don't even cover a full basic 
alphabet. (That Unicode is getting close to having a full alphabet is 
the result of overlapping use across notations.)
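A quick way to see how partial that fixed set is: the Python sketch 
below probes the Unicode database bundled with the interpreter for each 
basic Latin letter. It assumes the encoded forms follow the naming 
pattern "LATIN SUBSCRIPT SMALL LETTER <X>", and the result depends on 
which Unicode version your Python build ships with; the helper name 
`encoded_subscript_letters` is hypothetical.

```python
import string
import unicodedata


def encoded_subscript_letters() -> dict[str, str]:
    """Map each letter a-z to its encoded subscript form, if one exists.

    Assumes the characters are named 'LATIN SUBSCRIPT SMALL LETTER <X>';
    coverage depends on the Unicode version of this Python build.
    """
    found = {}
    for letter in string.ascii_lowercase:
        try:
            name = f"LATIN SUBSCRIPT SMALL LETTER {letter.upper()}"
            found[letter] = unicodedata.lookup(name)
        except KeyError:
            pass  # no encoded subscript form for this letter
    return found


subs = encoded_subscript_letters()
# Letters that do have an encoded subscript form.
print("".join(sorted(subs)))
# Letters that are missing from the fixed set.
print("".join(c for c in string.ascii_lowercase if c not in subs))
# The <sub> compatibility tag ties the encoded form back to its base letter.
print(unicodedata.decomposition(subs["x"]))
```

The gaps line is the point: the set was built letter by letter for 
specific notations, not as a generic subscript operation.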

For these cases there's a benefit in being able to have a robust plain 
text representation, so that "words" aren't required to use styling to 
be understood. That's the driving case behind encoding these forms. 
Ultimately the realization was that a universal character encoding could 
not be "one-size-fits-all" when it comes to serving wildly diverging 
styles of usage.

Another example of this dichotomy again involves the distinction between 
mathematics and text. In text, the plain text does not carry font 
information and it is fully acceptable to render the result in any font 
that supports the letters in question. That even goes for styles that 
aren't fully readable to everyday users. For example, text in the Latin 
script can be rendered using a Fraktur font that many people may have 
difficulty deciphering or reading fluently. No matter: you haven't 
changed the meaning of the text by doing that. And the selection of 
possible fonts is nearly infinite. Some font variations are generic enough
that they can be applied to many scripts, others may be limited in 
practice to some specific alphabet.

In math notation, you have the situation that mathematicians have used 
the contrast between different font shapes to carry meaning. In some 
conventions, Fraktur shapes are used to indicate that a variable is a 
vector and not a scalar, for example. There are a handful of font styles 
that are used in this way, a fairly fixed set, and usually covering a 
limited set of characters as well. Because the operation is not fully 
generic, it is possible to cover it with explicitly encoded characters. 
At that point, there's the benefit of preserving that distinction in 
plain text.
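To make that concrete, here is a minimal Python sketch with a 
hypothetical helper, `to_math_fraktur`, that maps lowercase ASCII 
letters to the MATHEMATICAL FRAKTUR SMALL characters by name. It 
deliberately handles only lowercase: several Fraktur capitals (C, H, I, 
R, Z) were encoded earlier in the Letterlike Symbols block under 
different names, so a name lookup for them would fail.

```python
import unicodedata


def to_math_fraktur(text: str) -> str:
    """Hypothetical helper: map a-z to MATHEMATICAL FRAKTUR SMALL letters.

    Any other character (capitals, digits, operators) is left unchanged.
    """
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(unicodedata.lookup(f"MATHEMATICAL FRAKTUR SMALL {ch.upper()}"))
        else:
            out.append(ch)
    return "".join(out)


v = to_math_fraktur("v")
print(hex(ord(v)))                              # a code point distinct from "v"
print(v == "v")                                 # False: the distinction survives
print(unicodedata.normalize("NFKC", v) == "v")  # True: NFKC folds it away
```

The styled letter is its own code point, so the contrast survives in 
plain text; note, though, that compatibility normalization (NFKC) maps 
it back to the base letter and erases that distinction.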

In fact, it's possible this way to render a very large subset of 
mathematical notation in an (almost) plain text form. Incidentally, 
that is not so dissimilar from the concept of Markdown: a plain text 
stream with a few chosen conventions, in the math case about the use of 
parentheses, plus dedicating some character to function as a subscript 
and superscript "operator". (All the other math operators, such as 
integrals or radical signs, trigger their own formatting, obviating the 
need to encode that explicitly.)
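A toy version of that convention, as a Python sketch: `_` and `^` act 
as subscript and superscript operators on the next character or on the 
next parenthesized group, and the output is HTML `<sub>`/`<sup>` 
markup. The function `linear_to_html` is hypothetical; it illustrates 
the idea only and does not implement any particular linear math format 
(operator precedence, other operators, and error handling are left out).

```python
import html
from typing import Optional


def linear_to_html(src: str) -> str:
    """Hypothetical toy converter: '_' and '^' subscript/superscript the next
    character or parenthesized group; everything else is copied as text."""
    pos = 0

    def parse_sequence(stop: Optional[str]) -> str:
        # Read characters until `stop` (or end of input), applying operators.
        nonlocal pos
        out = []
        while pos < len(src) and src[pos] != stop:
            ch = src[pos]
            if ch in "_^":
                pos += 1
                tag = "sub" if ch == "_" else "sup"
                out.append(f"<{tag}>{parse_operand()}</{tag}>")
            elif ch == "(":
                # Parentheses that are not an operand stay visible in the output.
                pos += 1
                out.append("(" + parse_sequence(")") + ")")
                pos += 1  # skip the closing ')'
            else:
                out.append(html.escape(ch))
                pos += 1
        return "".join(out)

    def parse_operand() -> str:
        # An operand is a parenthesized group (parens dropped) or one character.
        nonlocal pos
        if pos < len(src) and src[pos] == "(":
            pos += 1
            inner = parse_sequence(")")
            pos += 1  # skip the closing ')'
            return inner
        if pos < len(src):
            ch = src[pos]
            pos += 1
            return html.escape(ch)
        return ""

    return parse_sequence(None)


print(linear_to_html("x_i^2"))    # x<sub>i</sub><sup>2</sup>
print(linear_to_html("a_(n_k)"))  # a<sub>n<sub>k</sub></sub>
```

Because an operand may itself contain `_`, subscripts on subscripts 
fall out of the generic operator, something no fixed set of encoded 
subscript characters could provide.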

Having characters for all the shape variants used for variables encoded 
directly makes this near-plain-text form very powerful. Again, what is a 
useful generic solution for ordinary text isn't as workable for a 
notational system, and vice versa. The emerging insight was that Unicode 
should strive to make reasonable accommodations, but in a way that 
focused on the central needs and features of each of them.

If you look just at the encoding though, you come away with a sense of 
apparent duplication and also seeming incompleteness: the additions for 
phonetic notations will never cover the generic use of math, while the 
few styled alphabets for math do nothing for general text use. The key 
is to recognize which notation or use case is supported by what, and 
then things make a whole lot more sense.

A./