Re: Request for Inclusion of Subscript for the English Letter “y”
Asmus Freytag
asmusf at ix.netcom.com
Wed Nov 6 00:40:51 CST 2024
On 11/5/2024 12:31 PM, Phil Smith III via Unicode wrote:
>
> I assume you’ve seen
> https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts,
> which discusses what is and isn’t available as super/subscripts
> (henceforth “ss”) in Unicode. That surprised me—I would have thought
> that ss were markup, not characters, so there’s more of it implemented
> already than I’d expected.
>
The consensus that emerged over the first several decades of encoding
Unicode treats these forms somewhat ambiguously.
In mathematical notation, any character can be a super or subscript, so
you find multiple scripts and symbols, with no limit, in principle, as
to what additional characters some specialty may adopt and
super/subscript for some purpose. And you have things like subscripts on
subscripts and similarly complex layouts. In that context it is
definitely appropriate to treat subscripting as a generic operation and
to not try to encode some subset of possible results of that operation.
You could never encode all forms that are ever used (or available for
use) in mathematical notation, so for that purpose, encoding any further
explicit subscript forms doesn't help.
There is generic use of (mostly) superscript numbers in text, for things
like footnotes. These are also best done as generic operations (via
styles), particularly as they relate to document structure that already
suggests the use of rich text.
There are other notations, mainly phonetic, that have super/subscript
forms but do not need recursive subscripting or all the other
interesting features of mathematical layout and formatting. In many of
them, the super or subscript form acts pretty much like any other
letter in the notation, except for its shape. Common to these notations
is that there's a fixed set of such shapes; they don't even cover a full
basic alphabet (that Unicode is getting close to having a full alphabet
is the result of overlapping use).
For these cases there's a benefit in being able to have a robust plain
text representation, so that "words" aren't required to use styling to
be understood. That's the driving case behind encoding these forms.
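To make the "fixed set" concrete, here is a minimal sketch that lists
which basic Latin letters have an explicitly encoded subscript form. It
assumes Python's standard unicodedata module and relies on these
characters carrying a "<sub>" compatibility decomposition; the function
name is made up for illustration, and the result depends on the Unicode
version your Python build ships with.

```python
import unicodedata

def encoded_subscript_letters() -> dict[str, str]:
    """Map each basic Latin letter to its encoded subscript form, if any."""
    found = {}
    for cp in range(0x110000):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch).split()
        # Encoded subscripts carry a compatibility decomposition "<sub> XXXX".
        if len(decomp) == 2 and decomp[0] == "<sub>":
            base = chr(int(decomp[1], 16))
            if base.isascii() and base.isalpha():
                found[base] = ch
    return found

subs = encoded_subscript_letters()
# Letters that have a plain-text subscript form in this Unicode version.
print("".join(sorted(subs)))
# Letters that would still need markup or styling to be subscripted.
print([c for c in "abcdefghijklmnopqrstuvwxyz" if c not in subs])
```

The second list is the gap that requests like the one in this thread
are about.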
Ultimately the realization was that a universal character encoding could
not be "one-size-fits-all" when it comes to serving wildly diverging
styles of usage.
Another example of this dichotomy again involves the distinction between
mathematics and text. In text, the plain text does not carry font
information and it is fully acceptable to render the result in any font
that supports the letters in question. That even goes for styles that
aren't fully readable to everyday users. For example, text in the Latin
script can be rendered using a Fraktur font that many people may have
difficulties deciphering or reading fluently. No matter, you haven't
changed the meaning of the text by doing that. And the selection of
possible fonts is near infinite. Some font variations are generic enough
that they can be applied to many scripts, others may be limited in
practice to some specific alphabet.
In math notation, you have the situation that mathematicians have used
the contrast between different font shapes to carry meaning. In some
conventions, Fraktur shapes are used to indicate that a variable is a
vector and not a scalar, for example. There are a handful of font styles
that are used in this way, a fairly fixed set, and usually covering a
limited set of characters as well. Because the operation is not fully
generic, it is possible to cover it with explicitly encoded characters.
At that point, there's the benefit of preserving that distinction in
plain text.
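As a small illustration of preserving that distinction in plain text,
the sketch below maps a lowercase ASCII letter to its encoded
mathematical Fraktur counterpart. It assumes Python's unicodedata module
and relies on the lowercase mathematical Fraktur letters being encoded
contiguously from U+1D51E; the helper name is hypothetical.

```python
import unicodedata

FRAKTUR_SMALL_A = 0x1D51E  # MATHEMATICAL FRAKTUR SMALL A

def to_math_fraktur(letter: str) -> str:
    """Return the encoded mathematical Fraktur form of a lowercase ASCII letter."""
    if len(letter) != 1 or not ("a" <= letter <= "z"):
        raise ValueError("expected one lowercase ASCII letter")
    return chr(FRAKTUR_SMALL_A + ord(letter) - ord("a"))

v_vector = to_math_fraktur("v")
# The Fraktur variable is a distinct character with its own name ...
print(unicodedata.name(v_vector))
# ... so it does not compare equal to the ordinary letter,
print(v_vector == "v")
# yet compatibility normalization can fold it back when the
# distinction is not wanted.
print(unicodedata.normalize("NFKC", v_vector) == "v")
```

Nothing comparable exists for ordinary text set in a Fraktur font: there
the characters stay plain Latin letters and only the rendering changes.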
In fact, it's possible this way to render a very large subset of
mathematical notation in an (almost) plain text form. Incidentally, this
is not that dissimilar from the concept of markdown: a plain text
stream with a few chosen conventions, in the math case about the use of
parens, plus dedicating some character to function as subscript and
superscript "operator". (All the other math operators, such as integrals
or radical signs, trigger their own formatting, thus obviating the need
for encoding that explicitly.)
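To give a rough feel for such conventions, here is a toy sketch, not the
grammar of any actual linear math format. It assumes "_" and "^" as the
subscript and superscript operators, lets each apply to the next
character or to a parenthesized group, and emits HTML markup. All
function names are invented for the example.

```python
import html

def to_html(src: str) -> str:
    """Convert the toy linear notation to HTML with <sub>/<sup> markup."""
    out, _ = _parse(src, 0, closing=False)
    return out

def _parse(src: str, pos: int, closing: bool) -> tuple[str, int]:
    """Parse until end of input, or until ')' when inside a group."""
    parts = []
    while pos < len(src):
        ch = src[pos]
        if ch == ")" and closing:
            return "".join(parts), pos + 1
        if ch in "_^" and pos + 1 < len(src):
            tag = "sub" if ch == "_" else "sup"
            arg, pos = _argument(src, pos + 1)
            parts.append(f"<{tag}>{arg}</{tag}>")
        else:
            parts.append(html.escape(ch))
            pos += 1
    return "".join(parts), pos

def _argument(src: str, pos: int) -> tuple[str, int]:
    """An operator's argument: a parenthesized group or a single character."""
    if src[pos] == "(":
        return _parse(src, pos + 1, closing=True)
    return html.escape(src[pos]), pos + 1

print(to_html("x_(i+1)^2"))
print(to_html("a_(b_c)"))  # a subscript on a subscript
```

Because the operators are generic, nesting comes for free, which is
exactly what a fixed set of encoded subscript characters cannot offer.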
Having the characters for all shape variants used for variables encoded
directly makes this near-plain-text form very powerful. Again, what is a
useful generic situation for ordinary text isn't as workable for a
notational system and vice versa. The emerging insight was that Unicode
should strive to make reasonable accommodations, but in a way that
focuses on the central needs and features of each of them.
If you look just at the encoding though, you come away with a sense of
apparent duplication and also seeming incompleteness: the additions for
phonetic notations will never cover the generic use of math, while the
few styled alphabets for math do nothing for general text use. The key
is to recognize which notation or use case is supported by what, and
then things make a whole lot more sense.
A./