<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 11/5/2024 12:31 PM, Phil Smith III
via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:174201db2fc1$a7cc2890$f76479b0$@akphs.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Generator"
content="Microsoft Word 15 (filtered medium)">
<style>@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:Aptos;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#467886;
text-decoration:underline;}span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#0A2F41;}.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;
mso-ligatures:none;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-family:&quot;Calibri&quot;,sans-serif;color:#0A2F41">I
assume you’ve seen <a
href="https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts"
moz-do-not-send="true" class="moz-txt-link-freetext">https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts</a>,
which discusses what is and isn’t available as
super/subscripts (henceforth “ss”) in Unicode. That
surprised me—I would have thought that ss were markup, not
characters, so there’s more of it implemented already than
I’d expected.</span></p>
</div>
</blockquote>
<p>The consensus that emerged over the first several decades of
encoding Unicode treats these forms somewhat ambiguously.</p>
<p>In mathematical notation, any character can be a super- or
subscript, so you find multiple scripts and symbols in use, with
no limit, in principle, on what additional characters some
specialty may adopt and super/subscript for some purpose. You also
have things like subscripts on subscripts and similarly complex
layouts. In that context it is definitely appropriate to treat
subscripting as a generic operation and not to try to encode some
subset of the possible results of that operation. You could never
encode all forms that are ever used (or available for use) in
mathematical notation, so for that purpose, encoding any further
explicit subscript forms doesn't help.</p>
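<p>As a small illustration of why the operation has to stay generic,
here is how such nesting looks in LaTeX markup. The choice of LaTeX
and the particular symbols are mine, picked only for the example:</p>
<pre>
```latex
x_{i_{j}} \qquad e^{-x^{2}} \qquad a_{n}^{(k+1)} \qquad \Gamma^{\mu}_{\nu\lambda}
```
</pre>
<p>Each of these applies sub- or superscripting to arbitrary content,
including other scripts, which no fixed repertoire of encoded
characters could cover.</p>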
<p>There is also generic use of (mostly) superscript numbers in text, for
things like footnotes. These are likewise best done as generic
operations (via styles), particularly as they relate to document
structure, which already suggests going beyond plain text.<br>
</p>
<p>There are other notations, mainly phonetic, that have
super/subscript forms but do not need recursive
subscripting or any of the other interesting features of mathematical
layout and formatting. In many of them, the super- or subscript
form acts pretty much like any other letter in the notation,
except for its shape. Common to these notations is that there's a
fixed set of such shapes, which doesn't even cover a full basic
alphabet. (That Unicode is getting close to having a full alphabet
is the result of overlapping use.)</p>
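<p>One rough way to see how far the encoded set falls short of a full
alphabet is to ask a character database which basic Latin letters have
a subscript form. The sketch below assumes Python's standard
<code>unicodedata</code> module and relies on the character naming
pattern "LATIN SUBSCRIPT SMALL LETTER X"; the helper name
<code>subscript_for</code> is made up for the example, and the result
depends on the Unicode version your Python ships with:</p>
<pre>
```python
import string
import unicodedata


def subscript_for(letter):
    """Return the encoded subscript form of a basic Latin letter, or None."""
    name = "LATIN SUBSCRIPT SMALL LETTER " + letter.upper()
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # No character with that name exists in this Unicode version
        return None


encoded = [c for c in string.ascii_lowercase if subscript_for(c)]
missing = [c for c in string.ascii_lowercase if not subscript_for(c)]

print("encoded:", "".join(encoded))
print("missing:", "".join(missing))
```
</pre>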
<p>For these cases there's a benefit in being able to have a robust
plain text representation, so that "words" don't require
styling to be understood. That's the driving case behind encoding
these forms. Ultimately the realization was that a universal
character encoding could not be "one-size-fits-all" when it comes
to serving wildly diverging styles of usage.</p>
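<p>To make the "acts like any other letter" point concrete, here is a
minimal sketch, again assuming Python's standard
<code>unicodedata</code> module. The helper <code>describe</code> and
the sample transcription are my own choices for the example:</p>
<pre>
```python
import unicodedata


def describe(ch):
    """Return code point, name, general category, and decomposition mapping."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )


# MODIFIER LETTER SMALL H, as used for aspiration in phonetic transcription
aspirated = "p\u02b0"

print(describe("\u02b0"))
print(describe("\u00b2"))

# The modifier letter counts as a letter, so the transcription stays one "word"
print(aspirated.isalpha())
# The superscript digit is a numeric character, not a letter
print("\u00b2".isalpha())
```
</pre>
<p>The phonetic form has a letter category (Lm), while SUPERSCRIPT TWO
is classified as a number (No). That difference in properties is part
of what makes the phonetic forms robust in plain text.</p>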
<p>Another example of this dichotomy again involves the distinction
between mathematics and text. In text, the plain text does not
carry font information and it is fully acceptable to render the
result in any font that supports the letters in question. That
even goes for styles that aren't fully readable to everyday users.
For example, text in the Latin script can be rendered using a
Fraktur font that many people may have difficulties deciphering or
reading fluently. No matter, you haven't changed the meaning of
the text by doing that. And the selection of possible fonts is
nearly infinite. Some font variations are generic enough that they
can be applied to many scripts; others may be limited in practice
to some specific alphabet.<br>
</p>
<p>In math notation, you have the situation that mathematicians have
used the contrast between different font shapes to carry meaning.
In some conventions, Fraktur shapes are used to indicate that a
variable is a vector and not a scalar, for example. There are a
handful of font styles that are used in this way, a fairly fixed
set, and usually covering a limited set of characters as well.
Because the operation is not fully generic, it is possible to
cover it with explicitly encoded characters. At that point,
there's the benefit of preserving that distinction in plain text.<br>
<br>
In fact, it's possible this way to render a very large subset of
mathematical notation in an (almost) plain text form. Incidentally,
this is not that dissimilar from the concept of markdown: a
plain text stream with a few chosen conventions, which in the math case
cover the use of parens, plus dedicating some characters to
function as subscript and superscript "operators". (All the other
math operators, such as integrals or radical signs, trigger their
own formatting, thus obviating the need to encode that
explicitly.)</p>
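<p>The effect of encoding the math font styles explicitly can be
checked with a few lines of code. This is only a sketch, assuming
Python's standard <code>unicodedata</code> module; the choice of the
letter A and of the Fraktur and bold styles is arbitrary:</p>
<pre>
```python
import unicodedata

plain = "A"
fraktur = unicodedata.lookup("MATHEMATICAL FRAKTUR CAPITAL A")
bold = unicodedata.lookup("MATHEMATICAL BOLD CAPITAL A")

# In plain text the three variables remain distinct characters
distinct = len({plain, fraktur, bold}) == 3

# Compatibility normalization (NFKC) folds the styled letters back to the plain one
folded = [unicodedata.normalize("NFKC", ch) for ch in (fraktur, bold)]

print(distinct)
print(folded)
```
</pre>
<p>The distinction survives in plain text as long as the text is not
passed through a compatibility normalization, which is designed to
discard exactly this kind of styling.</p>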
<p>Having the characters for all shape variants used for variables
encoded directly makes this near-plain-text form very powerful.
Again, what is a useful generic approach for ordinary text isn't
as workable for a notational system, and vice versa. The emerging
insight was that Unicode should strive to make reasonable
accommodations, but in a way that focuses on the central needs
and features of each of them.<br>
<br>
If you look just at the encoding though, you come away with a
sense of apparent duplication and also seeming incompleteness: the
additions for phonetic notations will never cover the generic use
of math, while the few styled alphabets for math do nothing for
general text use. The key is to recognize which notation or use
case is supported by what, and then things make a whole lot more
sense.</p>
<p>A./<br>
</p>
</body>
</html>