<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 11/5/2024 12:31 PM, Phil Smith III

      via Unicode wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:174201db2fc1$a7cc2890$f76479b0$@akphs.com">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <meta name="Generator"

        content="Microsoft Word 15 (filtered medium)">

      <style>@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face

        {font-family:Aptos;}p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        font-size:12.0pt;

        font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:#467886;

        text-decoration:underline;}span.EmailStyle18

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:#0A2F41;}.MsoChpDefault

        {mso-style-type:export-only;

        font-size:11.0pt;

        mso-ligatures:none;}div.WordSection1

        {page:WordSection1;}</style>

      <div class="WordSection1">

        <p class="MsoNormal"><span

style="font-family:"Calibri",sans-serif;color:#0A2F41">I

            assume you’ve seen <a

href="https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts"

              moz-do-not-send="true" class="moz-txt-link-freetext">https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts</a>,

            which discusses what is and isn’t available as

            super/subscripts (henceforth “ss”) in Unicode. That

            surprised me—I would have thought that ss were markup, not

            characters, so there’s more of it implemented already than

            I’d expected.</span></p>

      </div>

    </blockquote>

    <p>The consensus that emerged over the first several decades of
      Unicode encoding work treats these forms somewhat ambiguously.</p>

    <p>In mathematical notation, any character can be a superscript or
      subscript, so you find letters from multiple scripts as well as
      symbols, with no limit, in principle, on what additional characters
      some specialty may adopt and superscript or subscript for its own
      purposes. And you have things like subscripts on subscripts and
      similarly complex layouts. In that context it is definitely
      appropriate to treat subscripting as a generic operation and not to
      try to encode some subset of the possible results of that
      operation. You could never encode all forms that are ever used (or
      available for use) in mathematical notation, so for that purpose,
      encoding further explicit subscript forms doesn't help.</p>

    <p>There is also generic use of (mostly) superscript numbers in
      text, for things like footnotes. These are also best done as
      generic operations (via styles), particularly as they relate to
      document structure, which already implies the use of rich text
      rather than plain text.</p>

    <p>There are other notations, mainly phonetic, that have
      superscript or subscript forms but do not need recursive
      subscripting or all the other interesting features of mathematical
      layout and formatting. In many of them, the superscript or
      subscript form acts pretty much like any other letter in the
      notation, except for its shape. What these notations have in common
      is a fixed set of such shapes, which doesn't even cover a full
      basic alphabet. (That Unicode is getting close to having a full
      superscript alphabet is the result of overlapping use by several
      notations.)</p>

    <p>For these cases there's a benefit in having a robust plain text
      representation, so that "words" don't require styling to be
      understood. That's the driving case behind encoding these forms.
      Ultimately the realization was that a universal character encoding
      could not be "one-size-fits-all" when it comes to serving wildly
      diverging styles of usage.</p>
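    <p>As a small illustration of the phonetic case, here is a sketch in
      Python using only the standard <code>unicodedata</code> module (the
      sample transcription is made up for the purpose). The superscript h
      used to mark aspiration is encoded as a modifier letter, so the
      transcribed word stays an ordinary run of letters in plain text,
      while compatibility normalization (NFKC) folds it back to a plain
      h:</p>
<pre>
```python
import unicodedata

# Hypothetical sample word in a phonetic transcription: "kʰat", with the
# aspiration written as U+02B0 MODIFIER LETTER SMALL H.
word = "k\u02B0at"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in text]


for row in describe(word):
    print(*row)

# The superscript h has category Lm (modifier letter), so Python treats the
# whole string as alphabetic: one word, no styling needed.
print(word.isalpha())                        # True
# NFKC folds the superscript back to a plain h, erasing the distinction.
print(unicodedata.normalize("NFKC", word))   # khat
```
</pre>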

    <p>Another example of this dichotomy again involves the distinction

      between mathematics and text. In text, the plain text does not

      carry font information and it is fully acceptable to render the

      result in any font that supports the letters in question. That

      even goes for styles that aren't fully readable to everyday users.

      For example, text in the Latin script can be rendered using a

      Fraktur font that many people may have difficulties deciphering or

      reading fluently. No matter: you haven't changed the meaning of
      the text by doing that. And the selection of possible fonts is
      nearly infinite. Some font variations are generic enough that they
      can be applied to many scripts; others may be limited in practice
      to some specific alphabet.

    </p>

    <p>In math notation, the situation is different: mathematicians have
      used the contrast between different font shapes to carry meaning.
      In some conventions, for example, Fraktur shapes are used to
      indicate that a variable is a vector and not a scalar. A handful of
      font styles are used in this way; they form a fairly fixed set and
      usually cover a limited set of characters as well. Because the
      operation is not fully generic, it is possible to cover it with
      explicitly encoded characters, and once those exist, there's the
      benefit of preserving that distinction in plain text.<br>
      <br>
      In fact, this makes it possible to represent a very large subset of
      mathematical notation in an (almost) plain text form. Incidentally,
      that is not so dissimilar from the concept of Markdown: a plain
      text stream with a few chosen conventions. In the math case, these
      concern the use of parentheses, plus dedicating some characters to
      function as subscript and superscript "operators". (All the other
      math operators, such as integrals or radical signs, trigger their
      own formatting, which obviates the need to encode that formatting
      explicitly.)</p>
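    <p>To make the math case concrete, here is a minimal sketch, again in
      Python with only <code>unicodedata</code>, that maps a capital
      letter to its Fraktur math-variable character. The helper name
      <code>math_fraktur_capital</code> is hypothetical. It also shows the
      "limited set" point: five Fraktur capitals were encoded earlier as
      letterlike symbols, so the Mathematical Fraktur block leaves
      reserved holes at those positions.</p>
<pre>
```python
import string
import unicodedata

# Capitals encoded earlier as letterlike symbols; the Mathematical Fraktur
# block has reserved (unassigned) positions where these would otherwise go.
LETTERLIKE_FRAKTUR = {"C": "\u212D", "H": "\u210C", "I": "\u2111",
                      "R": "\u211C", "Z": "\u2128"}


def math_fraktur_capital(letter):
    """Map an ASCII capital A-Z to its Fraktur math-variable character.

    Hypothetical helper for illustration only.
    """
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"expected one ASCII capital, got {letter!r}")
    if letter in LETTERLIKE_FRAKTUR:
        return LETTERLIKE_FRAKTUR[letter]
    # MATHEMATICAL FRAKTUR CAPITAL A is U+1D504; the rest follow in order.
    return chr(0x1D504 + ord(letter) - ord("A"))


v = math_fraktur_capital("A")
print(unicodedata.name(v))                      # MATHEMATICAL FRAKTUR CAPITAL A
print(v == "A")                                 # False: a distinct character
print(unicodedata.normalize("NFKC", v) == "A")  # True: NFKC drops the style
```
</pre>
    <p>A plain comparison keeps the Fraktur A and the ordinary A apart,
      which is what a plain text math representation relies on; NFKC, by
      contrast, treats them as the same letter, much as font choice does
      for ordinary text.</p>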

    <p>Having characters for all the shape variants used for variables
      encoded directly makes this near-plain-text form very powerful.
      Again, what works as a generic mechanism for ordinary text isn't
      as workable for a notational system, and vice versa. The emerging
      insight was that Unicode should strive to make reasonable
      accommodations for both, but in a way that focused on the central
      needs and features of each.<br>
      <br>
      If you look just at the encoding, though, you come away with a
      sense of apparent duplication and also seeming incompleteness: the
      additions for phonetic notations will never cover the generic use
      of math, while the few styled alphabets for math do nothing for
      general text use. The key is to recognize which notation or use
      case is supported by what; then things make a whole lot more
      sense.</p>

    <p>A./<br>

    </p>

  </body>

</html>