<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 12/14/2025 10:47 AM, Phil Smith III
via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:012d01dc6d2a$105656a0$310303e0$@akphs.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Generator"
content="Microsoft Word 15 (filtered medium)">
<style>@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:Aptos;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#0A2F41;}.MsoChpDefault
{mso-style-type:export-only;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-family:"Calibri",sans-serif;color:#0A2F41">Well,
I’m sorta “asking for a friend” – a coworker who is deep in
the weeds of working with something Unicode-related. I’m
blaming him for having told me that :)<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-family:"Calibri",sans-serif;color:#0A2F41"><o:p> </o:p></span></p>
<br>
</div>
</blockquote>
<p>This actually deserves a deeper answer, or a more "bird's-eye"
one, if you want. Read to the end.</p>
<p>The way you asked the question seems to hint that you and your
friend conflate the concepts of "combining mark" and "diacritic".
That would not be surprising if you are mainly familiar with
European scripts and languages, because in that case the
equivalence more or less holds.</p>
<p>And you may also be thinking mainly of languages and their
orthographies, and not of notations, phonetic or otherwise, that
give rise to unusual combinations. Most European languages do have
a reasonably small, fixed set of letters with diacritics in their
orthographies, even though for many languages, if you ask the
native users to list all the combinations, they will fall short.
An example is the accent used on the letter 'e' in some of the
Scandinavian languages to distinguish two identically spelled
small words that have very different functions in the syntax. You
will see that accent used in books and formal writing, but I doubt
people bother when writing a text message.</p>
<p>The focus on code space is a red herring, to a degree. The real
difficulty would lie in cataloging all of the rare combinations
and in getting all fonts to be aware of them. It is much easier to
encode the diacritic as a combining character and have general
rules for layout. With modern fonts, you can, in principle, get
acceptable display even for unexpected combinations, without first
cataloging, then publishing, and then having every font vendor
explicitly add an implementation for each combination before it
can be used.</p>
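<p>To make that concrete, here is a small Python sketch (using the
standard unicodedata module; the specific letters are just my
illustrative choices):</p>

```python
import unicodedata

# 'e' + COMBINING ACUTE ACCENT has a precomposed code point, so NFC
# normalization folds the pair into one character:
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"  # é

# 'q' + COMBINING ACUTE ACCENT has no precomposed code point, yet it is
# a perfectly valid, displayable sequence; NFC leaves it as two code points:
unusual = "q\u0301"
assert unicodedata.normalize("NFC", unusual) == unusual
assert len(unusual) == 2
```

<p>NFC only composes pairs for which a precomposed code point was
ever encoded; every other combination simply stays as a base plus
a combining mark, and a modern font can still render it.</p>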
<p>Other languages and scripts have combinatorics as part of their
DNA, so to speak. Their structural unit is not the letter (with or
without decorations) but the syllable, which is naturally
assembled from components that graphically attach to each other or
even fuse into a single shape. Because that process is not random,
it's easier to encode these structural elements (some of which are
combining characters) than to try to enumerate the possible
combinations. It doesn't hurt that the components map nicely onto
discrete keys on the respective keyboards.</p>
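<p>Korean Hangul is one concrete illustration of this (my example,
not anything from your question): syllables are composed
algorithmically from jamo components, so both the components and
the composed syllables can be encoded, with a formula tying them
together:</p>

```python
import unicodedata

# Two conjoining jamo: CHOSEONG KIYEOK (initial consonant) + JUNGSEONG A (vowel)
jamo = "\u1100\u1161"

# NFC composes them algorithmically into the precomposed syllable 가:
syllable = unicodedata.normalize("NFC", jamo)
assert syllable == "\uac00"
assert unicodedata.name(syllable) == "HANGUL SYLLABLE GA"
```

<p>The composition here is a formula over component indices, not a
giant lookup table, which matches the point about the process not
being random.</p>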
<p>Notations, such as scientific notation, also often assign a
discrete identity to the combining mark. A dot above can mean the
first derivative with respect to time, and it can be applied to
any letter designating a variable, which can be, at a minimum, any
letter of the Latin or Greek alphabets, but why stop there.
There's nothing in the notation itself that would enjoin a
scientist from combining that dot with any character they find
suitable. The only sensible solution is to encode a combining
mark, even though some letters that have a dot above as part of an
orthography are also encoded in precomposed form.</p>
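<p>A quick Python illustration of that duality (the letter choices
are mine):</p>

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, the "time derivative" dot

# 'e' and 'x' happen to have precomposed dotted forms, encoded because
# some orthography uses them (e.g. Lithuanian ė), so NFC folds the pair:
assert unicodedata.normalize("NFC", "e" + DOT_ABOVE) == "\u0117"  # ė
assert unicodedata.normalize("NFC", "x" + DOT_ABOVE) == "\u1e8b"  # ẋ

# Greek theta has no precomposed dotted form, yet the notation still
# works; the sequence just remains a base plus a combining mark:
theta_dot = "\u03b8" + DOT_ABOVE
assert unicodedata.normalize("NFC", theta_dot) == theta_dot
```

<p>The combining mark covers the open-ended notation, while the
precomposed letters exist only where an orthography required them.</p>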
<p>In contrast, Chinese ideographs, while visually composed of
identifiable elements, are treated by their users as units; well
before Unicode came along, there was an established approach to
managing things like keyboard entry while encoding these
characters as precomposed entities and not as their building
blocks.</p>
<p>A big part of the encoding decision is always to do what makes
sense for the writing system or notation (and the script it is
based on).</p>
<p>For a universal encoding, such as Unicode, there simply isn't a
"one-size-fits-all" solution that would work. But if you look at
this universal encoding only from a very narrow perspective of the
orthographies that you are most familiar with, then,
understandably, you might feel that anything that isn't directly
required (from your point of view) is an unnecessary complication.</p>
<p>However, once you adopt a more universal perspective, it's much
easier not to rat-hole on seeming inconsistencies, because you can
always discover how certain decisions relate to the specific
requirements of one or more writing systems. Importantly, this
often includes requirements based on de facto implementations that
existed for these systems before the advent of Unicode. Being
universal, Unicode needed to be designed to allow easy conversion
from all existing data sets. For European scripts, the business
community and the librarians had competing systems: one with
limited sets of precomposed characters and one with combining
marks for diacritics. That is the ultimate source of the duality,
but the two communities had different goals. One wanted to
efficiently handle the common case (primarily mapping all the
modern national typewriters into character encodings), while the
other was interested in a full representation of anything that
could appear in printed book titles (for cataloging), including
unusual or historic combinations.</p>
<p>In conclusion, the question isn't a bad one, but the real answer
is that complexity is very much part of human writing, and when
you design (and extend) a universal character encoding, you will
need to be able to represent that full degree of complexity.
Therefore, what seem like obvious simplifications really aren't
feasible, unless you give up on attempting to be universal.</p>
<p>A./</p>
</body>
</html>