Why incomplete subscript/superscript alphabet ?

Philippe Verdy verdy_p at wanadoo.fr
Fri Sep 30 10:19:34 CDT 2016


Your problem here is that "start" and "end" are not symbols or variables
but actual English words. Why would this usage be restricted to English?
The same formula would have to be translated into various languages and
scripts, which would require subscript and superscript forms for all
letters in Latin, Greek and Cyrillic, and also in Arabic, Japanese,
Chinese, Hindi...

Formulas written in plain text as comments in source code generally do not
need a very friendly layout. They can remain more symbolic, and you should
not even need to split them across multiple lines using bracket pieces
(such as the parts of parentheses and square brackets, whose presence in
Unicode is justified only for mapping legacy characters used to render
text on old monospace-only terminals).

Here your source code is intended for programmers, so it would be better
to use a technical notation.

If you want to include a conventional formula, include a URL pointing to an
image or to an anchor in some document (HTML, PDF, Doc(x) file), or a
reference to a page in a book.
So I suggest you use a notational convention such as TeX here if you want
to be exact (this notation may be different from the actual implementation
in the documented code).
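
For instance, the time-integration comment from the original message could
be written in TeX roughly as follows. This is only a sketch: it assumes the
amsmath package (for \text and \operatorname), and it transcribes the
quoted comment as-is, with "start" and "end" moved to subscripts.

```latex
% Sketch only; assumes \usepackage{amsmath} in the preamble.
\[
  q(t^{N}) \leftarrow q(t^{0})
    + \int_{t_{\text{start}}}^{t_{\text{end}}} \operatorname{rhs}(q, t)\, dt
    + (t_{\text{end}} - t_{\text{start}})
\]
```

In a source comment the same thing fits on one line and stays pure ASCII:
q(t^N) <- q(t^0) + \int_{t_{start}}^{t_{end}} rhs(q,t) dt + (t_{end} - t_{start})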

The superscripts and subscripts in Unicode were encoded mostly because they
are needed in the orthography of some languages, sometimes as distinct
letters but most often as modifiers. They are not intended to be used to
compose whole words like "start" or "end" here.
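
The gaps are easy to check. The following Python sketch asks the Unicode
character database which letters of a word have a subscript form. The
helper names subscript_for and missing_subscripts are made up for this
illustration, and the result depends on the Unicode version shipped with
your Python.

```python
import unicodedata


def subscript_for(ch):
    """Return the Unicode subscript form of a basic Latin letter, or None."""
    if not (ch.isascii() and ch.isalpha()):
        # Accented or non-Latin letters are not handled in this sketch.
        return None
    try:
        return unicodedata.lookup("LATIN SUBSCRIPT SMALL LETTER " + ch.upper())
    except KeyError:
        # No character with that name exists in this Unicode version.
        return None


def missing_subscripts(word):
    """List the letters of `word` that have no subscript form, in order."""
    missing = []
    for ch in word:
        if subscript_for(ch) is None and ch not in missing:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for word in ("start", "end", "début"):
        print(word, missing_subscripts(word))
```

With the Unicode data available at the time of writing, "start" has all its
letters but "end" lacks "d", which matches the problem reported in the
original message.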

Note also that many tools that generate documentation from source code
allow you to insert HTML in comments, so you could just as well use
<sub></sub>, and then we don't need these additions (which would open the
door to reencoding almost all letters in all scripts as
subscripts/superscripts, with many new problems for their diacritics).

Just consider how you would translate your formula into French: "start"
would become "début" (note the combining acute accent...). Here again a TeX
or HTML notation solves the problem: use <sub>début</sub> in the formula,
or use a <math>...</math> HTML element to embed a complete MathML formula,
which plays the same role as a TeX one.
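
As a sketch of the French case in TeX, the accented word goes inside the
subscript like any other text. This assumes amsmath for \text and a setup
that accepts UTF-8 input; "fin" for "end" is my own choice of word, not
something taken from the original message.

```latex
% Sketch only; assumes \usepackage{amsmath} and UTF-8 input.
\[
  \int_{t_{\text{début}}}^{t_{\text{fin}}} \operatorname{rhs}(q, t)\, dt
\]
```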

Your source code documentation is not necessarily in English. English is
used frequently in corporate code and in many open-source projects, but not
always. There is even open-source code managed by teams speaking another
language, for projects mostly targeting another language or an
organization that wants or requires documentation in another language
(notably for the public APIs; internal/private APIs are often excluded from
doc generation tools, so programmers are free to use any language that is
convenient to them, but they won't spend a lot of time tuning these
comments so that they are perfectly readable, with all exact linguistic and
scriptural features, and good-looking for many readers). Discussing these
projects in English would exclude valuable contributions from the target
users of the application, and could lead to incorrect terms or very fuzzy
translations into English when there are other requirements (notably for
terms with legal meaning).

OK, the terms "end" and "start" are understood by all programmers, but not
necessarily by all users of a public API (who may use it through other code
generation helpers, templates, HTML/application input forms and so on).


2016-09-30 11:57 GMT+02:00 Gael Lorieul <glorieul at coanda-deviation.info>:

> Hello all,
>
> I wonder why only a subset of the alphabet is available as subscript
> and/or superscript ?
>
> This is well illustrated on the table in the following Wikipedia page:
>
> https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Latin_and_Greek_tables
>
> Is there a reason for this ?
>
> I would love to have these characters available because I often use
> Unicode to write equations as comments of a source code. For instance:
>
>      class Term_diff_rotDivStressTensor_splitted
>      /**
>       * Computes:
>       *
>       *     μ       ⎛μ⎞        ⎡1              ⎤
>       *     —.Δω + ∇⎜—⎟×Δu + ∇×⎢—.(∇u + ∇uᵀ)·∇μ⎥
>       *     ρ       ⎝ρ⎠        ⎣ρ              ⎦
>      */
>      {
>          [...] (class definition)
>      }
>
>
> or a more problematic example:
>
>      /*
>       *                    ⌠tᵉⁿᵈ
>       *     q(tᴺ) ← q(t⁰) +⎮   rhs(q,t) dt  +   (tᵉⁿᵈ - tˢᵗᵃʳᵗ)
>       *                    ⌡tˢᵗᵃʳᵗ
>      */
>
> Here "end" and "start" would have been better as subscripts, but I could
> not do so because letter "d" is not available as a subscript…
>
> As you can see, having only some letters available as subscript (&
> superscript) is sometimes a pain…
>
>
> Gaël Lorieul
>
> PhD student in Computational Fluid Dynamics
> at Université catholique de Louvain
>