My suggestions for Unicode based math expression format(s)

Kent Karlsson kent.b.karlsson at bahnhof.se
Tue Dec 13 05:36:15 CST 2022


(Hoping that this goes through ok; I did have some problems with the sum sign when copying this text…)

I've devised a new format (or rather, several) for representing math expressions.
Why, you may wonder... Isn't MathML the answer to everything math? Well, not quite.
 
More than 20 years after the first version of MathML, it is still not a great
success. I think there are several reasons for that. One is the obvious: it is too
verbose. Another is that (much due to the verbosity) one really needs authoring
tools to be able to write any math expression in the MathML representation. The
advantage of TeX math (or even old eqn) expressions is that users can with relative
ease type the expression they want on the keyboard. Ordinary cut-paste-modifyViaKeyboard
works. Authoring tools are less straightforward to use. Further, not everything is
HTML (or XML). One may even want to have math expressions in what is otherwise plain
text; for instance for cut and paste, losing styling per se (colour, bold/..., size)
but not the math expressions.
 
But what about typability, directly from the keyboard, without using a special authoring
tool? Are eqn or TeX the only options? Well, there is AsciiMath and UnicodeMath...
However, both of those apply their own parsing to ordinary parentheses, which is undesirable, among other things.
And, apart from UnicodeMath, they were created long before Unicode, so they are not well
adapted to using Unicode characters.
 
OMML (Office Math ML, also XML based) is just as verbose as MathML, if not worse.
 
Using {} (a convention borrowed from TeX; and using \{ and \} for literal {}) and some
other special "mark-down" and character escape inspired notations, we can make a surface
form of a math expression representation (encoding if you like) that is typable on a Latin based
keyboard; except that ∑ and π in the example here may need some further escape notation,
like \sum, \pi, to be fully keyboard typable (similarly to TeX, eqn, UnicodeMath, etc.). Not-so-common
symbols will still need to be picked from some kind of menu, or use Unicode character escapes,
\uxxxx, \Uxxxxxx. Here is an example, using the same expression as is used as the lead example
in the MathML Core specification; it intentionally looks a little bit like TeX, due to the selection
of {}^_ as meta-characters for certain math expression controls, but it isn't TeX:
 
${∑$/{n=1}$\{+∞}{1\/n^2}={π^2\/6}}
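
To make the escape idea concrete, here is a minimal Python sketch of a preprocessor that expands such escapes before the expression is parsed. It is an illustration only, not taken from the spec: it assumes that \u is followed by exactly four hex digits and \U by exactly six, and the table of named escapes (NAMED) is hypothetical, containing only the two names mentioned above. Other backslash sequences, such as \{ and \/, are left untouched.

```python
import re

# Hypothetical table of named escapes; only \sum and \pi are mentioned in the post.
NAMED = {"sum": "\u2211", "pi": "\u03c0"}

# \Uxxxxxx (six hex digits), \uxxxx (four hex digits), or \name (ASCII letters).
# The digit counts are an assumption read off the "\uxxxx, \Uxxxxxx" notation.
_ESCAPE = re.compile(r"\\(?:U([0-9A-Fa-f]{6})|u([0-9A-Fa-f]{4})|([A-Za-z]+))")


def expand_escapes(text):
    """Replace recognised escapes by the characters they stand for."""

    def repl(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits:
            code_point = int(hex_digits, 16)
            # Leave out-of-range values as they are instead of failing.
            if code_point > 0x10FFFF:
                return match.group(0)
            return chr(code_point)
        # Unknown names are left unchanged.
        return NAMED.get(match.group(3), match.group(0))

    return _ESCAPE.sub(repl, text)


if __name__ == "__main__":
    print(expand_escapes(r"${\sum$/{n=1}$\{+\u221E}{1\/n^2}={\pi^2\/6}}"))
```

With these assumptions, the fully keyboard-typable input in the last line expands to the expression shown above.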
 
There is also an HTML/XML compatible form proposed, that is fully equivalent in expressivity
with the other forms/variants proposed. It is not MathML, though it does use XML tags,
so it is a bit longer than the above (read "me" as "math expression"):
 
<me>∑<blw/><me>n=1</me><abv/><me>+∞</me><me>1<dv/>n<rsp/>2</me>=<me>π<rsp/>2<dv/>6</me></me>
 
Or with some more whitespace/linebreaks:
<me>
                       ∑<blw/><me>n=1</me><abv/><me>+∞</me>   <me>1<dv/>n<rsp/>2</me>
                       =
                       <me>π<rsp/>2<dv/>6</me>
</me>
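
As a quick check that this form is ordinary well-formed XML, the one-line example can be fed to any standard XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree; the helper name tag_counts is made up for this illustration, and nothing here interprets the tags according to the spec.

```python
import xml.etree.ElementTree as ET

# The one-line XML example from this message, copied verbatim.
XML_FORM = (
    "<me>∑<blw/><me>n=1</me><abv/><me>+∞</me>"
    "<me>1<dv/>n<rsp/>2</me>=<me>π<rsp/>2<dv/>6</me></me>"
)


def tag_counts(xml_text):
    """Parse the text as XML and count how often each tag name occurs."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    counts = {}
    for element in root.iter():
        counts[element.tag] = counts.get(element.tag, 0) + 1
    return counts


if __name__ == "__main__":
    for tag, count in sorted(tag_counts(XML_FORM).items()):
        print(tag, count)
```

For the example this reports five me elements, two each of dv and rsp, and one each of blw and abv, so only five distinct short tag names are needed for the whole expression.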
 
This shows that having math expressions in an XML compatible format does not need to be
heavy-footed. There are several key reasons for this relative light-footedness. The reasons
include using: default styles, short tag/attribute names (for the XML variant) and short
controls/markup for the other variants, and the use of a level of structural parsing,
uncommon for XML (but otherwise common, also for math, in e.g. eqn and TeX). Details in
the spec referenced below.
 
It also shows that equivalent representations can be even more light-footed than the XML/HTML
variant, as well as the possibility of having variant surface representations that fit with
at least some other contexts (than XML/HTML).
 
In addition, the representations (all variants) can still be general enough to allow RTL math
expressions in a reliable way (in particular, reliable direction of arrows, which in math expressions
almost always refer to the left and right side "arguments", not an external physical direction),
as well as chemical reaction formulas (math-like, not graphical) and the like. Re. arrows: see
https://www.unicode.org/L2/L2022/22026r-non-bidi-mirroring.pdf.
 
You can find the proposed format(s) specification at
https://github.com/kent-karlsson/control/blob/main/math-layout-controls-2022-C.pdf.
 
There is absolutely no claim that this covers everything w.r.t. math expressions;
very likely it does not. But it does cover more than I set out to cover. There is no attempt
to be compatible with MathML (sorry, but that would have killed the idea).
 
Comments are welcome.
 
Happy Lucia!*
/Kent Karlsson
 
* https://en.wikipedia.org/w/index.php?title=Saint_lucia%27s_day

