ECMA-48 (a.k.a. ISO/IEC 6429) "update" proposal update, plus "math anywhere" proposal update

Sun Jul 23 07:03:04 CDT 2023

Hi!

Just a note that I have updated two documents related to Unicode. I have posted about (earlier
versions of) them before.

—————————————————————————————

I have made some updates to my "proposed update" (I actually have no hope that the standard
itself will be updated) to ECMA-48 (ISO/IEC 6429) regarding text styling. You can find them in
https://github.com/kent-karlsson/control/blob/main/ecma-48-style-modernisation-2023B.pdf.

As I have mentioned before, ECMA-48 provides some styling mechanisms (more than you might
think) via so-called control sequences. They are designed to be what Unicode calls "default
ignorable", though they are character sequences, not singular characters, but they do start
with a "control" character.
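To make that concrete, here is a minimal Python sketch using only SGR (SELECT GRAPHIC
RENDITION) parameter values that are already in the current ECMA-48 edition; nothing in
it is taken from the proposed update. The helper names (styled, strip_control_sequences)
and the style keys are made up for this illustration. The stripping function shows the
"ignorable" aspect: an application that does not interpret the sequences can drop them
and is left with the plain text.

```python
import re

ESC = "\x1b"
CSI = ESC + "["  # 7-bit representation of CONTROL SEQUENCE INTRODUCER

# SGR parameter values from the existing ECMA-48 standard.
# The dictionary keys are hypothetical names chosen for this sketch.
SGR = {"reset": 0, "bold": 1, "italic": 3, "underline": 4, "strike": 9}


def styled(text, *styles):
    """Wrap text in one SGR sequence that turns the styles on, then reset."""
    params = ";".join(str(SGR[s]) for s in styles)
    return f"{CSI}{params}m{text}{CSI}{SGR['reset']}m"


# A control sequence: CSI, then parameter bytes (0x30-0x3F),
# then intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
CONTROL_SEQUENCE = re.compile(r"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")


def strip_control_sequences(text):
    """Remove CSI control sequences, leaving only the plain text."""
    return CONTROL_SEQUENCE.sub("", text)


sample = "ECMA-48 can mark " + styled("bold italic", "bold", "italic") + " text."
plain = strip_control_sequences(sample)
print(plain)
```

This only handles the 7-bit ESC [ form of CSI; the 8-bit form (0x9B) and other
control functions are left out to keep the sketch short.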

Even though these styling (and a few structuring) mechanisms are more powerful than first
meets the eye, it is noticeable that the standard (latest version) is more than 30 years old.
It is still used, and indeed still useful. In the "proposed update" I pick up some
additions made elsewhere, add other styling "kinds" that are often found in modern
stylable text contexts, and nail down the semantics more tightly, like "exactly" (modulo
colour correction, which is out of scope) which colour is "blue", "green", etc., similar
to what is done in CSS.

The new version has been revised in response to comments I got on an earlier version.
I am grateful for those comments, without mentioning any particular names. For instance,
I have updated the text on bidi handling and how the Unicode bidi algorithm is tied in
(or not tied in, in some cases). But there are many other updates since the last version,
most of them based on comments I got.

Some functionality:
                          - Bold, italic, underline, strike-through, font selection (from a small palette).
                          - Font size setting.
                          - Paragraph indents setting.
                          - "Bullet" (and numbered) lists (though the "bullet"/number character(s) must
                            be given explicitly as they are to be displayed, it is not automatic).
                          - Tab position setting (not new, of course).
                          - Tables, or actually table rows (though cell framing and background colour
                            must be given explicitly for each cell) (not new, though an improved PTX).
                          - Colours for text, background, and more.
                          - Text direction control (including (new) bidi setting; but otherwise not new).
                          - Page or table cell rotation (and that is not at all new).

It is not a full-fledged document formatting specification. Way too much functionality
is missing for that. But it does provide for a "middle ground" between "pure plain text"
and "full-fledged document formatting". There is far too big a gap between "pure plain
text" and a (high-end) document formatting system. As I may have mentioned earlier,
there is no need to try to invent something from scratch for putting something useful
in that gap. Instead ECMA-48 is a good and viable basis. Implementers can use the
specified functionalities as a smörgåsbord, from which to pick functionality that
the implementers, together with their clients, choose to support (as ECMA-48 always
has been, and like Unicode is today as well).

All of this of course bypasses the question whether ECMA-48 control sequences were
ever intended to be used for styled text document storage. Perhaps they were not,
but that is moot, especially now when we have "graphical" text editors that manage
the file storage representation in the background.

I think Unicode/10646 have treated ECMA-48/ISO6429 waaaay too step-motherly, a
treatment it does not deserve.

(I think the keyboard input part of ECMA-48 also needs a bit of updating, but
that will be the subject of another, separate, proposal. I only have a draft so far,
so I'm not giving a link here.)

————————————————————

I have also updated the math expression representation proposals. You can find them in
https://github.com/kent-karlsson/control/blob/main/math-layout-controls-2023-B.pdf.
The updates since the last version are smaller here, mainly that I have integrated the
mirroring pair data for arrows (and more) in an appendix (since I don't yet see any
progress in including that data file in the Unicode database, which I have proposed).
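As a small illustration of why separate mirroring data for arrows is needed, the
following Python sketch queries the Bidi_Mirrored property through the standard
unicodedata module. It does not use the data from my appendix; the helper name
is_bidi_mirrored is made up for this example. Parentheses and relation signs carry the
property, while arrows do not, so the Unicode database as it stands gives no mirroring
information for them.

```python
import unicodedata


def is_bidi_mirrored(ch):
    """True if the character has the Bidi_Mirrored property in this Python's UCD."""
    return unicodedata.mirrored(ch) == 1


# Two characters with Bidi_Mirrored=Y, then two arrows.
for ch in ("(", "<", "\u2192", "\u21d2"):
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: mirrored={is_bidi_mirrored(ch)}")
```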

Note that the math expression representation proposals are separate from the ECMA-48
update proposal, even though one of the three variants is compatible with ECMA-48.

The three different representations (1. compatible with control codes, using SCI
control sequences; 2. compatible with HTML/SVG, and maybe other XML based schemes;
3. markdown style) are fully interequivalent. (No claim of equivalence to other
math expression representations.)

They also:
                          - are the only math expression representations I know of that handle combining
                            characters correctly,
                          - are almost the only math expression representations I know of that, I am sure,
                            handle multiletter variables correctly (TeX has \mathit{...} for handling
                            multiletter variables),
                          - are the only math expression representations I know of that handle bidi for
                            math expressions correctly and reliably, including handling arrows in math
                            expression and (explicit) mirroring of (potential math) symbols that have
                            no mirror character allocated,
                          - avoid the verbosity of MathML and OMML, but still have an HTML/SVG
                            compatible variant representation type,
                          - forbid the use of "MATHEMATICAL" characters, which were a bad idea from
                            the very beginning; these formats have a more general and more flexible
                            mechanism for math style for letters/digits, and that mechanism is
                            incompatible with "MATHEMATICAL" characters,
                          - are simple and straightforward representations.
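
For readers not familiar with the TeX mechanism mentioned in the list above, here is a
small LaTeX sketch of the multiletter variable issue. It shows plain TeX/LaTeX behaviour
only, not any of the three representations in my proposal, and the variable names are
made up for the example.

```latex
\documentclass{article}
\begin{document}
% Adjacent letters are typeset as a product of single-letter variables
% (r, a, t, e), with the corresponding letter spacing:
$rate \times time$

% \mathit{...} sets each name as one multiletter variable in math italic:
$\mathit{rate} \times \mathit{time}$
\end{document}
```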

As for which letters/digits/symbols to support (and for which math styles), that is up to
implementers and their clients. But one would expect at least A-Z, a-z to be supported for
most styles, 0-9 and common Greek letters (and two Hebrew letters) for some styles, plus
common math symbols (selected subset of Sm union So).

These math expression representations are suitable for anything from pure plain text to
ECMA-48 styled text to HTML and various document formats, including having math
expressions in graphs/diagrams (HP-GL, SVG, ...).

I don't have a catchy name for these math expression representations. I realise that
that is a flaw in marketing... "KISS-math"? (https://en.wikipedia.org/wiki/KISS_principle),
with my preferred reading: "keep it small and simple" (which is the design principle,
despite the more than 70 pages). The design is simple, and the math expression
representations are small (compared to several other proposals). I would also say it
is "straightforward" (but "KISSS", naa).

——————————————————————

/Kent K
