Unicode encoding philosophy

Kent Karlsson kent.b.karlsson at bahnhof.se
Wed Oct 11 16:48:16 CDT 2023



> On 11 Oct 2023, at 10:02, Giacomo Catenazzi via Unicode <unicode at corp.unicode.org> wrote:
> 
>  We may find that ASCII provide different level of separations (FS, GS, RS, US,

As far as I know, NOBODY is using these anymore. But I may be wrong; really
old applications do not count, nor do EBCDIC ones (which also fall in the
"really old" category).
 
Note however that Unicode does not really have these; Unicode (referencing
ECMA-48, in its ISO/IEC version) has IS1, IS2, IS3, IS4, and they have no
pre-defined hierarchy. A hierarchy, if there is one at all (there need not be
any), is defined by the application. So Unicode was wrong to treat FS, GS, RS,
US as aliases of them.
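
As a small illustration of an application-defined hierarchy, here is a minimal
Python sketch. The function name split_nested and the two orderings are made up
for this example; neither ordering is prescribed by ECMA-48 or Unicode.

```python
# ECMA-48 information separators and the names Unicode lists as aliases.
IS4 = "\x1c"  # FS
IS3 = "\x1d"  # GS
IS2 = "\x1e"  # RS
IS1 = "\x1f"  # US


def split_nested(text, separators):
    """Split text recursively; the caller's separator order *is* the hierarchy."""
    if not separators:
        return text
    outer, rest = separators[0], separators[1:]
    return [split_nested(part, rest) for part in text.split(outer)]


data = f"a{IS1}b{IS2}c{IS1}d"

# One application treats IS2 as the outer level and IS1 as the inner one...
rows_of_fields = split_nested(data, [IS2, IS1])
# ...another is free to nest them the other way around.
fields_of_rows = split_nested(data, [IS1, IS2])
```

The same byte sequence yields two different structures, depending only on
which ordering the application chose.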
 
> but also EM, FF, CR/LF,

Nobody (I hope) is using EM. But FF, CR, LF are of course commonly used.
LS and PS (the Unicode replacements) never gained popularity, for compatibility
reasons, and likely never will.
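
To illustrate the compatibility point with one concrete runtime: Python's
str.splitlines() does recognise FF, LS and PS as line boundaries, while code
that splits on "\n" alone (a very common habit) does not. This is only a
sketch of Python's behaviour, not a claim about other tools.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR

text = f"one\ntwo\r\nthree\x0cfour{LS}five{PS}six"

# str.splitlines() treats LF, CR, CRLF, FF, LS and PS (among others)
# as line boundaries.
lines = text.splitlines()

# Splitting on "\n" only handles LF and leaves CR, FF, LS and PS
# embedded in the resulting pieces.
naive = text.split("\n")
```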


> and also SPACE), or with ECMA, more about style (but as I found in Wikipedia, each terminal has own interpretation of "red" and "highlight red",

Yes, that is annoying... See
https://github.com/kent-karlsson/control/blob/main/ecma-48-style-modernisation-2023B.pdf,
esp. page 31. 35 or so years ago there were technical limitations, but for
many displays we no longer have those today.
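
A rough Python sketch of the "red" problem. SGR parameter 31 is the ECMA-48
value for a red foreground, but the exact shade is left to the implementation.
The 38;2;R;G;B form is the direct-colour spelling that many current terminal
emulators accept; ECMA-48 itself only reserves 38 and points to ISO 8613-6, so
treat that form (and its semicolon separators) as an assumption about the
receiving terminal. The helper name sgr is invented for the example.

```python
CSI = "\x1b["  # 7-bit form of CONTROL SEQUENCE INTRODUCER (ESC, then "[")


def sgr(*params):
    """Build a SELECT GRAPHIC RENDITION control sequence: CSI Ps ; Ps ... m."""
    return CSI + ";".join(str(p) for p in params) + "m"


RESET = sgr(0)

# Palette colour: which shade of red appears is up to the terminal.
palette_red = sgr(31) + "red" + RESET

# Direct colour (assumed to be supported by the terminal): explicit RGB.
direct_red = sgr(38, 2, 255, 0, 0) + "red" + RESET

print(palette_red, direct_red)
```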


> But also this last fact may give us some hints: why we do not use ECMA anymore for such formatting?

By ECMA, I assume you mean ECMA-48 (as I assume above).
 
ECMA-48 *is* very commonly used. Unfortunately only in terminal emulators.
There is no need for that limitation. The formatting part (modernised)
may well be used in text files as well. See
https://github.com/kent-karlsson/control/blob/main/ecma-48-style-modernisation-2023B.pdf.
(ECMA-48 has other stuff as well, in particular for keyboard input and for
"terminal screen editing"; those are used by terminal emulators. These
are of course not suitable for text files with formatting. But ECMA-48 is a mix of stuff.)
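
A minimal Python sketch of what styling in a text file could look like, using
only SGR parameters that ECMA-48 already defines (1 = bold, 3 = italic,
0 = default rendition), plus a way to get back to plain text. It handles the
7-bit CSI form (ESC [) only, and the name strip_control_sequences is made up
for this example; it is not taken from the modernisation proposal.

```python
import re

CSI = "\x1b["  # 7-bit form of CONTROL SEQUENCE INTRODUCER

BOLD, ITALIC, RESET = CSI + "1m", CSI + "3m", CSI + "0m"

styled = f"{BOLD}Heading{RESET}\nSome {ITALIC}emphasised{RESET} text.\n"

# A control sequence is CSI, then parameter bytes (0x30-0x3F), then
# intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
CONTROL_SEQUENCE = re.compile(r"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")


def strip_control_sequences(text):
    """Return the text content with all CSI control sequences removed."""
    return CONTROL_SEQUENCE.sub("", text)


plain = strip_control_sequences(styled)
```

A reader that does not understand the styling can thus fall back to the plain
text by dropping the control sequences.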

> For sure Microsoft knew it very well e.g. for Microsoft Word

1) ECMA-48 (even with the modernisation proposed in
https://github.com/kent-karlsson/control/blob/main/ecma-48-style-modernisation-2023B.pdf)
is *far* from sufficient for such things as (full-fledged) document formatting, spreadsheets,
and so on. But one does not always need full-fledged document formatting (such as HTML/CSS,
Word, etc.). A much more light-weight formatting system is often useful.
 
2) ECMA-48 has not been updated for over 30 years. I think that is a pity.
It is not at all a bad standard. (It even has support for Ruby, which you mentioned.)
With an update I think it may well be used for "light-weight
formatted" text. I think there is a gap to fill between (full-fledged)
document format apps (including HTML/CSS, which has lots and lots of
capabilities and is hard to implement in full) and plain text apps, which
do not even allow italics or bold, or the slightest font size change (for a
heading for instance). That gap may well be filled by using ECMA-48 in
modernised form. (RTF is not all that attractive...) And ECMA-48 (or rather its
ISO equivalent) is referenced by Unicode and ISO/IEC 10646.

/Kent K
