Unicode encoding philosophy
Giacomo Catenazzi
cate at cateee.net
Thu Oct 5 07:22:56 CDT 2023
On 4 Oct 2023 19:54, William_J_G Overington via Unicode wrote:
(...)
> What puzzles me though, is that structurally the proposal seems to have
> much the same encoding philosophy as a suggestion proposed by me in that
> they both would allow a variation selector to be used so as to conserve
> in plain text information that is typically these days conserved in rich
> text and gets lost if plain text is used. In my proposal, using a
> variation selector to conserve in a plain text document information
> about the use of italics in some text.
>
> My proposal was rejected, quite strongly.
>
> So, deep down, what please is the Unicode encoding philosophy that
> allows variation selectors to be used to conserve some information, yet
> not other information, in plain text?
Note: the Unicode philosophy has changed over time (you can still see some
obsolete formatting tags), partly because real-life problems shifted the
target from the ideal to what can actually be implemented and used by real
people.
In any case, Unicode is not magic: it is not that you tell them what you
want and you get it. In fact, rendering support for many "minor" scripts
is still far from ideal. So: how would you implement your proposal across
many operating systems and programs?
Unicode supports some stylistic formatting, but mostly at the character
level, where it is handled by fonts, shaping, and rendering. Variation at
the character level (i.e. variation selectors) is easy to implement in
fonts, and fonts need that machinery anyway for other, non-Unicode uses
(e.g. tabular numbers).
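To illustrate why character-level variation degrades gracefully (my own
sketch, not part of either proposal): a variation selector is just one
extra code point after a base character, so a program that does not know
about it can drop it and still show the base character. The helper names
`is_variation_selector` and `fallback_render` are hypothetical, and the
sketch only covers VS1 to VS16 (U+FE00..U+FE0F).

```python
import unicodedata

# Variation selectors VS1..VS16 are U+FE00..U+FE0F.
# (VS17..VS256 at U+E0100..U+E01EF are left out to keep the sketch short.)
def is_variation_selector(ch: str) -> bool:
    return 0xFE00 <= ord(ch) <= 0xFE0F

def fallback_render(text: str) -> str:
    """Approximate what a program that ignores variation selectors shows:
    the base characters, with every selector dropped."""
    return "".join(ch for ch in text if not is_variation_selector(ch))

heart_text = "\u2764\uFE0E"   # HEAVY BLACK HEART + VS15 (request text presentation)
heart_emoji = "\u2764\uFE0F"  # HEAVY BLACK HEART + VS16 (request emoji presentation)

for s in (heart_text, heart_emoji):
    names = [unicodedata.name(ch) for ch in s]
    # Each sequence is two code points; stripping the selector leaves the base.
    print(len(s), names, fallback_render(s) == "\u2764")
```

The worst case for an unaware program is the default glyph of the base
character, not broken text.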
Your proposal, instead, is disruptive at most layers of text rendering.
It would take the limited resources available for supporting more
languages and spend them making most languages more complex, mostly for
the benefit of a few Western languages. And we can already format text in
italic, e.g. with TeX, LaTeX, HTML, etc. There is even direct support
within Unicode: we already have the C0 control codes, and standards that
use them to do italic directly.
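If the "standards to do italic directly" are read as ECMA-48 / ISO 6429
(my assumption), italic is switched on and off with SGR control sequences
introduced by ESC, a C0 control: parameter 3 means italicized and 23 means
not italicized. The sketch below, with hypothetical helper names `italic`
and `strip_sgr`, shows that such plain-text markup already exists, and
also why it is fragile: a program that does not understand it has to know
to strip it, or raw escape characters leak into the output.

```python
import re

ESC = "\x1b"        # ESCAPE, a C0 control character (U+001B)
CSI = ESC + "["     # 7-bit form of CONTROL SEQUENCE INTRODUCER

def italic(text: str) -> str:
    """Wrap text in SGR 3 (italicized) ... SGR 23 (not italicized)."""
    return f"{CSI}3m{text}{CSI}23m"

# Matches SGR sequences only (e.g. ESC[3m, ESC[0;23m), not every ECMA-48 sequence.
SGR_PATTERN = re.compile(r"\x1b\[[0-9;]*m")

def strip_sgr(text: str) -> str:
    """What an SGR-unaware consumer should do: drop the sequences, keep the text."""
    return SGR_PATTERN.sub("", text)

line = "plain " + italic("slanted") + " plain"
print(repr(line))
print(strip_sgr(line))  # plain slanted plain
```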
Note: formatting is important, but it should be done at a different level.
We should not repeat the errors of the 1960s-1980s of mixing text and
formatting and putting formatting into "binary" code points: we need a
verbose, human-readable syntax. IMHO HTML is not good enough for every
formatting need, but I do not think formatting should be done in Unicode
(or at least not with code points; rather at the UAX level, or in more
"independent" projects like ICU).
Please consider how things would be implemented, and help to program
proofs of concept. Features that will never be implemented would just
cause trouble for Unicode (adding yet more obsolete features). Part of
Unicode's success is also that it has had no flag days. (Think about how
programs that know about italic would interact with programs that do not,
and about the security implications.)
cate