Unicode encoding philosophy

Giacomo Catenazzi cate at cateee.net
Thu Oct 5 07:22:56 CDT 2023


On 4 Oct 2023 19:54, William_J_G Overington via Unicode wrote:
(...)
> What puzzles me though, is that structurally the proposal seems to have 
> much the same encoding philosophy as a suggestion proposed by me in that 
> they both would allow a variation selector to be used so as to conserve 
> in plain text information that is typically these days conserved in rich 
> text and gets lost if plain text is used. In my proposal, using a 
> variation selector to conserve in a plain text document information 
> about the use of italics in some text.
> 
> My proposal was rejected, quite strongly.
> 
> So, deep down, what please is the Unicode encoding philosophy that 
> allows variation selectors to be used to conserve some information, yet 
> not other information, in plain text?


Note: Unicode's philosophy has changed over time (you can still see some
obsolete formatting tags), partly because real-life problems shifted the
target (from the ideal to what can be implemented and used by real people).


In any case, Unicode is not magic: you do not simply tell them what you
want and get it. In fact, rendering support is still far from ideal for
many "minor" scripts. So: how would you implement your proposal across
many operating systems and programs?

Unicode supports some stylistic formatting, but mostly at the character
level, where it is handled by fonts, shaping, and rendering. Formatting
at the character level (hence variation selectors) is easy to implement
in fonts, and that font machinery is also required for other,
non-Unicode uses (e.g. tabular numbers).
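
To make "character level" concrete, here is a minimal sketch in Python
of how a variation selector attaches to a single base character. The
choice of U+2764 and the helper name describe are mine, for
illustration only; whether the two sequences actually look different
depends on the font and the renderer.

```python
import unicodedata

# U+2764 HEAVY BLACK HEART followed by a variation selector.
# VS15 (U+FE0E) requests text presentation, VS16 (U+FE0F) requests
# emoji presentation. The selector applies to the preceding character only.
BASE = "\u2764"
text_style = BASE + "\ufe0e"
emoji_style = BASE + "\ufe0f"


def describe(s):
    """Return the Unicode character name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]


print(describe(text_style))
print(describe(emoji_style))
```

Each sequence is two code points: the base character plus one selector.
A font only has to map that pair to a glyph, which is why this level is
comparatively cheap to support.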

Your proposal, by contrast, would be disruptive at most layers of text
rendering. It would take the limited resources available for supporting
more languages and spend them on making most languages more complex,
mostly for the benefit of just a few Western languages. And we can
already format text in italics, e.g. with TeX, LaTeX, HTML, etc. There
is also direct support at the character level: we already have C0
(control block 0), and standards for expressing italics directly.
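
As an illustration of italics expressed through C0 controls, here is a
minimal sketch assuming the standard in question is ECMA-48 (ISO/IEC
6429), whose SGR control sequences are introduced by the C0 character
ESC. That reading is my assumption, and the helper name sgr_italic is
hypothetical. Many terminals honour SGR 3, but not all of them do.

```python
ESC = "\x1b"  # U+001B ESCAPE, a C0 control character


def sgr_italic(text):
    """Wrap text in SGR 3 (italic on) and SGR 23 (italic off)."""
    return f"{ESC}[3m{text}{ESC}[23m"


# On a terminal that supports SGR 3, "emphasis" is shown in italics.
print(sgr_italic("emphasis") + " plain")
```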


Note: formatting is important, but it should be done at a different
level (we should not repeat the errors of the 1960s-1980s of mixing text
and formatting, and of putting formatting into "binary"/codepoints: we
need a verbose and human-readable syntax). IMHO HTML is not good enough
for all formatting needs, but I do not think this should be done in
Unicode (or at least not at the codepoint level, but at the UAX level or
in something more "independent", like ICU).


Please consider how things would be implemented. Help by programming
proofs of concept. Features that never get implemented would only cause
trouble for Unicode (and so add obsolete features). Part of Unicode's
success is also that it has no flag days. (Think about how programs that
know about italics would interact with programs that do not, and about
the security implications.)
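
A rough sketch in Python of the interaction problem. It assumes, purely
for illustration, that U+FE00 VARIATION SELECTOR-1 marks a letter as
italic. That is not an actual Unicode assignment, and the names
ITALIC_VS and strip_variation_selectors are hypothetical.

```python
# Hypothetical convention: VARIATION SELECTOR-1 after a letter means
# "italic". This is an assumption for the demo, not an existing
# Unicode rule.
ITALIC_VS = "\ufe00"


def is_variation_selector(ch):
    """True for U+FE00..U+FE0F and U+E0100..U+E01EF."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(s):
    """Drop all variation selectors, as an 'aware' program might."""
    return "".join(ch for ch in s if not is_variation_selector(ch))


plain = "admin"
marked = "".join(ch + ITALIC_VS for ch in plain)

# A program unaware of the convention compares raw code points.
print(plain == marked)
# A program that strips the selectors treats both as the same word.
print(plain == strip_variation_selectors(marked))
print(len(plain), len(marked))
```

The two strings may look identical on screen, yet they compare unequal
unless every program in the chain agrees to ignore the selectors. That
mismatch is where search, sorting, and identifier checks can go wrong.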

cate



