Re: Re: HTML entities

Martin J. Dürst duerst at it.aoyama.ac.jp
Mon Mar 22 18:44:11 CDT 2021


Hello Marius, others,

On 2021/03/22 22:23, Marius Spix via Unicode wrote:
> You cannot just map <sup>2</sup> to SUPERSCRIPT TWO, because you may have cases
> with nested <sup> or <sub> like 10<sup>(10<sup>100</sup>)</sup>, which is the
> representation of a number known as Googolplex, or ρ<sub>CO<sub>2</sub></sub>,
> which is the percentage of carbon dioxide in an air sample. Such cases are not
> and should not be handled by Unicode, because their interpretation requires a
> stack machine.
> CSS is also no solution, because <sup> and <sub> are semantic tags (like <del>,
> <strong>, <em> and <kbd>) and not just stylistic ones (like <s>, <b>, <i> or <tt>).
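
The nesting point can be illustrated with a short sketch. It is only an illustration under stated assumptions, not anything proposed in this thread: it uses Python's standard html.parser module, and the names ScriptDepth and max_script_depth are made up for the example. It keeps a stack of open <sup>/<sub> elements and reports the deepest nesting seen, which is exactly the information a flat character-by-character mapping cannot carry.

```python
from html.parser import HTMLParser


class ScriptDepth(HTMLParser):
    """Track how deeply <sup>/<sub> elements are nested (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open sup/sub elements, innermost last
        self.max_depth = 0   # deepest nesting seen so far

    def handle_starttag(self, tag, attrs):
        if tag in ("sup", "sub"):
            self.stack.append(tag)
            self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        # Only pop when the end tag closes the innermost open element.
        if tag in ("sup", "sub") and self.stack and self.stack[-1] == tag:
            self.stack.pop()


def max_script_depth(html):
    """Return the maximum nesting depth of <sup>/<sub> in an HTML fragment."""
    parser = ScriptDepth()
    parser.feed(html)
    parser.close()
    return parser.max_depth
```

For the Googolplex example, max_script_depth("10<sup>(10<sup>100</sup>)</sup>") gives 2, while a simple x<sup>2</sup> gives 1. Anything deeper than 1 has no faithful rendering as a run of superscript characters.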

What I meant was not to use CSS instead of <sup> or <sub>, but to use it 
in addition to one of these. That should make it possible to address the 
browser's limitation on rendering superscripts and subscripts. Using CSS 
(and Web Fonts) it should be possible to get as close as needed in look 
and style to the builtin ¹²³ superscript characters without actually 
using these characters. That would also make sure that none of these 
characters needs character entity references, and there is no worry 
about using a character that does not have a superscript (or subscript) 
variant in Unicode itself.

That would avoid the slippery slope problem both for character entity 
references and for Unicode superscript/subscript variants. And that's a 
very good thing, because whenever somebody comes up with a request for 
yet another of these, the only thing that is sure is that it won't be 
the last.

See an additional comment below.


> *Sent:* Monday, 22 March 2021 at 09:53
> *From:* "Jukka K. Korpela via Unicode" <unicode at unicode.org>
> *To:* "Martin J. Dürst" <duerst at it.aoyama.ac.jp>
> *Cc:* "via Unicode" <unicode at unicode.org>
> *Subject:* Re: HTML entities
> Martin J. Dürst (duerst at it.aoyama.ac.jp) wrote:
> 
>      Hello Jukka, others,
> 
>      On 2021/03/18 17:20, Jukka K. Korpela via Unicode wrote:
>       > Tex (textexin at xencraft.com) wrote:
> 
>       >> However, you are quoting a doc that has been withdrawn.
> 
>       > It’s a pity that this well-written and useful document was withdrawn, for
>       > reasons I don’t understand.
> 
>      Here are the main reasons, as far as I understand them. Unicode gets
>      updated roughly once a year, and Web technology also changes over time.
>      There was not enough manpower to keep the document up to date.
> 
>      In addition, the document was always a kind of tug-of-war between those
>      who pushed for more favorable descriptions of specific Unicode
>      characters (such as ⁴ in this discussion) or more favorable descriptions
>      of markup-based and style-based solutions (such as <sup></sup>).
> 
> Thank you for the description. These opposite views surely reflected different
> needs, such as the need to represent data in plain text in some contexts and the
> need for more structured representation.

Not only. They also were a front line in the discussion about how far 
Unicode should go in encoding characters with typographical/stylistic 
distinctions, or in other words, what should be the limits of plain text.

Regards,   Martin.


>      Well, and then somebody else uses 10<sup>3.5</sup> somewhere. How are you
>      going to express this so that it doesn't turn into 103.5 in plain text?
>      The problem is that there is always a limit somewhere for plain text.
> 
> Well, in the given case, it might help if we had IMPLIED EXPONENTIATION (we
> don’t; we have IMPLIED TIMES, but it does not help here); at least it would
> appear in text data to indicate that adjacent digits are not part of the same
> number.
> 
> 
>      There is also always a limit somewhere for markup and styled rendering,
>      but it's in a quite different place.
> 
> Regarding exponents, the limit is currently set by the presence of superscript
> characters for digits, plus, and minus, and (for some reason), =, (, ), and n.
> This covers most of the cases where one might consider using superscripts in
> general texts and in expressing values of quantities.
> 
> But when you have, say, text that contains the simple expression /ax/ with /x/
> as a superscript denoting exponent there is no satisfactory way to represent it
> in plain text. Using just ax would mean using a wrong expression, and using aˣ
> (with U+02E3 MODIFIER LETTER SMALL X) would be too tricky. Unicode hasn’t got a
> repertoire of superscript Latin letters even though they are often used as
> semantically different from normal letters; it only has some of such letters,
> apparently meant for special uses only (like phonetic symbols).
> 
> 
>      Out of the box rendering of <sup> and <sub> may be rather crude, but I
>      guess it should be possible to do a lot better with some dose of CSS and
>      possibly some Web fonts.
> 
> In a sense, it would be straightforward to map, say, <sup>2</sup> to SUPERSCRIPT
> TWO in the rendering phase, either directly at the character level or via glyph
> selection when an OpenType font is used. In another sense, it would be
> complicated, since we hardly want to have <sup>2</sup> rendered substantially
> different from <sup>x</sup> in style. So the mapping should take place only when
> the entire document contains only such <sup> elements where all characters have
> superscript counterparts in Unicode (or at the glyph level).
> 
> Jukka
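
The all-or-nothing rule described above can be sketched roughly as follows. This is a minimal illustration under assumptions of my own, not a proposal from the thread: the table SUPERSCRIPTS and the function flatten_sup are hypothetical names, the table covers only the digits and the symbols +, -, =, (, ) and n mentioned earlier, ASCII hyphen-minus is assumed to stand in for SUPERSCRIPT MINUS, and a regular expression is used in place of a real HTML parser, so attributes on <sup> are not handled.

```python
import re

# Characters with a Unicode superscript counterpart (subset used in this sketch).
SUPERSCRIPTS = {
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
    "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
    "8": "\u2078", "9": "\u2079", "+": "\u207a", "-": "\u207b",
    "=": "\u207c", "(": "\u207d", ")": "\u207e", "n": "\u207f",
}

# Matches a <sup> element without attributes; non-greedy on the content.
SUP = re.compile(r"<sup>(.*?)</sup>", re.DOTALL)


def flatten_sup(html):
    """Map <sup>...</sup> to superscript characters, but only if every
    <sup> element in the whole text can be mapped completely."""
    contents = SUP.findall(html)
    # A nested <sup> leaves a "<" in the captured content, which is not in
    # the table, so nested cases are left untouched as well.
    if any(ch not in SUPERSCRIPTS for text in contents for ch in text):
        return html
    return SUP.sub(lambda m: "".join(SUPERSCRIPTS[c] for c in m.group(1)), html)
```

With this sketch, flatten_sup("x<sup>2</sup>") returns "x²", whereas a text containing both a<sup>x</sup> and b<sup>2</sup> comes back unchanged, so the two superscripts keep the same style. 10<sup>3.5</sup> is also left alone, because the table has no superscript counterpart for the full stop.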


