HTML entities

Jukka K. Korpela jukkakk at gmail.com
Mon Mar 22 03:53:31 CDT 2021


Martin J. Dürst (duerst at it.aoyama.ac.jp) wrote:

> Hello Jukka, others,
>
> On 2021/03/18 17:20, Jukka K. Korpela via Unicode wrote:
> > Tex (textexin at xencraft.com) wrote:
>
> >> However, you are quoting a doc that has been withdrawn.
>
> > It’s a pity that this well-written and useful document was withdrawn, for
> > reasons I don’t understand.
>
> Here are the main reasons, as far as I understand them. Unicode gets
> updated roughly once a year, and Web technology also changes over time.
> There was not enough manpower to keep the document up to date.
>
> In addition, the document was always a kind of tug-of-war between those
> who pushed for more favorable descriptions of specific Unicode
> characters (such as ⁴ in this discussion) or more favorable descriptions
> of markup-based and style-based solutions (such as <sup></sup>).


Thank you for the description. These opposite views surely reflected
different needs, such as the need to represent data in plain text in some
contexts and the need for more structured representation.

> Well, and then somebody else uses 10<sup>3.5</sup> somewhere. How are you
> going to express this so that it doesn't turn into 103.5 in plain text?
> The problem is that there is always a limit somewhere for plain text.


Well, in the given case, it might help if we had a character such as
INVISIBLE EXPONENTIATION (we don’t; we have U+2062 INVISIBLE TIMES, but it
does not help here); at least it would appear in text data to indicate that
adjacent digits are not part of the same number.

>
> There is also always a limit somewhere for markup and styled rendering,
> but it's in a quite different place.
>

Regarding exponents, the limit is currently set by the presence of
superscript characters for digits, plus, and minus, and (for some reason),
=, (, ), and n. This covers most of the cases where one might consider
using superscripts in general texts and in expressing values of quantities.
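
To make the repertoire concrete, here is a minimal Python sketch of the mapping described above. The names SUPERSCRIPTS and to_superscript are made up for illustration, and treating ASCII hyphen-minus as the source for SUPERSCRIPT MINUS is an assumption of the sketch, not something Unicode prescribes.

```python
# Superscript characters named in the text: digits, plus, minus, =, (, ), n.
# Assumption: ASCII "-" stands in for the minus sign in the input.
SUPERSCRIPTS = {
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
    "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
    "8": "\u2078", "9": "\u2079",
    "+": "\u207a", "-": "\u207b", "=": "\u207c",
    "(": "\u207d", ")": "\u207e", "n": "\u207f",
}


def to_superscript(text):
    """Return text in superscript characters, or None if any character lacks one."""
    if not all(ch in SUPERSCRIPTS for ch in text):
        return None
    return "".join(SUPERSCRIPTS[ch] for ch in text)


if __name__ == "__main__":
    print(to_superscript("-3"))   # an exponent that can be expressed
    print(to_superscript("3.5"))  # None: no superscript full stop
    print(to_superscript("x"))    # None: x is not in this repertoire
```

An exponent like 3.5 falls outside the repertoire because of the decimal point, which is the limit Martin pointed at.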

But when you have, say, text that contains the simple expression *ax* with
*x* as a superscript denoting an exponent, there is no satisfactory way to
represent it in plain text. Using just ax would mean using a wrong
expression, and using aˣ (with U+02E3 MODIFIER LETTER SMALL X) would be too
tricky. Unicode hasn’t got a full repertoire of superscript Latin letters,
even though they are often used as semantically different from normal
letters; it has only some such letters, apparently meant for special uses
only (like phonetic symbols).

>
> Out of the box rendering of <sup> and <sub> may be rather crude, but I
> guess it should be possible to do a lot better with some dose of CSS and
> possibly some Web fonts.
>

In a sense, it would be straightforward to map, say, <sup>2</sup> to
SUPERSCRIPT TWO in the rendering phase, either directly at the character
level or via glyph selection when an OpenType font is used. In another
sense, it would be complicated, since we hardly want to have <sup>2</sup>
rendered substantially differently from <sup>x</sup> in style. So the
mapping should take place only when the document contains only such <sup>
elements in which all characters have superscript counterparts in Unicode
(or at the glyph level).
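
The all-or-nothing rule could be sketched roughly as follows, at the character level only. This is an illustration under simplifying assumptions: the function name map_sup_elements is hypothetical, the regular expression handles only bare, non-nested <sup> elements without attributes, and a real renderer would work on a parsed document rather than on a string.

```python
import re

# Characters that have superscript counterparts, and those counterparts,
# in the same order (digits, plus, minus, =, (, ), n).
PLAIN = "0123456789+-=()n"
SUPER = ("\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
         "\u207a\u207b\u207c\u207d\u207e\u207f")
SUP_TABLE = str.maketrans(PLAIN, SUPER)

# Simplification: matches only <sup>...</sup> with no attributes or nesting.
SUP_ELEMENT = re.compile(r"<sup>(.*?)</sup>", re.DOTALL)


def map_sup_elements(html):
    """Replace every <sup> element by superscript characters, but only if
    all characters in all <sup> elements of the document can be mapped."""
    contents = SUP_ELEMENT.findall(html)
    if not all(ch in PLAIN for text in contents for ch in text):
        return html  # leave the markup alone to keep the style uniform
    return SUP_ELEMENT.sub(lambda m: m.group(1).translate(SUP_TABLE), html)


if __name__ == "__main__":
    print(map_sup_elements("x<sup>2</sup> + y<sup>-1</sup>"))
    print(map_sup_elements("a<sup>2</sup> and a<sup>x</sup>"))
```

The second call returns its input unchanged, because a single <sup>x</sup> is enough to block the mapping for the whole document.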

Jukka