David Starner via Unicode
unicode at unicode.org
Tue Jan 22 20:24:59 CST 2019
On Tue, Jan 22, 2019 at 4:18 PM Richard Wordingham via Unicode
<unicode at unicode.org> wrote:
> On Mon, 21 Jan 2019 00:29:42 -0800
> David Starner via Unicode <unicode at unicode.org> wrote:
> > The superscripts show a problem with multiple encoding; even if you
> > think they should be Unicode superscripts, and they look like Unicode
> > superscripts, they might be HTML superscripts. Same thing would happen
> > with italics if they were encoded in Unicode.
> But if one strips the mark-up out, and searching is then based on
> the collation elements of the text, then this is not a problem.
> Mathematical and ASCII capitals differ only at the identity level.
Searching is not the only problem. Copying the data will reveal the
Not only that, there was a previous argument that searching with
Unicode italics would let you find titles of books and such separately
from other usage of the phrase. That's not going to work if they're
based on the collation elements and ignore the italics. Which also
brings up the question of, if this is so important, why can't we
search for italicized data in web pages right now? For anyone
using a web browser that folds searching, this will change
nothing unless and until the browser makes italics-sensitive
searching available, which is not depending on Unicode
There are programs that extract titles from text files; I suspect their
programmers are happiest working with text formats that mark up
titles as titles, not italics. In systems that just mark up italics,
translating whatever form of italics marking is used is much easier
than separating italicized titles from other forms of italics.
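
To make that concrete, here is a rough sketch, not anyone's actual
tool. It assumes HTML input in which titles are marked with <cite> in
one sample and with plain <i> in the other; the class name, the helper
collect(), and both sample strings are made up for illustration.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside each occurrence of the chosen tags.

    Hypothetical helper for illustration; nested tags of interest are
    ignored to keep the sketch short.
    """

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.current = None   # tag we are currently inside, if any
        self.buffer = []      # text pieces seen inside that tag
        self.found = []       # completed text runs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.tags and self.current is None:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            self.found.append("".join(self.buffer))
            self.current = None


def collect(html, tags):
    """Return the text of every element in `html` whose tag is in `tags`."""
    parser = TagTextCollector(tags)
    parser.feed(html)
    parser.close()
    return parser.found


# Made-up samples: the same sentence with semantic and with italics-only markup.
SEMANTIC = "I read <cite>Emma</cite> and found it <em>very</em> funny."
ITALIC_ONLY = "I read <i>Emma</i> and found it <i>very</i> funny."

titles = collect(SEMANTIC, ["cite"])      # only the title comes back
italics = collect(ITALIC_ONLY, ["i"])     # the title and the emphasis come back
print(titles)
print(italics)
```

With the semantic markup the title falls out directly. With the
italics-only markup the emphasized word comes back alongside it, and
telling the two apart needs something beyond the markup itself.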
Kie ekzistas vivo, ekzistas espero.