A last missing link for interoperable representation

Sat Jan 12 04:57:26 CST 2019

On 2019-01-11, James Kass via Unicode <unicode at unicode.org> wrote:
> Exactly.  William Overington has already posted a proof-of-concept here:
> https://forum.high-logic.com/viewtopic.php?f=10&t=7831
> ... using a P.U.A. character /in lieu/ of a combining formatting or VS 
> character.  The concept is straightforward and works properly with 
> existing technology.

It does not work with much existing technology. Interspersing extra
codepoints into what is otherwise plain text breaks all the existing
software that has not been, and never will be, updated to deal with
the arbitrarily complex algorithms required to do Unicode searching.
Somebody who needs to search exotic East Asian text will know that they
need software that understands VS, but a plain ordinary language user
is unlikely to have any idea that VS exist, or that their searches
will mysteriously fail if they use this snazzy new pseudo-plain-text
italicization technique.
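
To make the failure concrete, here is a minimal Python sketch. It
assumes, purely for illustration, that the "italic" marker is U+FE00
(VARIATION SELECTOR-1) placed after each letter; the proposal under
discussion uses a P.U.A. character instead, and the helper names
below are hypothetical. Any interspersed codepoint behaves the same
way for a naive substring search.

```python
# Assumption for illustration only: U+FE00 (VARIATION SELECTOR-1) stands in
# for the proposed per-letter "italic" marker. A PUA codepoint would behave
# identically as far as naive searching is concerned.
MARKER = "\uFE00"


def pseudo_italicize(word: str) -> str:
    """Hypothetical helper: append the marker after every character."""
    return "".join(ch + MARKER for ch in word)


def strip_markers(text: str) -> str:
    """What marker-aware software would have to do before searching."""
    return text.replace(MARKER, "")


text = "Please " + pseudo_italicize("italicize") + " this word."

# An ordinary substring search, as done by software that knows nothing
# about the marker, no longer finds the word.
naive_hit = "italicize" in text

# Only software updated to ignore the marker finds it again.
aware_hit = "italicize" in strip_markers(text)

print(naive_hit, aware_hit)
```

The naive search fails even though the text looks, to the user, like
it contains the word; only the search that strips the marker first
succeeds.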

It's also fundamentally misguided. When I _italicize_ a word, I am
writing a word composed of (plain old) letters, and then styling the
word; I am not composing a new and different word ("_italicize_") that
is distinct from the old word ("italicize") by virtue of being made up
of different letters.

I think the VS or combining format character approach *would* have
been a better way to deal with the mess of mathematical alphabets,
because for mathematicians, *b* is a distinct symbol from b, and while
there may be correlated use of alphabets, there need be no connection
whatever between something notated b and something notated *b*.
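
For reference, a small Python sketch of how the existing mathematical
alphabets behave, using only the standard unicodedata module: the
italic b is a separately encoded character, and only compatibility
normalization (NFKC) maps it back to the plain letter.

```python
import unicodedata

plain_b = "b"                  # U+0062 LATIN SMALL LETTER B
math_italic_b = "\U0001D44F"   # U+1D44F MATHEMATICAL ITALIC SMALL B

# The two are distinct characters as far as any comparison is concerned.
distinct = plain_b != math_italic_b

# The character has its own name in the Unicode Character Database.
name = unicodedata.name(math_italic_b)

# Compatibility normalization folds the styled letter back to the plain one.
folded = unicodedata.normalize("NFKC", math_italic_b)

print(distinct, name, folded)
```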

But for plain text, it's crazy.

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.