A last missing link for interoperable representation

James Kass via Unicode unicode at unicode.org
Sat Jan 12 06:29:39 CST 2019


Julian Bradford wrote,

"It does not work with much existing technology. Interspersing extra
codepoints into what is otherwise plain text breaks all the existing
software that has not been, and never will be updated to deal with
arbitrarily complex algorithms required to do Unicode searching.
Somebody who needs to search exotic East Asian text will know that they
need software that understands VS, but a plain ordinary language user
is unlikely to have any idea that VS exist, or that their searches
will mysteriously fail if they use this snazzy new pseudo-plain-text
italicization technique"

Sounds like you didn't try it.  VS characters are default ignorable.

The first word below is plain; the second has a VS2 character (U+FE01) 
after each letter, including the final "t":
apricot
a︁p︁r︁i︁c︁o︁t︁
Notepad finds them both if you type the word "apricot" into the search box.
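
For anyone who wants to reproduce the test outside Notepad, here is a 
rough sketch in Python. Python's built-in string search does not skip 
variation selectors on its own, so the sketch strips them from both 
strings before comparing. The names strip_vs and vs_insensitive_find are 
made up for illustration, and this makes no claim about how Notepad 
implements its search.

```python
import re

# Variation selectors: VS1-VS16 (U+FE00-U+FE0F) and
# VS17-VS256 (U+E0100-U+E01EF). All are default ignorable.
VS_PATTERN = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")


def strip_vs(text: str) -> str:
    """Return text with every variation selector removed (hypothetical helper)."""
    return VS_PATTERN.sub("", text)


def vs_insensitive_find(haystack: str, needle: str) -> bool:
    """Report whether needle occurs in haystack once VS are ignored."""
    return strip_vs(needle) in strip_vs(haystack)


plain = "apricot"
# Build the second test word: VS2 (U+FE01) after every letter.
marked = "".join(ch + "\uFE01" for ch in plain)

naive_hit = plain in marked                       # code-point-for-code-point search
ignoring_hit = vs_insensitive_find(marked, plain)  # search that ignores VS
print(naive_hit, ignoring_hit)
```

A search that compares code point for code point misses the second 
word; one that treats VS as ignorable finds both.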

"..."

Regardless of how you input italics in rich-text, you are putting italic 
forms into the display.

"I think the VS or combining format character approach *would* have
been a better way to deal with the mess of mathematical alphabets, ..."

I think so, too, but since I'm not a member of *that* user community, my 
opinion hasn't much value.  Plus VS characters were encoded after the 
math stuff.

"But for plain text, it's crazy."

Are you a member of the plain-text user community?


