James Kass via Unicode
unicode at unicode.org
Fri Feb 8 19:42:44 CST 2019
Rather than having the user insert the VS14 after every character, the
editor might allow the user to select a span of text for italicization.
Then it would be up to the editor/app to insert the VS14s where appropriate.
For Andrew’s example of “fête”, the user would either type the string:
“f” + “ê” + “t” + “e”
or the string:
“f” + “e” + <U+0302 COMBINING CIRCUMFLEX ACCENT> + “t” + “e”.
If the latter, the application would insert VS14 characters after the
“f”, “e”, “t”, and “e”. The application would not insert a VS14 after
the combining circumflex, because the specification does not allow VS
characters after combining marks; they may only be used on base characters.
For the first ‘spelling’, the specifications forbid VS characters
after any character which is not a base character, and that includes
any character which has a canonical decomposition, such as the
precomposed “ê”. The application would therefore first need to convert
the string to the second ‘spelling’ and then proceed as above. This
conversion is known as normalizing to NFD.
So in order for VS14 to be a viable approach, any application would ①
need to convert any selected span to NFD, and ② only insert VS14 after
each base character. Both operations are quite feasible, although they
do add slightly to the programmer’s burden. I don’t think it’s a
“deal-killer”.
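
The two steps can be sketched in Python using only the standard
unicodedata module. This is a rough illustration and not a reference
implementation: the function name italicize_span is made up for the
example, and "base character" is approximated here as any letter or
digit (general category L* or N*), which is my assumption rather than
anything the specification says.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14


def italicize_span(text: str) -> str:
    """Hypothetical helper: normalize a selected span to NFD, then
    append VS14 after each base character."""
    out = []
    # Step 1: convert the span to NFD so "ê" becomes "e" + U+0302.
    for ch in unicodedata.normalize("NFD", text):
        out.append(ch)
        # Step 2: add VS14 only after base characters.
        # Assumption: a base character is a letter or a digit;
        # combining marks (category M*) never get a VS14.
        if unicodedata.category(ch)[0] in ("L", "N"):
            out.append(VS14)
    return "".join(out)


precomposed = "f\u00EAte"   # first 'spelling'
decomposed = "fe\u0302te"   # second 'spelling'
same = italicize_span(precomposed) == italicize_span(decomposed)
```

Because of the NFD step, both ‘spellings’ of “fête” come out as the
same sequence, with a VS14 after each of the four base letters and none
after the combining circumflex.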
Of course, the user might insert VS14s without application assistance,
in which case the user hopefully knows the rules. The worst-case
scenario is where the user inserts a VS14 after a non-base
character, in which case it should simply be ignored by any
application. It should never “break” the display or the processing; it
simply makes the text for that document non-conformant. (Of course
putting a VS14 after “ê” should not result in an italicized “ê”.)
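
One way an application might "simply ignore" a misplaced VS14 is to
skip it while reading the text. The sketch below is only illustrative:
the helper names are hypothetical, and the test for a valid base
(a letter or digit with no canonical decomposition) is my own
approximation of the rule described above.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14


def is_valid_vs14_base(ch: str) -> bool:
    """Assumed rule: a letter or digit with no canonical decomposition."""
    if unicodedata.category(ch)[0] not in ("L", "N"):
        return False
    decomp = unicodedata.decomposition(ch)
    # An empty string means no decomposition; a leading "<" marks a
    # compatibility (not canonical) decomposition.
    return decomp == "" or decomp.startswith("<")


def drop_misplaced_vs14(text: str) -> str:
    """Hypothetical helper: keep a VS14 only when it directly follows
    a valid base character, and silently skip it otherwise."""
    out = []
    prev = ""
    for ch in text:
        if ch == VS14 and not (prev and is_valid_vs14_base(prev)):
            continue  # ignore it; never "break" display or processing
        out.append(ch)
        prev = ch
    return "".join(out)
```

With this sketch, a VS14 after the precomposed “ê” or after a combining
circumflex is dropped, while one after a plain “f” is kept.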