Encoding italic

James Kass via Unicode unicode at unicode.org
Fri Feb 8 19:42:44 CST 2019


William,

Rather than having the user insert the VS14 after every character, the 
editor might allow the user to select a span of text for italicization.  
Then it would be up to the editor/app to insert the VS14s where appropriate.

For Andrew’s example of “fête”, the user would either type the string:
“f” + “ê” + “t” + “e”
or the string:
“f” + “e” + <U+0302 COMBINING CIRCUMFLEX ACCENT> + “t” + “e”.

If the latter, the application would insert VS14 characters after the 
“f”, “e”, “t”, and “e”.  The application would not insert a VS14 after 
the combining circumflex, because the specification does not allow VS 
characters after combining marks; they may only be used on base characters.

In the first ‘spelling’, the specifications also forbid VS characters 
after any character which has a decomposition, such as “ê”; such a 
character does not count as a base character for this purpose.  So the 
application would first need to convert the string to the second 
‘spelling’, and proceed as above.  This is known as converting to NFD.

So in order for VS14 to be a viable approach, any application would ① 
need to convert any selected span to NFD, and ② only insert VS14 after 
each base character.  And those are two operations which are quite 
possible, although they do add slightly to the programmer’s burden.  I 
don’t think it’s a “deal-killer”.
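
As a rough illustration of those two operations, here is a minimal Python 
sketch.  It assumes VS14 is U+FE0E's neighbour U+FE0D (VARIATION SELECTOR-14) 
and that italic-by-VS14 is only the scheme under discussion in this thread, 
not a standardized variation sequence.  The names italicize and is_base are 
hypothetical, and the is_base test is only this sketch's reading of "base 
character": not a mark, control/format character, or separator, and having 
no decomposition.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14; its use for italics is only the proposal discussed here


def is_base(ch: str) -> bool:
    """Sketch-level test for a character that may carry a variation selector.

    Assumption: excludes marks (M*), controls/format characters (C*) and
    separators (Z*), and anything that has a decomposition mapping.
    """
    return (
        unicodedata.category(ch)[0] not in "MCZ"
        and unicodedata.combining(ch) == 0
        and unicodedata.decomposition(ch) == ""
    )


def italicize(span: str) -> str:
    """Step 1: convert the selected span to NFD.
    Step 2: insert VS14 after each base character only."""
    out = []
    for ch in unicodedata.normalize("NFD", span):
        if ch == VS14:
            continue  # drop any VS14 already present so the result is not doubled
        out.append(ch)
        if is_base(ch):
            out.append(VS14)
    return "".join(out)


# Both 'spellings' of "fête" end up as the same sequence:
# f VS14, e VS14, U+0302, t VS14, e VS14
print([f"U+{ord(c):04X}" for c in italicize("f\u00EAte")])
```

Because the span is normalized first, the editor does not need to care 
which ‘spelling’ the user typed.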

Of course, the user might insert VS14s without application assistance.  
In which case hopefully the user knows the rules.  The worst case 
scenario is where the user might insert a VS14 after a non-base 
character, in which case it should simply be ignored by any 
application.  It should never “break” the display or the processing; it 
simply makes the text for that document non-conformant.  (Of course 
putting a VS14 after “ê” should not result in an italicized “ê”.)
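
One way an application might "simply ignore" a misplaced VS14 is sketched 
below in Python.  This is only an illustration under the same assumptions 
as the thread (VS14 = U+FE0D, italics-by-VS14 being a proposal); the names 
drop_stray_vs14 and is_base are hypothetical, and an application could just 
as well leave the stray character in the text and skip it at display time.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14


def is_base(ch: str) -> bool:
    """Sketch-level test (an assumption, not specification wording): not a
    mark, control/format character, or separator, and no decomposition."""
    return (
        unicodedata.category(ch)[0] not in "MCZ"
        and unicodedata.combining(ch) == 0
        and unicodedata.decomposition(ch) == ""
    )


def drop_stray_vs14(text: str) -> str:
    """Return text with every VS14 removed that does not directly follow a
    base character; well-placed VS14s are left untouched."""
    out = []
    prev = ""
    for ch in text:
        if ch == VS14 and not (prev and is_base(prev)):
            continue  # misplaced selector: ignore it, do not fail
        out.append(ch)
        prev = ch
    return "".join(out)


# A VS14 after precomposed "ê" is ignored, so no italic "ê" results.
print(drop_stray_vs14("\u00EA\uFE0D") == "\u00EA")
```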

Cheers,

James


