How to disable Indic syllable form editing in MS Word

maxwell via Indic indic at unicode.org
Thu Dec 7 10:33:57 CST 2017


On 2017-12-06 18:19, Richard Wordingham via Indic wrote:
> Another technique, which has been available in emacs (I'm unsure of the
> current status), enables one to move the cursor into a cluster Unicode
> character by character, and disables shaping across the cluster.  Even
> this will have shortcomings when working with two part vowels
> canonically equivalent to a single character - one won't know whether
> one has one character or two until one steps into the cluster.

This brings up a related question that I've always wondered about.  In 
Bangla, there are two code points, U+09CB and U+09CC, which represent 
two-part vowels; one part appears to the left of the preceding 
consonant, and one to its right.  There are also three code points that 
individually represent the parts to the left and right: U+09C7 is the 
left-hand part of both U+09CB and U+09CC, U+09BE is the right-hand part 
of U+09CB, and U+09D7 is the right-hand part of U+09CC.  The 
relationship of U+09CB and U+09CC to the stand-alone characters is 
documented in the Unicode standard for the Bengali block.

Why are these not treated in the Unicode standard as analogous to 
base+diacritic pairs with respect to NFC and NFD?  E.g. when you convert 
text to NFC, why isn't a sequence of U+09C7 + consonant + U+09BE 
converted to consonant + U+09CB, and vice versa when converting to NFD?  
Instead, when we do things that require normalization (like searching 
for a word in text), we have to insert our own manual normalization step 
to handle this problem.
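
A minimal sketch of how one might check this, using Python's standard
unicodedata module.  The sample consonant KA (U+0995) and the helper
names nfc and nfd are arbitrary choices for illustration, not anything
from the standard.  As far as I can tell, the two-part vowels do compose
and decompose when the code points are stored in logical order
(consonant first); it is the left + consonant + right order described
above that the normalizer leaves untouched, which may be where the
manual step comes from.

```python
import unicodedata

KA = "\u0995"  # BENGALI LETTER KA, an arbitrary sample consonant

# Logical (encoding) order: consonant first, then the vowel sign(s).
composed_o = KA + "\u09CB"       # KA + VOWEL SIGN O
split_o = KA + "\u09C7\u09BE"    # KA + VOWEL SIGN E + VOWEL SIGN AA
composed_au = KA + "\u09CC"      # KA + VOWEL SIGN AU
split_au = KA + "\u09C7\u09D7"   # KA + VOWEL SIGN E + AU LENGTH MARK

# Visual order as described above: left part, consonant, right part.
visual_o = "\u09C7" + KA + "\u09BE"


def nfc(text):
    """Return text in Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Return text in Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", text)


# Each comparison prints a boolean, so the behaviour can be inspected
# directly on whatever Unicode version the interpreter ships with.
print("split O composes under NFC:   ", nfc(split_o) == composed_o)
print("O decomposes under NFD:       ", nfd(composed_o) == split_o)
print("split AU composes under NFC:  ", nfc(split_au) == composed_au)
print("visual order changed by NFC:  ", nfc(visual_o) != visual_o)
```

If the text being searched really does store the left-hand part before
the consonant, a reordering pass would have to run before normalization;
that pass is not shown here because it depends on how the data was
produced.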

    Mike Maxwell
    University of Maryland


