How to disable Indic syllable form editing in MS Word
maxwell via Indic
indic at unicode.org
Thu Dec 7 10:33:57 CST 2017
On 2017-12-06 18:19, Richard Wordingham via Indic wrote:
> Another technique, which has been available in emacs (I'm unsure of the
> current status), enables one to move the cursor into a cluster Unicode
> character by character, and disables shaping across the cluster. Even
> this will have shortcomings when working with two part vowels
> canonically equivalent to a single character - one won't know whether
> one has one character or two until one steps into the cluster.
This brings up a related question that I've always wondered about. In
Bangla, there are two code points, U+09CB and U+09CC, which represent
two-part vowels; one part appears to the left of the preceding
consonant, and one to its right. There are also three code points that
individually represent the parts to the left and right: U+09C7 is the
left-hand part of both U+09CB and U+09CC, U+09BE is the right-hand part
of U+09CB, and U+09D7 is the right-hand part of U+09CC. The
relationship of U+09CB and U+09CC to the stand-alone characters is
documented in the Unicode standard for the Bengali block.
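For anyone who wants to check that relationship, here is a minimal
sketch that reads the decomposition mappings from the Unicode Character
Database using Python's standard unicodedata module. The helper name
parts_of is only for illustration, and the result depends on the Unicode
data bundled with the Python build in use.

```python
import unicodedata

def parts_of(ch: str) -> list[str]:
    """Return the decomposition of ch as a list of 'U+XXXX' strings."""
    mapping = unicodedata.decomposition(ch)  # space-separated hex, or ""
    return [f"U+{cp}" for cp in mapping.split()]

# The two two-part Bengali vowel signs discussed above.
for composite in ("\u09CB", "\u09CC"):
    print(f"U+{ord(composite):04X}", unicodedata.name(composite), parts_of(composite))
```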
Why are these not treated in the Unicode standard as analogous to
base+diacritic pairs with respect to NFC and NFD? E.g. when you convert
text to NFC, why isn't a sequence of U+09C7 + consonant + U+09BE
converted to consonant + U+09CB, and vice versa when converting to NFD?
Instead, when we do things that require normalization (like searching
for a word in text), we have to insert our own manual normalization step
to handle this problem.
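As a rough illustration of such a manual step, here is a sketch in
Python. It is not our actual code: the function name fix_visual_order
and the consonant class are assumptions made for the example, and the
consonant class is deliberately simplified. The sketch assumes the
left-hand part U+09C7 was typed before the consonant. A U+09C7 that
directly follows a consonant is left alone, since in that position it is
an ordinary vowel sign on the preceding consonant.

```python
import re
import unicodedata

# Assumption: simplified consonant class (KA..HA plus RRA, RHA, YYA).
CONSONANT = "\u0995-\u09B9\u09DC\u09DD\u09DF"

# U+09C7, then a consonant, then a right-hand part (U+09BE or U+09D7).
# The lookbehind skips a U+09C7 that already follows a consonant.
VISUAL_ORDER = re.compile(
    f"(?<![{CONSONANT}])\u09C7([{CONSONANT}])([\u09BE\u09D7])"
)

def fix_visual_order(text: str) -> str:
    """Move a leading U+09C7 after its consonant, then apply NFC."""
    reordered = VISUAL_ORDER.sub(
        lambda m: m.group(1) + "\u09C7" + m.group(2), text
    )
    # With both parts adjacent after the consonant, NFC can compose them.
    return unicodedata.normalize("NFC", reordered)

visual = "\u09C7\u0995\u09BE"  # U+09C7 + KA + U+09BE, as in the question
print(unicodedata.normalize("NFC", visual) == visual)  # NFC alone leaves it as-is
print(fix_visual_order(visual) == "\u0995\u09CB")      # manual step composes it
```

Note that the pattern is a heuristic: it can only be trusted on text
that is known to store the left-hand part before the consonant.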
Mike Maxwell
University of Maryland