How to disable Indic syllable form editing in MS Word

Richard Wordingham via Indic indic at unicode.org
Thu Dec 7 14:38:17 CST 2017


On Thu, 7 Dec 2017 14:35:24 +0530
Shriramana Sharma via Indic <indic at unicode.org> wrote:

> On 07-Dec-2017 4:51 AM, "Richard Wordingham via Indic"
> <indic at unicode.org> wrote:
> 
> On Wed, 6 Dec 2017 18:47:36 +0530
> >
> > When one tries to find and replace a particular Unicode character,
> > for example
> > to replace all
> > 'ा' dependent vowel AA
> > with
> > 'ि' dependent vowel I
> >
> > it does not work.
> >
> > Only a full syllable with ' ा', i.e. का, खा, गा, etc., can be searched
> > for and replaced, one by one, with many repeats.
> >
> > This takes too much time and needless repetition.  
> 
> 
> @Richard he has a valid point, so no need to ask him whether he is
> really an Indian! Where else would you want to search-replace a vowel
> sign except as part of a grapheme cluster? Surely well formed normal
> text won't have one of those standing alone?

He's what I've been looking for - evidence that Indic grapheme clusters
are not what Indian users think of as a character.  (Admittedly, Tamil
shows many signs of being an incipient syllabary.)
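
For comparison, a replace that works on code points rather than on
grapheme clusters has no trouble with a lone vowel sign. A minimal
Python sketch follows; the function name and sample text are only for
illustration, and it says nothing about how Word implements its search:

```python
# Code-point-level replace: the vowel sign is matched wherever it occurs,
# regardless of which consonant it is attached to.
AA = "\u093e"  # DEVANAGARI VOWEL SIGN AA
I = "\u093f"   # DEVANAGARI VOWEL SIGN I


def replace_vowel_sign(text, old=AA, new=I):
    """Replace every occurrence of one vowel sign with another."""
    return text.replace(old, new)


sample = "\u0915\u093e \u0916\u093e \u0917\u093e"  # KA+AA, KHA+AA, GA+AA
result = replace_vowel_sign(sample)
print(result)
```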

> > When one tries to delete an Indic character with the Delete key,
> > with the cursor placed before a syllable, the entire syllable to the
> > right is deleted.
> >
> > How to delete a particular character instead of entire syllable?  
> 
> 
> Press right arrow and use back space. You can't do it from the left.
> 
> > How to disable the Indic layout feature in MS Word?  
> 
> 
> There is no option for this to my knowledge nor is there likely to be
> (though not impossible).
> 
> >
> > Would anybody guide please?  
> 
> Are you a real Indian?  UAX #29
> (https://www.unicode.org/reports/tr29/tr29-31.html) Section 3
> Paragraph 1 strongly suggests that what you are trying to do is not
> natural.
> 
> 
> Ok let's quote:
> 
> "Grapheme clusters commonly behave as units in terms of mouse
> selection, arrow key movement, backspacing, and so on. For example,
> when a grapheme cluster is represented internally by a character
> sequence consisting of base character + accents, then using the right
> arrow key would skip from the start of the base character to the end
> of the last accent.
> 
> However, in some cases editing a grapheme cluster element by element
> may be preferable. For example, on a given system the backspace key
> might delete by code point, while the delete key may delete an entire
> cluster."
> 
> This doesn't say anything about search and replace.

Paragraph 2 comes close:

"Grapheme cluster boundaries are important for *collation*, regular
expressions, UI interactions (such as mouse selection, arrow key
movement, backspacing), segmentation for vertical text, identification
of boundaries for first-letter styling, and counting “character”
positions within text. Word boundaries, line boundaries, and sentence
boundaries should not occur within a grapheme cluster: in other words,
a grapheme cluster should be an atomic unit with respect to the process
of determining these other boundaries."

If one accepts that collation is important for search, then that is
where search and replace comes in.  I don't like the principle - I end
up having to use the 'regular expression' option when searching for
short Tai Tham strings in LibreOffice.
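
To illustrate how cluster-based matching can hide a vowel sign from a
search, here is a rough Python sketch. It assumes a much simplified
clustering rule (every combining mark, general category Mn, Mc or Me,
joins the preceding character), which is not the full Unicode text
segmentation algorithm, and the helper names are hypothetical. Whether
any particular word processor matches this way is also an assumption.

```python
import unicodedata


def clusters(text):
    """Split text into simplified clusters: base character + following marks."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            out[-1] += ch          # a mark extends the current cluster
        else:
            out.append(ch)         # anything else starts a new cluster
    return out


def find_cluster_aligned(text, needle):
    """Return cluster indexes where needle matches a run of whole clusters."""
    hay = clusters(text)
    pat = clusters(needle)
    return [i for i in range(len(hay) - len(pat) + 1)
            if hay[i:i + len(pat)] == pat]


text = "\u0915\u093e\u0917"        # KA + vowel sign AA, then GA
lone_sign = "\u093e"               # vowel sign AA on its own

by_code_point = lone_sign in text                     # plain substring test
by_cluster = find_cluster_aligned(text, lone_sign)    # whole clusters only
whole_syllable = find_cluster_aligned(text, "\u0915\u093e")
```

Here the plain substring test finds the vowel sign, the cluster-aligned
search does not, and only the whole syllable matches as a cluster.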

> There is unlikely
> to be a universally acceptable or even applicable solution for
> intra-grapheme cursor placement. For example, how would you indicate a
> cursor position in front of or after a virama which has caused two
> consonants to ligate?

When characters have fused to a single glyph, the usual technique, used
even by Windows for non-Indic text, is to choose a boundary position
within the glyph.  OpenType has a table of positions, but apparently
Windows just places the divisions evenly.  Obviously the simple scheme
doesn't work well when a preposed vowel ligates with the base consonant.
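
The even division can be sketched as follows. The function name and the
units (font design units here) are assumptions for illustration. Where a
font supplies explicit caret positions (in OpenType, the ligature caret
list in the GDEF table), those would replace the computed values.

```python
def even_caret_positions(glyph_start, glyph_advance, n_chars):
    """Return n_chars + 1 caret x-positions spread evenly across one glyph.

    glyph_start   -- x-position of the glyph's left edge
    glyph_advance -- advance width of the fused glyph
    n_chars       -- number of characters fused into the glyph
    """
    if n_chars < 1:
        raise ValueError("a glyph must represent at least one character")
    step = glyph_advance / n_chars
    return [glyph_start + step * i for i in range(n_chars + 1)]


# A ligature of three characters that is 900 units wide.
carets = even_caret_positions(0, 900, 3)
print(carets)
```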

Richard.


