"A Programmer's Introduction to Unicode"

Mark E. Shoulson mark at kli.org
Mon Mar 13 19:20:25 CDT 2017

When a word ending in A *or* AA precedes a word beginning in A *or* AA, 
the two vowels coalesce to a single AA in Sanskrit.  That's four 
possibilities, and that doesn't count a word ending in a consonant 
preceding a word beginning in AA, which would be written the same.  My 
memory is rusty, so I should actually be looking things up, but I think 
these are valid:

न + अगच्छत्  →  नागच्छत्
न + आगच्छत्  →  नागच्छत्

(and indeed, आगच्छत् is the upasarga आ plus अगच्छत्, so there too the AA 
+ A coalesced.)  I should probably find you examples for all the other 
possibilities.  Sanskrit external vowel sandhi is straightforward 
compared to consonant sandhi, and it frequently loses information.  A 
*or* AA plus I is E; A *or* AA plus U is O (you need A + O to get AU).
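
For the programmers on the list, here is a rough Python sketch of just 
the cases mentioned above, to show how the four A/AA combinations 
collapse to one spelling containing U+093E.  It is a toy, not a sandhi 
engine: the names join and MERGED_SIGN are made up for illustration, it 
assumes the first word ends in a bare consonant letter (inherent A) or 
in the AA sign, and the string न + ा is used only as an input ending in 
AA, not as a vetted Sanskrit example.

```python
# Toy sketch of the external vowel sandhi cases named above.
# All names here are illustrative; this is not a full sandhi engine.
AA_SIGN = "\u093E"  # DEVANAGARI VOWEL SIGN AA
E_SIGN = "\u0947"   # DEVANAGARI VOWEL SIGN E
O_SIGN = "\u094B"   # DEVANAGARI VOWEL SIGN O

# Initial independent vowel of the second word -> vowel sign that results
# when it follows a word ending in A or AA.
MERGED_SIGN = {
    "\u0905": AA_SIGN,  # A or AA + A  -> AA
    "\u0906": AA_SIGN,  # A or AA + AA -> AA
    "\u0907": E_SIGN,   # A or AA + I  -> E
    "\u0909": O_SIGN,   # A or AA + U  -> O
}


def join(first: str, second: str) -> str:
    """Join two words, assuming `first` ends in inherent A (a bare
    consonant letter) or in the AA sign, and `second` begins with one of
    the independent vowels listed in MERGED_SIGN."""
    # Drop a final AA sign; a final inherent A has no code point to drop.
    stem = first[:-1] if first.endswith(AA_SIGN) else first
    return stem + MERGED_SIGN[second[0]] + second[1:]


NA = "\u0928"                                            # न
AGACCHAT = "\u0905\u0917\u091A\u094D\u091B\u0924\u094D"  # अगच्छत्
AAGACCHAT = "\u0906" + AGACCHAT[1:]                      # आगच्छत्

# All four A/AA combinations produce the same written form.
results = {
    join(first, second)
    for first in (NA, NA + AA_SIGN)
    for second in (AGACCHAT, AAGACCHAT)
}
print(results)
```

The set ends up with a single member, which is the information loss: 
given only the joined spelling, you cannot tell which of the four 
inputs produced it, so a word boundary "inside" the AA sign cannot be 
recovered from code points alone.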


On 03/13/2017 06:26 PM, Manish Goregaokar wrote:
> Do you have examples of AA being split that way (and further reading)?
> I think I'm aware of what you're talking about, but would love to read
> more about it.
> -Manish
> On Mon, Mar 13, 2017 at 2:47 PM, Richard Wordingham
> <richard.wordingham at ntlworld.com> wrote:
>> On Mon, 13 Mar 2017 23:10:11 +0200
>> Khaled Hosny <khaledhosny at eglug.org> wrote:
>>> But there are many text operations that require access to Unicode code
>>> points. Take for example text layout, as mapping characters to glyphs
>>> and back has to operate on code points. The idea that you never need
>>> to work with code points is too simplistic.
>> There are advantages to interpreting and operating on text as though it
>> were in form NFD.  However, there are still cases where one needs
>> fractions of a character, such as word boundaries in Sanskrit, though I
>> think the locations are liable to be specified in a language-specific
>> form.  U+093E DEVANAGARI VOWEL SIGN AA can have a word boundary in it
>> in at least 4 ways.
>> Richard.
