Go romanize! Re: Counting Devanagari Aksharas

Richard Wordingham via Unicode unicode at unicode.org
Mon Apr 24 15:37:04 CDT 2017


On Mon, 24 Apr 2017 20:53:12 +0530
Naena Guru via Unicode <unicode at unicode.org> wrote:

> Quote by Richard:
> Unless this implies a spelling reform for many languages, I'd like to
> see how this works for the Tai Tham script.  I'm not happy with the
> Romanisation I use to work round hostile rendering engines.  (My
> scheme is only documented in variable hack_ss02 in the last script
> blocks of http://wrdingam.co.uk/lanna/denderer_test.htm.)  For
> example, there are several different ways of writing what one might
> naively record as "ontarAy".
> 
> MY RESPONSE:
> Richard, I stuck to the two specifications (Unicode and Font) and
> Sanskrit grammar. The akSara has two aspects, its sound (zabða,
> phoneme) and its shape. (letter, ruupa). Reduce the writing system to
> its consonants, vowels etc. (zabða) and assign SBCS letters/codes to
> them (ruupa). SBCS provides the best technical facilities for any
> language. (This is why now more than 130 languages romanize despite
> Unicode). Use English letters for similar sounds in the native
> speech. Now, treat all combinations as ligatures. For example, 'po'
> sound in Indic has the p consonant with a sign ahead plus a sign
> after.

In many Indic scripts, yes.  In Devanagari, the vowel sign is normally
a single element classified as following the consonant.  In Thai, the
vowel sign precedes the consonant.  Tai Tham uses both a two-part sign
and a preceding sign.  The preceding sign is for Tai words and the
two-part sign for Pali words, but loanwords from Pali into the Tai
languages may retain the two-part sign.
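To make the difference in stored order concrete, here is a minimal
Python sketch that lists the code points of a 'po'-like syllable in
three scripts.  The sample strings and the helper name describe() are
my own choices for illustration, and Tamil is only an assumed stand-in
for a script with a two-part sign; Tai Tham is left out to keep the
example short.

```python
import unicodedata

# 'po'-like syllables in stored (logical) order.
# These samples are illustrative choices, not taken from the thread.
SAMPLES = {
    "Devanagari": "\u092A\u094B",  # PA, then a single following vowel sign O
    "Thai": "\u0E42\u0E1B",        # SARA O stored before the consonant PO PLA
    "Tamil": "\u0BAA\u0BCA",       # PA, then a two-part vowel sign O
}


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


if __name__ == "__main__":
    for script, text in SAMPLES.items():
        print(script)
        for entry in describe(text):
            print("   ", *entry)
    # Canonical decomposition splits the Tamil two-part sign into its
    # left-hand and right-hand pieces.
    print("Tamil, NFD:")
    for entry in describe(unicodedata.normalize("NFD", SAMPLES["Tamil"])):
        print("   ", *entry)
```

The Thai vowel comes out as an ordinary letter stored before the
consonant, while the Devanagari and Tamil signs are combining marks
stored after it, whatever side they are drawn on.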

> For the font, there is no difference between the way it makes
> the combination 'ä', which has a sign above and the Indic having two
> on either side.

For OpenType, there is.  The first can be made by providing a
simple table of where the diaeresis goes relative to the base
characters, in this case the 'a'.  The second is painfully
complicated, for the 'p' may have other marks attached to it, so doing
it by relative positioning is error-prone.
This job is given to the rendering engine, which may introduce its own
problems.

AAT and Graphite offer the font maker the ability to move the 'sign
ahead' from after the 'p' to before it.
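As a rough illustration of that kind of move, here is a toy Python
sketch that works on code points rather than glyphs.  It is not how AAT
or Graphite is programmed; the function names are hypothetical, and
Tamil is again an assumed example because its two-part signs decompose
canonically into a left-hand and a right-hand piece.

```python
import unicodedata

# Tamil vowel signs that are drawn to the left of the consonant
# (VOWEL SIGN E, EE and AI).  Assumption: Tamil only, no other marks.
PREBASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def is_tamil_consonant(ch):
    """Crude range test covering the Tamil consonant letters KA..HA."""
    return "\u0B95" <= ch <= "\u0BB9"


def to_visual_order(text):
    """Move each left-hand vowel sign in front of its consonant.

    Two-part signs are first split by canonical decomposition, so the
    right-hand piece simply stays where it is.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in PREBASE_SIGNS and out and is_tamil_consonant(out[-1]):
            consonant = out.pop()
            out.extend([ch, consonant])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    visual = to_visual_order("\u0BAA\u0BCA")  # stored order: PA + sign O
    print([f"U+{ord(ch):04X}" for ch in visual])
```

A real implementation has to skip over whatever else is attached to
the consonant, which is exactly where the complication lies.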

> Recall that long ago, Unicode stopped defining fixed
> ligatures and asked the font makers to define them in the PUA.

While the first is true enough, I believe the second is false.  Not
every glyph has to be mapped to by a single character.  I don't do that
for contextual forms or ligatures in my font.

> Spelling and speech:
> There is indeed a confusion about writing and reading in Hindi, as I
> have observed. Like in English and Tamil, Hindi tends to end words
> with a consonant. So, there is this habit among the Hindi speakers to
> drop the ending vowel, mostly 'a' from words that actually end with
> it. For example, the famous name Jayantha (miserable mine too, haha!
> = jayanþa as Romanized), is pronounced Jayanth by Hindi speakers. It
> is a Sanskrit word. Sanskrit and languages like Sinhhala have vowel
> ending and are traditionally spoken as such.

This loss is also to be found in Further India.  Thai, Lao and Khmer
now require that such a word-final vowel be written explicitly if it is
still pronounced.

> Looking at the word you gave, ontarAy, it looks to me like an
> Anglicized form. If I am to make a guess, its ending is like in
> ontarAyi. Is it said something like, own-the-raa-yi? (danger?) If I
> am right, this is a good example of decline of a writing system owing
> to bad, uncaring application of technology. We are in the Digital
> Age, and we need not compromise any more. In fact, we can fix errors
> and decadence introduced by past technologies.

The word indeed means 'danger' (Pali/Sanskrit _antarāya_).  The
pronunciation is /ʔontʰalaːi/; the Tai languages that use(d) the Tai
Tham script no longer have /r/.  The older sequence /tr/ normally
became /tʰ/ (except in Lao), but the spelling has not been updated - at
least, not amongst the more literate.  The script has a special symbol
for the short vowel /o/, which it shares with the Lao script.  This
symbol is used in writing that word.  Two ways I have seen it spelt,
each with two orthographic syllables, are ᩋᩫ᩠ᨶᨲᩕᩣ᩠ᨿ on-trAy (the second
syllable has two stacks) and ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ o-ntrAy.  I have also seen a
form closer to Pali, namely _antarAy_, written ᩋᨶ᩠ᨲᩁᩂ᩠ᨿ a-nta-rAy.
However, I have seen nothing that shows that I won't encounter
ᩋᩢᨶ᩠ᨲᩁᩣ᩠ᨿ a-nta-rAy with the first vowel written explicitly, or even
ᩋᩢ᩠ᨶᨲᩁᩣ᩠ᨿ an-ta-rAy. How does your scheme distinguish such alternatives?
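For anyone who wants to see how the alternatives differ in encoded
form, here is a small Python sketch that dumps the code points of the
five spellings.  The strings are copied from the paragraph above; the
names SPELLINGS and code_points() are just illustrative.

```python
import unicodedata

# (romanisation, Tai Tham spelling) pairs copied from the text above.
SPELLINGS = [
    ("on-trAy", "ᩋᩫ᩠ᨶᨲᩕᩣ᩠ᨿ"),
    ("o-ntrAy", "ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ"),
    ("a-nta-rAy", "ᩋᨶ᩠ᨲᩁᩂ᩠ᨿ"),
    ("a-nta-rAy", "ᩋᩢᨶ᩠ᨲᩁᩣ᩠ᨿ"),
    ("an-ta-rAy", "ᩋᩢ᩠ᨶᨲᩁᩣ᩠ᨿ"),
]


def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_characters(a, b):
    """True if a and b use the same characters, in whatever order."""
    return sorted(a) == sorted(b)


if __name__ == "__main__":
    for roman, spelling in SPELLINGS:
        print(roman, " ".join(code_points(spelling)))
        for ch in spelling:
            print("   ", unicodedata.name(ch, "?"))
    # Report whether the first two spellings differ only in the order
    # of their characters.
    print(same_characters(SPELLINGS[0][1], SPELLINGS[1][1]))
```

Each spelling is a distinct code point sequence, so any scheme that
is to round-trip has to keep them distinct as well.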

Richard.


