Distinguishing COENG TA from COENG DA in Khmer script

Richard Wordingham richard.wordingham at ntlworld.com
Tue Jun 23 19:03:17 CDT 2020


On Tue, 23 Jun 2020 15:50:27 -0700
Asmus Freytag via Unicode <unicode at unicode.org> wrote:

> On 6/23/2020 4:54 AM, Richard Wordingham via Unicode wrote:
> > The modern Khmer language does not make use of a COENG DA distinct
> > from COENG TA.  The normal practice is to render them the same, with a
> > recommendation from Unicode that the choice be based on the sound the
> > subscript represents.  At least, there was such a recommendation; I
> > tried to find it again, but failed.  The visual distinction faded out
> > in the 1920's according to Antelme.
> > 
> > Now, the Khmer script is not just used for modern languages of
> > Cambodia.  It is used for transcribing Old Khmer (for words, at least)
> > and was the religious script of most of Thailand until the 19th
> > century, and was also the secular script in southern Thailand.  In
> > these usages, COENG TA and COENG DA are distinct, or at least, TA and
> > DA have distinct subscripts that are clearly associated with them.
> > 
> > Is it legitimate for a font to deliberately render the corresponding
> > named sequences differently while claiming to respect characters'
> > character identities?  I thought it obviously was, but I received a
> > demurral when I asked about the best way to request an arbitrary
> > OpenType font to make the distinction.  (I expect the overwhelming
> > majority would refuse to make the distinction.)  I am therefore asking
> > here for advice on the legitimacy of such a request. Conceivably we
> > need a new character to make the distinction.
> > 
> > Richard.
> 
> The recommendation you cite is a bit "common sense". I believe,
> without actual knowledge, that there are no "dt" or "td" combinations
> only "dd" and "tt". In that case, a spell checker can help you
> pick the correct code for the subscript form.

That's a grammar rule - I'm not sure that spell checkers can exploit it.
While Series one ('a') normally has /nt/ for base consonant, there are
or were (my source is Huffman) a few words with /nd/, and there are a
few words that can be said either way (Durdin 2018, I think).

My immediate concern was the alternative Old Khmer spellings អ្តា and
អ្ដា, which look identical in most fonts.  However, I am told the
Windows UI font Leelawadee UI distinguishes them, which could make it
difficult to outlaw deliberate distinction.
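
Since the two spellings may be indistinguishable on screen, a minimal
Python sketch can make the encoded difference visible.  The helper names
(describe, first_difference) and the constants are illustrative only;
the code points are those of the two spellings quoted above.

```python
import unicodedata

# The two Old Khmer spellings, written with escapes so that the
# difference does not depend on the font used to display this text.
WITH_COENG_TA = "\u17A2\u17D2\u178F\u17B6"  # QA, COENG, TA, AA
WITH_COENG_DA = "\u17A2\u17D2\u178A\u17B6"  # QA, COENG, DA, AA


def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def first_difference(a, b):
    """Return the index of the first differing character, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    for label, text in (("COENG TA", WITH_COENG_TA),
                        ("COENG DA", WITH_COENG_DA)):
        print(label)
        for code_point, name in describe(text):
            print("   ", code_point, name)
    print("first differing index:",
          first_difference(WITH_COENG_TA, WITH_COENG_DA))
```

The strings differ only in the consonant that follows COENG, so any
visual distinction has to come from the font's treatment of that pair.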

> Now, the identity of the characters is DA and TA (the COENG forms a
> sequence). Therefore, you don't violate the identity of DA and TA if
> you render their subscript forms distinct.

Don't multi-code-point characters get the same protection?  COENG DA and
COENG TA are named sequences.
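
For reference, the named sequences can be looked up programmatically.
This is only a sketch, assuming Python 3.3 or later, where
unicodedata.lookup() also accepts named sequences; the variable names
are illustrative.

```python
import unicodedata

# Names as listed for these sequences in the Unicode named sequences data.
COENG_DA = unicodedata.lookup("KHMER CONSONANT SIGN COENG DA")
COENG_TA = unicodedata.lookup("KHMER CONSONANT SIGN COENG TA")


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    print("COENG DA:", code_points(COENG_DA))
    print("COENG TA:", code_points(COENG_TA))
```

Each lookup returns a two-character string: KHMER SIGN COENG followed by
the base consonant.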

> If you have a font that works that way, it may not be usable for
> modern Khmer (unless there's a language tag to select the
> behavior). That's a font issue.

I'm not sure that we can get an OpenType language tag for 19th century
Khmer.  However, it seems that the feature tag 'hist' would be
appropriate.  One could try tagging as Southern Thai (ISO 639-3 sou), but
that's another can of worms.  Tagging as Sanskrit might work - I don't
know enough about modern Khmer script Sanskrit.

Richard.


