Counting Devanagari Aksharas

Thu Apr 20 13:30:25 CDT 2017

On Thu, 20 Apr 2017 15:33:37 +0530
Shriramana Sharma via Unicode <unicode at unicode.org> wrote:

> All I can say is that Tamil script has eschewed most consonant cluster
> ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I
> used ZWNJ) i.o. श्रीमान्को is quite possible with existing technology.
> The latter would be Sanskrit orthography and former perhaps Hindi,
> although I wouldn't know why anyone would want to run in the को with
> the preceding श्रीमान् even in Hindi.

According to p23 of
http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, the
spelling with ZWNJ is Nepali.  It's a compromise between the conjoined
श्रीमान्को and Hindi-style श्रीमान् को.
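
To make the difference concrete, here is a small Python sketch that
reports which character follows the final virama in each of the three
spellings.  The labels and the helper name after_virama are mine, purely
for illustration; the strings are written as escapes so the invisible
ZWNJ cannot be lost in transit.

```python
import unicodedata

VIRAMA = "\u094D"

# The common prefix is SHA, VIRAMA, RA, II, MA, AA, NA, VIRAMA.
STEM = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D"
KO = "\u0915\u094B"  # KA + vowel sign O

# Labels are illustrative, taken from the discussion above.
SPELLINGS = {
    "conjoined": STEM + KO,
    "ZWNJ (Nepali)": STEM + "\u200C" + KO,
    "spaced (Hindi-style)": STEM + " " + KO,
}

def after_virama(text):
    """Return the Unicode name of the character after the last VIRAMA."""
    i = text.rindex(VIRAMA)
    return unicodedata.name(text[i + 1])

for label, text in SPELLINGS.items():
    print(f"{label}: {after_virama(text)}")
```

The three spellings are identical up to and including the virama on न;
only the character that follows it differs.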

> And IMO it would be better to
> clearly define at the outset what you meant by "akshara" in your
> question to avoid confusions by people replying having a different
> idea of the meaning of that term.

I didn't want to be any more precise than "orthographic syllable".
Swaran Lata is urging, in submission
http://www.unicode.org/L2/L2017/17094-indic-text-seg.pdf to the UTC,
that UAX#29 "Unicode Text Segmentation" adopt a rather naïve definition
of an Indian orthographic syllable.  The worst outcome in my opinion
would be if it were adopted for the extended grapheme cluster
definition - it would make editing orthographic clusters even more
difficult.  However, it would make sense for CLDR to carry localised
definitions.

For layout, the definition would be relevant for 'drop capital effects'
and for the analogue of inserting spaces between letters.  There are
recommendations in a maturing W3C specification for Indic layout,
though, to be fair, the specification soon restricts its scope
to Indian scripts.  Now, if the spacing were applied to the Nepali word
श्रीमान्‌को I would expect to see something like श्री मा न् को, as the
base word itself would appear as श्री मा न् when subjected to the
same treatment.  However, before suggesting minor improvements that might
be in order, I thought I should check whether there was agreement that
<VIRAMA, ZWNJ> terminated an orthographic syllable.  It now seems that
any general agreement would in fact be that it did *not* terminate an
orthographic syllable!  I must say that stretching श्रीमान्‌को out as
श्री मा न्‌को feels wrong.  If my feeling is right, then the definition
of orthographic syllable, if it can be framed without reference to a
font, belongs in CLDR, as UAX#29 implies, and not in the Unicode
Character Database and Unicode standards.
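
The two candidate rules can be compared with a deliberately naive
Python sketch.  It is not the UAX#29 algorithm, nor the definition
proposed in L2/17-094; the function names, the consonant ranges and the
zwnj_terminates switch are my own assumptions, and it ignores fonts,
independent vowels after a virama and other refinements.

```python
import unicodedata

VIRAMA = "\u094D"
ZWNJ = "\u200C"
ZWJ = "\u200D"

# SHA VIRAMA RA II MA AA NA VIRAMA ZWNJ KA O (the Nepali spelling)
WORD = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u200C\u0915\u094B"

def is_consonant(ch):
    # Assumption: only the main Devanagari consonant ranges are covered.
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F

def is_extender(ch):
    # Combining marks (virama, vowel signs, nukta, ...) and the joiners
    # always stay with the cluster they follow.
    return ch in (ZWJ, ZWNJ) or unicodedata.category(ch) in ("Mn", "Mc")

def joins_next_consonant(cluster, zwnj_terminates):
    # A following consonant joins after VIRAMA or <VIRAMA, ZWJ>;
    # after <VIRAMA, ZWNJ> it joins only under the non-terminating rule.
    if cluster.endswith(VIRAMA) or cluster.endswith(VIRAMA + ZWJ):
        return True
    if cluster.endswith(VIRAMA + ZWNJ):
        return not zwnj_terminates
    return False

def split_aksharas(text, zwnj_terminates=True):
    """Split text into naive orthographic syllables."""
    clusters = []
    for ch in text:
        if clusters and (
            is_extender(ch)
            or (is_consonant(ch)
                and joins_next_consonant(clusters[-1], zwnj_terminates))
        ):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(" ".join(split_aksharas(WORD, zwnj_terminates=True)))
print(" ".join(split_aksharas(WORD, zwnj_terminates=False)))
```

With zwnj_terminates=True the word splits into four clusters, the
spacing I would expect; with zwnj_terminates=False it splits into
three, with the final न् and को held together across the ZWNJ.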

Richard.