Counting Devanagari Aksharas

Manish Goregaokar via Unicode unicode at unicode.org
Fri Apr 21 18:27:43 CDT 2017


> Do Hindi speakers really think of orthographic syllables as characters?

When rendered as a cluster, yes? I've asked around, and folks seem to
insist on coupling it to the rendering. Given that most fonts render
*normal* (common, etc.) clusters, I think it is fine to make them EGCs
and to treat nonrendered clusters the same way we treat family emoji.
(A family emoji of length 5 is a single EGC even though that's not
what the user actually perceives, but that case is very rare in the
wild, so it doesn't matter.) The way I see it, the current system is
wrong, and the proposed system of not breaking at viramas (or, to be
more precise, not breaking at viramas followed by a consonant) would
be wrong too, but much less often.

I am only talking about Devanagari, though scripts like
Bangla/Gujarati/Gurmukhi may have similar needs. Breaking on ZWNJ seems
sensible.
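
To make the proposed rule concrete, here is a rough Python sketch of
it. This is an illustration only, not the UAX #29 algorithm and not
anything that exists in a library: the function names, the consonant
ranges (U+0915..U+0939 and U+0958..U+095F only) and the decision to
let ZWNJ/ZWJ attach to the preceding cluster are all assumptions made
for the example. A consonant joins the previous cluster only when the
character right before it is a virama, so a ZWNJ after the virama
forces a break.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def is_devanagari_consonant(ch):
    """Assumed, simplified consonant test: KA..HA and QA..YYA only."""
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def is_extender(ch):
    """Combining marks (virama, matras, nukta, ...) and ZWNJ/ZWJ."""
    return ch in (ZWNJ, ZWJ) or unicodedata.category(ch) in ("Mn", "Mc", "Me")


def split_aksharas(text):
    """Split text into clusters without breaking at virama + consonant."""
    clusters = []
    for ch in text:
        if clusters:
            prev = clusters[-1][-1]
            # Extenders always attach; a consonant attaches only when it
            # directly follows a virama (so virama + ZWNJ still breaks).
            if is_extender(ch) or (prev == VIRAMA and is_devanagari_consonant(ch)):
                clusters[-1] += ch
                continue
        clusters.append(ch)
    return clusters


# KA VIRAMA SSA TA VIRAMA RA VOWEL-SIGN-I YA ("kshatriya")
word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"
print(len(split_aksharas(word)))
```

With this sketch the word above comes out as three clusters, where
breaking after every virama would give five. Inserting a ZWNJ after
the first virama splits the first cluster in two again.
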
-Manish


On Fri, Apr 21, 2017 at 4:04 PM, Richard Wordingham via Unicode
<unicode at unicode.org> wrote:
> On Thu, 20 Apr 2017 11:17:05 -0700
> Manish Goregaokar via Unicode <unicode at unicode.org> wrote:
>
>> On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode
>> <unicode at unicode.org> wrote:
>
>> > Is there consensus on how to count aksharas in the Devanagari
>> > script? The doubts I have relate to a visible halant in
>> > orthographic syllables other than the first.
>
>> I don't think there's consensus.
>
> I've found related discussion at
> https://lists.w3.org/Archives/Public/public-i18n-indic/.  The question
> of how to count was raised and not answered there.
>
>> On Wed, Apr 19, 2017 at 4:35 PM,
>> Richard Wordingham via Unicode <unicode at unicode.org> wrote:
>> > Is there consensus on how to count aksharas in the Devanagari
>> > script? The doubts I have relate to a visible halant in
>> > orthographic syllables other than the first.
>
>> I'm of the opinion that Unicode should start considering devanagari
>> (and possibly other indic) consonant clusters as single extended
>> grapheme clusters.
>
> Do Hindi speakers really think of orthographic syllables as characters?
>
> What may be useful is the concept of a definition of an orthographic
> syllable.  It may be possible to get the information from a font -
> depending on the renderer - but a locale-dependent definition should be
> possible for use as a fall-back.  Devanagari rules won't work for
> Tamil, and I think rules for Hindi and Nepali will be slightly
> different - <VIRAMA, ZWNJ> looks like a problem.
>
> The concept is possibly not useful in some Indic scripts - the concept
> won't work well in Thai, but will work in Pali in the Thai script, for
> both Pali orthographies.
>
> Richard.

