Indic Syllabic Categories

Mon May 12 13:43:04 CDT 2014

Richard Wordingham asked:

> Is the provisional property 'Indic_Syllabic_Category' defined by
> anything deeper than the UCD file IndicSyllabicCategory itself?  

Basically, no. It simply gathers together information, scattered
about in the core spec and elsewhere, about how each of the characters
has been classified. The classification has been undergoing
further review and will be updated again shortly for the Unicode 7.0
release, with some further distinctions and corrections.

However, the file(s) (and properties) will remain provisional for
Unicode 7.0, and there is no overarching UTR (Unicode Technical
Report) that provides a definitive model for all of these categories.
The values are evolving along the lines of what is proving useful for
implementation, rather than being categories defined a priori.
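
Since the data file is effectively the definition, an implementation
that wants these values has to read them from the file. A minimal
sketch of that, assuming the usual UCD line layout (a code point or
range, a semicolon, a value, and an optional "#" comment) and a default
of "Other" for unlisted code points. The function names and the sample
lines are illustrative, not an excerpt from any particular version of
the file:

```python
import re
from typing import Dict, Iterable

# One data line: "XXXX ; Value" or "XXXX..YYYY ; Value".
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)$")


def parse_indic_syllabic_category(lines: Iterable[str]) -> Dict[int, str]:
    """Map each listed code point to its property value."""
    table: Dict[int, str] = {}
    for raw in lines:
        # Drop the trailing comment and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a data line in the assumed layout
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        for cp in range(start, end + 1):
            table[cp] = match.group(3)
    return table


def lookup(table: Dict[int, str], cp: int) -> str:
    """Return the value for cp, assuming "Other" for unlisted code points."""
    return table.get(cp, "Other")


# Illustrative lines only; the real values may differ between versions.
SAMPLE = """\
# sample data in the assumed layout
0900..0902    ; Bindu # Mn   [3] illustrative comment
0915..0939    ; Consonant
094D          ; Virama
"""

table = parse_indic_syllabic_category(SAMPLE.splitlines())
```

With a real copy of the file, the same function can be fed an open
file object instead of SAMPLE.splitlines().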

> Is the property meant to be tailorable?  For example, there are
> encoded characters in the Khmer script that serve as tone marks when it
> is used to write Thai.

For a property to be "tailorable" in a Unicode context, you pretty
much need a defined algorithm that uses those property values, so
that changing the values in some systematic way modifies the outcome
of the algorithm. In this case, there is no Unicode algorithm defined
(although implementers may have specific algorithms in their
rendering engines), and the data is all provisional.
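
Nothing stops an implementation from layering its own overrides on
top of the data for a particular language, though. A rough sketch of
what that could look like, assuming the base values are already in a
dict. The code point, both values, and all of the names below are
placeholders chosen to show the mechanism, not claims about the actual
Khmer data or about any rendering engine:

```python
from collections import ChainMap
from typing import Callable, Dict

# Placeholder base data; a real table would come from the UCD file.
BASE: Dict[int, str] = {0x17C9: "Register_Shifter"}


def tailored_lookup(
    base: Dict[int, str], overrides: Dict[int, str]
) -> Callable[[int], str]:
    """Build a lookup that consults overrides before the base data."""
    layered = ChainMap(overrides, base)  # first mapping wins
    return lambda cp: layered.get(cp, "Other")


# Hypothetical per-language override: treat one mark as a tone mark.
thai_in_khmer_script = {0x17C9: "Tone_Mark"}

default_lookup = tailored_lookup(BASE, {})
thai_lookup = tailored_lookup(BASE, thai_in_khmer_script)
```

The base table is never modified, so the untailored values stay
available to any code that does not opt in to the override.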

The two Indic category files will probably be promoted to
*informative* status as of Unicode 8.0, with further modifications,
extensions, and corrections. The main difference would be that once
a property becomes *informative* in the UCD, the UTC would be
committed to keeping it around and maintaining it. By contrast,
a provisional property can just be removed if it doesn't pan out.

My suggestion, for those who are interested in this topic, would be
to review the relevant data files, the implied script behaviors, and
the documents and proposals in the UTC document register, and to
provide feedback on the topic and the data files over the course of
the next year. That way, if and when the files and related properties
become informative for Unicode 8.0 sometime next year, these questions,
and any concerns about edge cases as applied to Southeast Asian
scripts, can be addressed before the properties become more difficult
to update.

--Ken