Ambiguity(?) in Sinhala named sequences
cibucj at gmail.com
Sun Oct 16 22:12:54 CDT 2016
Isn't this question analogous to asking whether the layout engine should
use the C1-conjoining form or the C2-conjoining form for a <C1, Virama, C2>
sequence in any Indic script? That is, whether <C1, Virama> should form a
glyph while C2 keeps its independent form, or vice versa. (Potentially
there can be more forms, such as a full ligature and an explicit Virama form.)
If the question you asked is equivalent, then the answer is traditionally
left to the font to decide.
BTW, even for a given C1 and C2 in a given script, a font can potentially
choose a different answer based on its purpose/character, like a font
for Malayalam traditional script vs. a font for reformed script.
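
To make the two readings concrete, here is a minimal Python sketch that enumerates every way to split a code point string into the three named sequences from the quoted message plus the bare letters Ya and Ra. The names `NAMED_SEQUENCES`, `LETTERS`, and `parses` are made up for illustration; this is not how HarfBuzz or any font actually segments text, only a way to show that the code points alone do not pick a parse.

```python
# Named sequences as quoted in the thread (code points as integers).
NAMED_SEQUENCES = {
    "YANSAYA": (0x0DCA, 0x200D, 0x0DBA),
    "RAKAARAANSAYA": (0x0DCA, 0x200D, 0x0DBB),
    "REPAYA": (0x0DBB, 0x0DCA, 0x200D),
}

# Only the two letters discussed in the thread (illustrative, not complete).
LETTERS = {0x0DBA: "Ya", 0x0DBB: "Ra"}


def parses(cps):
    """Return every split of cps into named sequences and bare letters."""
    cps = tuple(cps)
    if not cps:
        return [[]]
    results = []
    # Option 1: consume one bare letter.
    if cps[0] in LETTERS:
        for rest in parses(cps[1:]):
            results.append([LETTERS[cps[0]]] + rest)
    # Option 2: consume a whole named sequence.
    for name, seq in NAMED_SEQUENCES.items():
        if cps[:len(seq)] == seq:
            for rest in parses(cps[len(seq):]):
                results.append([name] + rest)
    return results


if __name__ == "__main__":
    for p in parses((0x0DBB, 0x0DCA, 0x200D, 0x0DBA)):
        print(" + ".join(p))
```

Both 0DBB 0DCA 200D 0DBA and 0DBB 0DCA 200D 0DBB come back with two segmentations each, so whichever one gets rendered is a choice made by the font (or a convention such as SLS 1134), not something the string itself encodes.
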
On Mon, Oct 17, 2016 at 12:15 AM, Harshula <harshula at hj.id.au> wrote:
> Hi Martin,
> On 15/10/16 04:07, Martin Jansche wrote:
> > For Sinhala, the following named sequences are defined (for good
> > SINHALA CONSONANT SIGN YANSAYA;0DCA 200D 0DBA
> > SINHALA CONSONANT SIGN RAKAARAANSAYA;0DCA 200D 0DBB
> > SINHALA CONSONANT SIGN REPAYA;0DBB 0DCA 200D
> > I'll abbreviate these as Yansaya, Rakaransaya, and Repaya, and I'll
> > write Ya for 0DBA and Ra for 0DBB.
> > Note that these give rise to two potentially ambiguous codepoint
> > strings, namely
> > 0DBB 0DCA 200D 0DBA
> > 0DBB 0DCA 200D 0DBB
> > I'll concentrate on the first, as all arguments apply to the second one
> > analogously.
> > At first glance, the sequence 0DBB 0DCA 200D 0DBA has two possible parses:
> > 0DBB + 0DCA 200D 0DBA, i.e. Ra + Yansaya
> > 0DBB 0DCA 200D + 0DBA, i.e. Repaya + Ya
> > First question: Does the standard give any guidance as to which one is
> > the intended parse? The section on Sinhala in the Unicode Standard is
> > silent about this. Is there a general principle I'm missing?
> > Sri Lanka Standard SLS 1134 (2004 draft) states that Ra+Yansaya is not
> > used and is considered incorrect, suggesting that the second parse
> > (Repaya+Ya) should be the default interpretation of this sequence.
> > However, SLS 1134 does not address the potential ambiguity of this
> > sequence explicitly and the description there could be read as
> > informative, not normative.
> 1) re: 0DBB 0DCA 200D 0DBA
> SLS 1134 was updated in 2011 (The latest public version I could find is
> v3.41. This extract is the same in v3.6.):
> 4D957C56.5050204 at cse.mrt.ac.lk/1/
> "1. The yansaya is not used following the letter ර. e.g.: the spelling
> කාර්ය is incorrect."
> If the above is insufficient, it's best to discuss the issue with Harsha
> (CC'd) and Ruvan (CC'd).
> 2) re: 0DBB 0DCA 200D 0DBB
> Harsha & Ruvan can clarify this too.
> > Second question: Given that one parse of this sequence should be the
> > default, how does one represent the non-default parse?
> > In most cases one can guess what the intended meaning is, but I suspect
> > this is somewhat of a gray area. In practice, trying to render these
> > problematic sequences and their neighbors in HarfBuzz with a variety of
> > fonts results in a variety of outcomes (including occasionally
> > unexpected glyph choices). If the meaning of these sequences is not well
> > defined, that would partly explain the variation across fonts.
> > Am I missing something fundamental? If not, it seems this issue should
> > be called out explicitly in some part of the standard.
> > Regards,
> > -- martin