Choosing the Set of Renderable Strings

Wed May 16 16:39:36 CDT 2018

On Wed, 16 May 2018 05:23:08 -0800
James Kass via Unicode <unicode at unicode.org> wrote:

> Note that although the proposal gave canonical combining class
> zero to both the tone marks and the vowel signs, the on-line Unicode
> data gives canonical combining class 230 to the tone marks.

There were several changes from ccc=0 to non-zero that were sneaked in
between the UTC agreeing to proceed with the proposal and Unicode 5.2
being published.  That may have been a test of vigilance; we failed.  I
have seen no benefit from the changes - U+1A60 TAI THAM SIGN SAKOT is
not a virama (it should not appear in valid text), and having the tone
marks and the invisible stacker have distinct non-zero classes has
caused lots of irritation.

We should probably have risked Tai Tham being excluded from the BMP and
gone for the Tibetan model; normalisation would not then damage Tai Tham
text.
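
To make the damage concrete, here is a minimal sketch using Python's
standard unicodedata module.  It assumes the bundled Unicode data is
5.2 or later, with SAKOT and TONE-1 carrying distinct non-zero classes
as discussed above; the variable names are mine.  It takes the first
of the NA strings quoted further down and shows canonical reordering
moving SAKOT in front of TONE-1.

```python
import unicodedata

# Code points as given in the strings quoted in this thread.
NA, WA, AA = "\u1A36", "\u1A45", "\u1A63"
MAI_KANG, TONE_1, SAKOT = "\u1A74", "\u1A75", "\u1A60"

# <NA, MAI KANG, TONE-1, SAKOT, WA, SIGN AA> as typed.
typed = NA + MAI_KANG + TONE_1 + SAKOT + WA + AA
normalised = unicodedata.normalize("NFC", typed)


def show(s):
    """List each code point with its canonical combining class."""
    return " ".join(
        f"{ord(c):04X}(ccc={unicodedata.combining(c)})" for c in s
    )


print(show(typed))
print(show(normalised))
print("changed by NFC:", normalised != typed)
```

The other ordering, <NA, MAI KANG, SAKOT, WA, TONE-1, SIGN AA>, has a
ccc=0 letter between the two marks and so should survive
normalisation unchanged.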

> > **The placement may be different to that of MAI KANG
> > in /bɔː waː/ ᨷᩴ᩠᩵ᩅᩣ <BA, MAI KANG, TONE-1, SAKOT, WA,
> > SIGN AA> or ᨷᩴ᩠ᩅ᩵ᩣ <BA, MAI KANG, SAKOT, WA, TONE-1,
> > SIGN AA> - I don't know whether the first or the second
> > tone mark is dropped.  

> FWIW, neither is dropped in the display here, although they don't
> display identically.  The first string shows TONE-1 positioned to the
> right of MAI KANG, the second string superimposes them.  (Windows 7
> running LibreOffice in order to enable the USE from HarfBuzz.)

The full uncontracted writing is <BA, MAI KANG, TONE-1, WA, TONE-1, SIGN
AA>.  Both syllables have TONE-1, but I have not seen two identical
tone marks from different phonetic syllables in the same stack.  The
person typing the contraction drops a tone mark, not the rendering
system.

> Substituting U+1A36 TAI THAM LETTER NA for BA in the above strings,
> ᨶᩴ᩠᩵ᩅᩣ  ᨶᩴ᩠ᩅ᩵ᩣ, and trying to get the ligature are in the attached
> *.PNG file. Here's the four strings for the PNG:
> 
> \u1A36\u1A74\u1A75\u1A60\u1A45\u1A63
> \u1A36\u1A74\u1A60\u1A45\u1A75\u1A63
> \u1A36\u1A75\u1A63\u1A74
> \u1A36\u1A63\u1A74\u1A75

A lot of fonts have trouble ligating NA and AA when there is material
between them.  (Hint: Classify all non-spacing subscript consonants as
marks, and spacing subscript consonants as bases, and set the ligating
lookup to ignore marks.)
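
As a rough illustration of that hint, the following Python sketch
mimics what a ligature lookup does when it is told to ignore marks.
It is not font code: the glyph names, the class table and the function
are all hypothetical, and a real font would express this in its GDEF
and GSUB tables instead.

```python
# Hypothetical glyph classes, standing in for a font's GDEF classes.
MARK, BASE = "mark", "base"

GLYPH_CLASS = {
    "na": BASE,
    "aa": BASE,
    "wa": BASE,
    "wa.sub": MARK,   # non-spacing subscript consonant, classed as a mark
    "maikang": MARK,
    "tone1": MARK,
}


def ligate_ignoring_marks(glyphs, first="na", second="aa", ligature="na_aa"):
    """Replace first+second by ligature, skipping marks between them."""
    out = []
    i = 0
    while i < len(glyphs):
        if glyphs[i] == first:
            # Step over any marks that follow the first component.
            j = i + 1
            while j < len(glyphs) and GLYPH_CLASS.get(glyphs[j]) == MARK:
                j += 1
            if j < len(glyphs) and glyphs[j] == second:
                # The ligature takes the first component's place and the
                # skipped marks are kept after it.
                out.append(ligature)
                out.extend(glyphs[i + 1:j])
                i = j + 1
                continue
        out.append(glyphs[i])
        i += 1
    return out


print(ligate_ignoring_marks(["na", "maikang", "wa.sub", "tone1", "aa"]))
print(ligate_ignoring_marks(["na", "wa", "aa"]))
```

In the first call only marks intervene, so the ligature forms; in the
second a spacing base sits between NA and AA and blocks it.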

Your example appears to be using the font called 'A Tai Tham KH New'.
While the only way to type Pali _bho_ 'O' after other text in this font
or 'A Tai Tham KH' is to enter the correct sequence <LOW PHA, SIGN E,
SIGN AA>, the former font cannot render Pali _mano_ 'mind' (also used in
Northern Thai and probably also Tai Khuen) if one types the correct
sequence <MA, NA, SIGN E, SIGN AA>.  One has to type <MA, NA, SIGN AA,
SIGN E>!  The *older* font 'A Tai Tham KH' (at Version 2.0) does render the
correct spelling properly.  As an example of correct rendering, I
include the Pali for 'O mind!', _bho mano_, encoded <LOW PHA, SIGN E,
SIGN AA, MA, NA, SIGN AA, SIGN E>, as rendered by the Lamphun font.

Richard.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: o_mind.png
Type: image/png
Size: 2049 bytes
Desc: not available
URL: <http://unicode.org/pipermail/unicode/attachments/20180516/20d31a62/attachment.png>