Combining Class of Thai Nonspacing_Marks

Richard Wordingham richard.wordingham at ntlworld.com
Tue Apr 4 23:37:08 CDT 2017


On Wed, 5 Apr 2017 10:45:43 +0700
"Gerriet M. Denkmann" <gerrietm at icloud.com> wrote:

> > On 4 Apr 2017, at 23:51, Richard Wordingham
> > <richard.wordingham at ntlworld.com> wrote:

> > The order of MAITAIKHU and tone mark is significant - it should
> > affect rendering.

> Most fonts disagree (exceptions: Tahoma and Microsoft Sans Serif). Are
> there minority languages where the order really has a semantic
> meaning?

I think not.  Most fonts are incompetent at displaying typing errors.

> Could one create a list of all possible combinations of non-spacing
> marks for Thai, minority languages and languages written using Thai
> characters (e.g. Pali, Sanskrit, Khmer, Burmese, etc.)? Including
> cases where the order of these marks has a semantic meaning.

> The next step would then be to agree on rules of normalisation.

Most of the 'normalisation' is straightforward.

1) Repeatedly swap mark above and following mark below.
2) Apply Unicode normalisation.

Then
3) Use a font that uses mark-to-mark positioning on all combinations of
vowels above and all combinations of vowels below.
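
Steps 1 and 2 above can be sketched in Python roughly as follows. This
is only an illustration: the sets MARKS_ABOVE and MARKS_BELOW and the
function names are assumptions made for the sketch, not an agreed list,
and NFC is chosen arbitrarily as the Unicode normalisation form.

```python
import unicodedata

# Assumption: Thai combining marks drawn above the base consonant
# (MAI HAN-AKAT, SARA I..SARA UEE, MAITAIKHU..YAMAKKAN).
MARKS_ABOVE = {0x0E31, *range(0x0E34, 0x0E38), *range(0x0E47, 0x0E4F)}
# Assumption: Thai combining marks drawn below (SARA U, SARA UU, PHINTHU).
MARKS_BELOW = {0x0E38, 0x0E39, 0x0E3A}


def swap_above_below(text: str) -> str:
    """Step 1: repeatedly swap a mark above with a following mark below."""
    chars = list(text)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(chars) - 1):
            if ord(chars[i]) in MARKS_ABOVE and ord(chars[i + 1]) in MARKS_BELOW:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                swapped = True
    return "".join(chars)


def normalise_thai_marks(text: str) -> str:
    """Step 1 followed by step 2 (Unicode normalisation, here NFC)."""
    return unicodedata.normalize("NFC", swap_above_below(text))
```

For example, normalise_thai_marks("\u0E01\u0E47\u0E38") moves SARA U in
front of MAITAIKHU. Unicode normalisation alone leaves that pair as
typed, because MAITAIKHU has canonical combining class 0.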

NIKHAHIT followed by SARA AA needs special handling.  I am not sure
how well the general case will work - particularly with fonts that do
their own reordering.
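
One place this sequence comes from is U+0E33 SARA AM, which has a
compatibility decomposition to <NIKHAHIT, SARA AA>. The short Python
check below is a sketch of that behaviour only; the helper name
expand_sara_am is hypothetical, and it does not attempt the special
handling itself.

```python
import unicodedata

SARA_AM = "\u0E33"
NIKHAHIT = "\u0E4D"
SARA_AA = "\u0E32"


def expand_sara_am(text: str) -> str:
    """Apply compatibility normalisation, which splits SARA AM."""
    return unicodedata.normalize("NFKC", text)


# Canonical normalisation leaves SARA AM alone...
canonical = unicodedata.normalize("NFC", SARA_AM)
# ...but compatibility normalisation yields NIKHAHIT followed by SARA AA.
compat = expand_sara_am(SARA_AM)
# With a tone mark typed before SARA AM, the tone ends up before NIKHAHIT.
with_tone = expand_sara_am("\u0E19\u0E49\u0E33")
```

Note that in the last case the tone mark precedes NIKHAHIT, which is
the kind of ordering a font has to cope with.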

You also need to decide whether to fold <SARA I, NIKHAHIT> and <SARA
UE> together.  I've started to see fonts make an artificial distinction.
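
If one did decide to fold, a minimal Python sketch might look like the
following. The direction of the fold (towards the single character SARA
UE) and the function name are assumptions for illustration; folding the
other way would be equally easy to write.

```python
SARA_I = "\u0E34"
NIKHAHIT = "\u0E4D"
SARA_UE = "\u0E36"


def fold_sara_i_nikhahit(text: str) -> str:
    """Replace the sequence <SARA I, NIKHAHIT> with SARA UE."""
    return text.replace(SARA_I + NIKHAHIT, SARA_UE)
```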

You may wish to note that it can be very hard to tell the difference
between U+002D HYPHEN-MINUS and U+2013 EN DASH in file names.

Richard.

