Multiple Preposed Marks

Richard Wordingham richard.wordingham at
Wed Nov 9 14:27:42 CST 2016

On Wed, 9 Nov 2016 03:26:51 +0100
Philippe Verdy <verdy_p at> wrote:

> 2016-11-09 0:42 GMT+01:00 Richard Wordingham <
> richard.wordingham at>:

> > I believe a renderer is permitted to treat canonically equivalent
> > sequences differently so long as it does not believe it should treat
> > them differently.  However, that is irrelevant to this case.

> This is DIRECTLY relevant to the sentence in TUS you quoted, which is
> all about combining characters that are encoded after the base letter,
> often have non-zero combining classes, and are reorderable.

As you pointed out, it most clearly addresses the case of two combining
marks with the same canonical combining class, and obviously in such a
case the sequence is not reorderable.
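
The point about equal combining classes can be checked with a minimal
sketch, assuming Python's standard unicodedata module.  The marks used
(U+0300, U+0301, U+0323) are illustrative choices, not taken from the
discussion above.

```python
import unicodedata

def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)

# Illustrative marks: acute and grave are both class 230, dot below is 220.
ACUTE, GRAVE, DOT_BELOW = "\u0301", "\u0300", "\u0323"
classes = [unicodedata.combining(c) for c in (ACUTE, GRAVE, DOT_BELOW)]

# Same class: normalization keeps the encoded order, so the two
# sequences stay distinct (they are not canonically equivalent).
same_class_equal = nfd("a" + ACUTE + GRAVE) == nfd("a" + GRAVE + ACUTE)

# Different classes: canonical reordering sorts by class, so both
# encodings normalize to the same sequence.
diff_class_equal = nfd("a" + ACUTE + DOT_BELOW) == nfd("a" + DOT_BELOW + ACUTE)

print(classes, same_class_equal, diff_class_equal)
```

Only the pair with different classes collapses to a single normalized
form; the pair with equal classes keeps its encoded order.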

> But evidently this sentence in TUS is not relevant to "prepended"
> combining marks that all have combining class 0 (here "prepended"
> means encoded before the base character, not after it even if
> they visually combine before it, as is the case for well-known
> Indic vowels that now have non-zero combining classes that allow them
> to be reordered before other combining marks when normalizing, while
> still remaining encoded after the base consonant).

I can't guess what you mean:
(a) The combining marks in question *follow* the base consonant, but are
rendered before it.  'Preposition' is a property of abstract
characters, not of codepoints.

(b) All characters with an Indic Positional Category of 'left' (or
similar) have canonical combining class 0.
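
Point (b) can be spot-checked with a small sketch.  Python's unicodedata
module does not expose Indic_Positional_Category, so the list below is a
hand-picked sample of characters that I believe are 'left' or
'visual order left' in that property; it is an assumption, not derived
from the data file, and it is nowhere near exhaustive.

```python
import unicodedata

# Hand-picked sample (an assumption, not read from
# IndicPositionalCategory.txt) of marks rendered to the left of the base.
LEFT_SAMPLE = [
    0x093F,  # DEVANAGARI VOWEL SIGN I
    0x09BF,  # BENGALI VOWEL SIGN I
    0x0BC6,  # TAMIL VOWEL SIGN E
    0x1A55,  # TAI THAM CONSONANT SIGN MEDIAL RA
    0x1A6F,  # TAI THAM VOWEL SIGN AE
]

# Canonical combining class of each sampled character.
ccc = {cp: unicodedata.combining(chr(cp)) for cp in LEFT_SAMPLE}

for cp, value in ccc.items():
    print(f"U+{cp:04X} ccc={value}")
```

A class of 0 means canonical reordering never moves these marks, so
their order relative to the base and to each other is fixed by the
encoding.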

There is a simple example of the base outwards rule in the Tai Tham
script.  The only way of encoding Northern Thai /pʰɛː/ 'to change' with
U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and U+1A6F TAI THAM VOWEL SIGN
AE acceptable to the Universal Shaping Engine is <U+1A38, U+1A55,
U+1A6F>, and the visual order is the reverse of the encoding order.
Unfortunately, it could be argued that the encoding order is
independent of the visual order.
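
As a minimal sketch, again assuming Python's unicodedata module, the
sequence can be inspected as follows.  It shows only that normalization
leaves the encoding order alone; it says nothing about what a shaping
engine accepts, and the reversed list merely restates the visual order
described above.

```python
import unicodedata

# The encoded sequence quoted above: <U+1A38, U+1A55, U+1A6F>.
word = "\u1A38\u1A55\u1A6F"

for ch in word:
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)} {name}")

# All three characters have class 0 and no decomposition, so every
# normalization form returns the sequence unchanged.
stable = all(
    unicodedata.normalize(form, word) == word
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)

# Visual order as described in the text: the reverse of the encoding.
visual_order = [f"U+{ord(ch):04X}" for ch in reversed(word)]
print(stable, visual_order)
```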

