Multiple Preposed Marks

Philippe Verdy verdy_p at
Wed Nov 9 15:23:28 CST 2016

2016-11-09 21:27 GMT+01:00 Richard Wordingham <
richard.wordingham at>:

> On Wed, 9 Nov 2016 03:26:51 +0100
> Philippe Verdy <verdy_p at> wrote:
> > 2016-11-09 0:42 GMT+01:00 Richard Wordingham <
> > richard.wordingham at>:
> > > I believe a renderer is permitted to treat canonically equivalent
> > > sequences differently so long as it does not believe it should treat
> > > them differently.  However, that is irrelevant to this case.
> > This is DIRECTLY relevant to the sentence in TUS you quoted, which is
> > all about combining characters that are encoded after the base letter,
> > often have non-zero combining classes, and are reorderable.
> As you pointed out, it most clearly addresses the case of two combining
> marks with the same canonical combining class, and obviously in such a
> case the sequence is not reorderable.
> > But evidently this sentence in TUS is not relevant to "prepended"
> > combining marks that all have combining class 0 (here "prepended"
> > means encoded before the base character, not after it even if
> > they visually combine before it, as is the case for well-known
> > Indic vowels that now have non-zero combining classes that allow them
> > to be reordered before other combining marks when normalizing, while
> > still remaining encoded after the base consonant).
> I can't guess what you mean:
> (a) The combining marks in question *follow* the base consonant, but are
> rendered before it.  'Preposition' is a property of abstract
> characters, not of codepoints.
> (b) All characters with an Indic Positional Category of 'left' (or
> similar) have canonical combining class 0.

Reread my message: I distinguished these two cases clearly, and said
explicitly that "PREPENDED" meant case (b). And yes, I also said explicitly
that these marks have combining class 0 and are therefore not subject to
mutual reordering.
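
To illustrate the class-0 point, here is a minimal sketch using Python's
standard unicodedata module. The Devanagari characters are chosen only for
illustration and do not come from this thread. It checks that two marks
with combining class 0 keep whatever order they were encoded in when the
string is normalized.

```python
import unicodedata

# Illustrative characters (an assumption, not taken from the thread).
KA = "\u0915"            # DEVANAGARI LETTER KA, the base consonant
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I, rendered left of the base
ANUSVARA = "\u0902"      # DEVANAGARI SIGN ANUSVARA

# Canonical combining class of each mark, keyed by character name.
ccc_values = {
    unicodedata.name(ch): unicodedata.combining(ch)
    for ch in (VOWEL_SIGN_I, ANUSVARA)
}

# The same two marks in both possible encoded orders.
order_1 = KA + VOWEL_SIGN_I + ANUSVARA
order_2 = KA + ANUSVARA + VOWEL_SIGN_I

# Normalization only reorders marks with non-zero classes,
# so each string should come back unchanged.
nfd_1 = unicodedata.normalize("NFD", order_1)
nfd_2 = unicodedata.normalize("NFD", order_2)

print(ccc_values)
print(nfd_1 == order_1, nfd_2 == order_2, nfd_1 == nfd_2)
```

Because neither mark moves, the two strings stay distinct after
normalization: the encoded order is significant.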

But the TUS sentence that YOU quoted falls completely under case (a),
where "combining marks" may still appear before the base but are always
encoded after it, and where they are freely (indistinguishably) reorderable
if they have distinct non-zero combining classes: these combining characters
then have no well-defined mutual positions. In these cases they are
"supposed" not to "interact typographically" (because they were encoded with
distinct positional combining classes), but this turns out to be wrong in
various cases, notably for Hebrew diacritics (between vowel points and
other points modifying the consonant) and for several kinds of Indic
diacritics (mixes of vowels, half-vowels and "liquid" half-consonants, and
within consonant clusters). There are also some complex cases when using
non-Indic diacritics over Indic letters/clusters.
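
A minimal sketch of what "no well-defined mutual positions" means in
practice, again using Python's unicodedata module. The Hebrew letter and
points are my own choice of example. Two marks with distinct non-zero
combining classes end up in the same canonical order whichever way they
were typed, so the original order cannot be recovered after normalization.

```python
import unicodedata

# Illustrative characters (an assumption, not taken from the thread).
BET = "\u05D1"     # HEBREW LETTER BET
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ, combining class 21
PATAH = "\u05B7"   # HEBREW POINT PATAH, combining class 17

typed_a = BET + DAGESH + PATAH
typed_b = BET + PATAH + DAGESH

# Canonical ordering sorts adjacent marks by ascending combining class.
canonical_a = unicodedata.normalize("NFD", typed_a)
canonical_b = unicodedata.normalize("NFD", typed_b)

# Combining class of each character in the normalized string.
classes = [unicodedata.combining(ch) for ch in canonical_a]

print(classes)
print(canonical_a == canonical_b)
```

Both inputs collapse to the order patah, dagesh, which is the sense in
which the two encoded orders are canonically equivalent.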

For all these cases (a), CGJ must be used to block the possible reorderings,
so that the layout of clusters can be composed with the expected
typographic interactions when such interactions can effectively occur.
This is because the **effective** relative position is DEFINITELY NOT
explicitly encoded in any of these combining characters with non-zero
combining classes. The property names of those classes, like "above" or
"below", are counter-intuitive: they only work in the most frequent simple
cases, where there is a single diacritic after a base letter, and for most
base letters but not all, and they take no account of the possible creation
of ligatures and complex clusters, notably in traditional Arabic or in
decorative typography for almost all scripts, including Latin!
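
A minimal sketch of CGJ blocking a reordering, using Python's unicodedata
module. The lamed + patah + hiriq sequence is picked here for illustration
only. CGJ (U+034F) has combining class 0, so canonical ordering cannot move
marks across it.

```python
import unicodedata

# Illustrative characters (an assumption, not taken from the thread).
LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0


def codepoints(text):
    """Return the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


unprotected = LAMED + PATAH + HIRIQ
protected = LAMED + PATAH + CGJ + HIRIQ

# Without CGJ, hiriq (14) is sorted before patah (17).
nfd_unprotected = unicodedata.normalize("NFD", unprotected)
# With CGJ between them, the two points are no longer adjacent
# in the sense of canonical ordering, so nothing moves.
nfd_protected = unicodedata.normalize("NFD", protected)

print(codepoints(nfd_unprotected))
print(codepoints(nfd_protected))
```

The intended patah-then-hiriq order survives only in the string that
contains CGJ.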

If you're still not convinced, look at how complex typography is used
for "the name of God" in various religions and denominations (it's not just
the case of the Hebrew "tetragram"). You can also look at "calligrammes",
where the usual script layout is completely relaxed and where diacritics
may be moved anywhere around words, not necessarily near the base
letter; it is impossible to represent this typography with characters and
their Unicode properties. Indic scripts, however, have formalized some of
these freedoms of placement using complex positioning rules that are part
of their most common form.