Multiple Preposed Marks

Philippe Verdy verdy_p at
Tue Nov 8 20:26:51 CST 2016

2016-11-09 0:42 GMT+01:00 Richard Wordingham <
richard.wordingham at>:

> On Wed, 9 Nov 2016 00:00:01 +0100
> Philippe Verdy <verdy_p at> wrote:
> > 2016-11-08 9:30 GMT+01:00 Richard Wordingham <
> > richard.wordingham at>:
> >
> > > TUS Section 2.11 says, "If the combining characters can interact
> > > typographically—for example, U+0304 combining macron and  U+0308
> > > combining  diaeresis — then  the  order  of  graphic  display  is
> > > determined  by  the  order  of  coded  characters  (see Table 2-5).
> > > By  default,  the  diacritics  or other combining characters are
> > > positioned from the base character’s glyph outward".
> > The interpretation of   "If the combining characters can interact
> > typographically" should be better read as "If the combining
> > characters have the same non-zero combining class or any one of them
> > has a zero combining class".
> The combining marks in question both have canonical combining class 0.
> > But now normalization is everywhere and causes the pairs using the
> > condition above to be freely reordered (or decomposed and recomposed,
> > meaning that the encoding order is NOT significant at all).
> I believe a renderer is permitted to treat canonically equivalent
> sequence differently so long as it does not believe it should treat
> them differently.  However, that is irrelevant to this case.

This is DIRECTLY relevant to the sentence in TUS you quoted, which is all
about combining characters that are encoded after the base letter, often
have non-zero combining classes, and are reorderable.
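
To make the reordering concrete, here is a minimal sketch, assuming Python's
standard unicodedata module and using the two marks named in the TUS quote
(U+0304, U+0308) plus U+0323 combining dot below as an extra example of my
own choosing. The variable names are only illustrative. Marks with different
non-zero classes are sorted by normalization; marks with the same class keep
their encoded order.

```python
import unicodedata

def classes(text):
    """Canonical combining class of each code point in text."""
    return [unicodedata.combining(ch) for ch in text]

# U+0304 macron and U+0308 diaeresis share class 230:
# normalization keeps their relative order, so the two orders stay distinct.
macron_diaeresis = "a\u0304\u0308"
diaeresis_macron = "a\u0308\u0304"

# U+0304 (class 230) and U+0323 dot below (class 220) differ:
# canonical ordering puts the lower class first.
above_then_below = "a\u0304\u0323"
below_then_above = "a\u0323\u0304"

for s in (macron_diaeresis, diaeresis_macron, above_then_below, below_then_above):
    nfd = unicodedata.normalize("NFD", s)
    print([f"U+{ord(c):04X}" for c in nfd], classes(nfd))
```

After NFD, the last two strings become identical, while the first two remain
different from each other.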

But evidently this sentence in TUS is not relevant to "prepended" combining
marks, which all have combining class 0. Here "prepended" means encoded
before the base character, as opposed to marks that are encoded after it
even though they visually combine before it. The latter is the case for
well-known Indic vowels, which now have non-zero combining classes that
allow them to be reordered before other combining marks when normalizing,
while still remaining encoded after the base consonant.
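
The general rule for class 0 can be checked in the same way. This is only a
sketch, again assuming Python's unicodedata module; it does not use the marks
under discussion but U+034F COMBINING GRAPHEME JOINER, which I picked because
it is a class-0 character with no decomposition. Canonical ordering never
moves a class-0 character and never moves other marks across it.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0

plain = "a\u0304\u0323"                # macron (230), then dot below (220)
guarded = "a\u0304" + CGJ + "\u0323"   # same marks with a class-0 character between

plain_nfd = unicodedata.normalize("NFD", plain)
guarded_nfd = unicodedata.normalize("NFD", guarded)

print(plain_nfd == plain)      # False: the two marks were swapped
print(guarded_nfd == guarded)  # True: nothing crosses the class-0 character
```

So for a run of marks that all have class 0, the encoded order is exactly
what survives normalization.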

What I want to say is that this sentence in TUS is quite ambiguous: it
speaks about graphic interaction, but this is not really encoded in text
sequences, and it forgets the effect of combining classes on combining
sequences, an effect that NEVER takes any actual graphic interaction into
account. That is simply because graphic interaction is not specified, and
the actual interactions may depend on font styles: notably in honorific
Arabic typography using very complex layouts, but even within the Latin
script when using decorated font styles or custom ligatures, where complex
interactions also occur, including on larger spans than clusters, such as
full words.