Multiple Preposed Marks
verdy_p at wanadoo.fr
Tue Nov 8 20:26:51 CST 2016
2016-11-09 0:42 GMT+01:00 Richard Wordingham <richard.wordingham at ntlworld.com>:
> On Wed, 9 Nov 2016 00:00:01 +0100
> Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> > 2016-11-08 9:30 GMT+01:00 Richard Wordingham <richard.wordingham at ntlworld.com>:
> > > TUS Section 2.11 says, "If the combining characters can interact
> > > typographically—for example, U+0304 combining macron and U+0308
> > > combining diaeresis — then the order of graphic display is
> > > determined by the order of coded characters (see Table 2-5).
> > > By default, the diacritics or other combining characters are
> > > positioned from the base character’s glyph outward".
> > The interpretation of "If the combining characters can interact
> > typographically" would be better read as "If the combining
> > characters have the same non-zero combining class or any one of them
> > has a zero combining class".
> The combining marks in question both have canonical combining class 0.
> > But now normalization is everywhere, and it causes pairs that fail the
> > condition above (marks with distinct non-zero combining classes) to be
> > freely reordered (or decomposed and recomposed), meaning that their
> > encoding order is NOT significant at all.
> I believe a renderer is permitted to treat canonically equivalent
> sequences differently so long as it does not believe it should treat
> them differently. However, that is irrelevant to this case.
This is DIRECTLY relevant to the sentence in TUS you quoted, which is all
about combining characters that are encoded after the base letter, often
have non-zero combining classes, and are reorderable.
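To make the distinction concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It prints each code point with its canonical combining class before and after NFD normalization. The test strings are illustrative choices, not taken from TUS: U+0304 and U+0308 (the marks from the TUS example, both class 230), U+0307 COMBINING DOT ABOVE (230) with U+0323 COMBINING DOT BELOW (220), and U+034F COMBINING GRAPHEME JOINER (class 0) inserted between them as a blocker.

```python
import unicodedata


def describe(s: str) -> str:
    """List each code point of s with its canonical combining class."""
    return " ".join(f"U+{ord(c):04X}({unicodedata.combining(c)})" for c in s)


examples = {
    # Same non-zero class (230): canonical ordering keeps the encoded order,
    # so these two strings are NOT canonically equivalent.
    "macron + diaeresis": "a\u0304\u0308",
    "diaeresis + macron": "a\u0308\u0304",
    # Different non-zero classes (230 then 220): NFD sorts them by class.
    "dot above + dot below": "a\u0307\u0323",
    # A class-0 character (CGJ, U+034F) between them blocks the reordering.
    "dot above + CGJ + dot below": "a\u0307\u034F\u0323",
}

for name, s in examples.items():
    nfd = unicodedata.normalize("NFD", s)
    status = "unchanged" if nfd == s else "reordered"
    print(f"{name:30} {describe(s)} -> {describe(nfd)} [{status}]")
```

Only the pair with distinct non-zero classes is reordered; the macron/diaeresis pair keeps its encoded order, and the class-0 joiner prevents the swap entirely.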
But this sentence in TUS is evidently not relevant to "prepended" combining
marks, which all have combining class 0. Here "prepended" means encoded
before the base character, not encoded after it while displayed before it.
The latter is the case for well-known Indic vowel signs, which now have
non-zero combining classes that allow them to be reordered before other
combining marks during normalization, while still remaining encoded after
the base consonant.
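The reading proposed in the quoted text (order matters when two marks share a non-zero combining class, or when either has class 0) can be written as a predicate and cross-checked against normalization. This is only a sketch: `order_is_significant` and `swap_is_equivalent` are hypothetical helper names, and the check assumes single code points without canonical decompositions (a mark such as U+0344, which decomposes, would need decomposing first).

```python
import unicodedata


def order_is_significant(a: str, b: str) -> bool:
    """Hypothetical helper: True if canonical reordering will never swap the
    adjacent marks a and b, i.e. their encoded order is significant.
    Assumes a and b are single code points with no canonical decomposition."""
    ca, cb = unicodedata.combining(a), unicodedata.combining(b)
    return ca == cb or ca == 0 or cb == 0


def swap_is_equivalent(base: str, a: str, b: str) -> bool:
    """Hypothetical cross-check: are base+a+b and base+b+a canonically
    equivalent (identical after NFD)?"""
    first = unicodedata.normalize("NFD", base + a + b)
    second = unicodedata.normalize("NFD", base + b + a)
    return first == second


# U+0304/U+0308: same class 230; U+0307/U+0323: classes 230/220;
# U+0307/U+034F: class 230 next to a class-0 joiner.
for a, b in [("\u0304", "\u0308"), ("\u0307", "\u0323"), ("\u0307", "\u034F")]:
    print(f"U+{ord(a):04X} U+{ord(b):04X}: "
          f"significant={order_is_significant(a, b)}, "
          f"equivalent after swap={swap_is_equivalent('a', a, b)}")
```

Because canonical ordering never moves a class-0 character, marks of class 0, including preposed ones, always keep their encoded order under normalization.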
What I want to say is that this sentence in TUS is quite ambiguous. It
speaks about graphic interaction, but graphic interaction is not really
encoded in text sequences, and the sentence ignores the effect of combining
classes on combining sequences: canonical ordering NEVER takes any actual
graphic interaction into account. This is simply because such interaction
is not specified; the actual graphic interactions may depend on font styles
(notably in honorific Arabic typography using very complex layouts, but
even within the Latin script when using decorated font styles or custom
ligatures, where complex interactions also occur, including over spans
larger than clusters, such as full words).
More information about the Unicode