Suppressing Ligation of Spacing Marks

Richard Wordingham richard.wordingham at ntlworld.com
Wed Nov 9 13:53:35 CST 2016


On Wed, 9 Nov 2016 22:10:34 +0900
Norbert Lindenberg <unicode at lindenbergsoftware.com> wrote:

> The part of the specification of the Universal Shaping Engine [1]
> that deals with ZWNJ is a bit unclear, but I read it to mean that
> ZWNJ should not cause the insertion of a dotted circle if the
> character following it has general category Mn or Mc.
> 
> The USE specification says: “The zero-width non-joiner is used to
> prevent a fusion of two characters. It continues a preceding cluster
> but causes a cluster break after itself when the following character
> is not a mark character (gc=Mn or gc=Mc).”
> 
> The specification does not say how this character should be handled
> in cluster validation. I assume first that the statement about the
> combining grapheme joiner also applies to ZWNJ: “CGJ has been omitted
> from the above schema in order to avoid unnecessary complexity”. I
> further interpret the little the spec does say about ZWNJ to imply
> that it should be allowed before any character with general category
> Mn or Mc, without affecting the validity of the cluster. Inserting a
> dotted circle would be equivalent to causing a cluster break, which
> the spec rules out when the following character has general category
> Mn or Mc.

That makes sense, but I was hoping for an opinion independent of the
Microsoft policy.
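
For concreteness, the reading quoted above can be written as a small
predicate.  This is only a sketch of that interpretation, not of any
shaper's actual code: the function name zwnj_breaks_cluster is made up
for illustration, and the general categories come from Python's
unicodedata module.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def zwnj_breaks_cluster(following: str) -> bool:
    """Hypothetical helper: True if, on the quoted reading of the USE
    text, a ZWNJ followed by `following` ends the cluster, i.e. the
    following character is not a mark (gc=Mn or gc=Mc)."""
    return unicodedata.category(following) not in ("Mn", "Mc")


# U+1A63 SIGN AA is gc=Mc, so the cluster should continue through it.
print(zwnj_breaks_cluster("\u1A63"))
# U+1A36 NA is a letter, so a cluster break after ZWNJ is expected.
print(zwnj_breaks_cluster("\u1A36"))
```

On this reading, a dotted circle between ZWNJ and SIGN AA would amount
to a break that the predicate says should not occur.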

> U+1A63 has gc=Mc, so it shouldn’t be preceded by a dotted circle in
> the sequence <NA, ZWNJ, SIGN AA, …>. Note that I omitted the first
> “…” from the sequence you provided, because an intervening character
> might trigger the dotted circle.

The word, meaning 'to foretell', can be seen at
http://www.wrdingham.co.uk/lanna/renderer_test.htm .  The full encoding
of the syllable is <U+1A36 NA, U+1A60 SAKOT, U+1A45 WA, U+200C ZWNJ,
U+1A63 SIGN AA, U+1A60 SAKOT, U+1A3F LOW YA>.  MS Edge, running on an
evaluation copy of Windows 10 kindly provided for checking web page
displays in MS Edge, inserts dotted circles after* ZWNJ and before the
second SAKOT.  The second insertion is because USE does not recognise
Indic CVC orthographic syllables, which make up about half the native
vocabulary in the region.  Pali is less badly affected, though one
can't write _nibbāna_ 'nirvana' properly and the Tai Khuen may be
unhappy with how they have to write _dhamma_ 'dharma' and its compounds
in Pali.
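
To check the general categories in the syllable above, something like
the following will do.  It is a minimal sketch that assumes only
Python's standard unicodedata module; the names SYLLABLE and describe
are invented for the example.

```python
import unicodedata

# <NA, SAKOT, WA, ZWNJ, SIGN AA, SAKOT, LOW YA> as given above.
SYLLABLE = "\u1A36\u1A60\u1A45\u200C\u1A63\u1A60\u1A3F"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}",
         unicodedata.category(ch),
         unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for cp, gc, name in describe(SYLLABLE):
    print(cp, gc, name)
```

The listing shows which characters are marks (Mn or Mc), and so which
ones the quoted ZWNJ wording covers.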

*I know it's after because of the 'shaping' in the Da Lekh font, which
eliminates the vast bulk of the dotted circles misinserted by USE,
whose specification is wrong.

> So this may just be a bug in the implementation of the USE that
> you’re using. I see this bug in Safari (CoreText), but not in Firefox
> (Harfbuzz); haven’t tried Edge. Which one are you using?

MS Edge (see above).  HarfBuzz and MS Edge differ in their dotted
circle behaviour: my font has dotted circle lookups dedicated to
HarfBuzz patterns that don't occur in MS Edge.  I haven't checked my
font to destruction yet (6 marks will generally overwhelm it); I've
just thrown two Northern Thai dictionaries at it.

Richard.


