Unicode of Death 2.0

Philippe Verdy via Unicode unicode at unicode.org
Sat Feb 17 14:22:55 CST 2018

I would have preferred that, instead of your invented term "left-joining
consonants", you used the usual name "phala forms" (for RA or JA/JO after a
virama, generally named "raphala" or "japhala/jophala").

The reason this bug does not occur with some vowels is that these are
two-part vowels: they are first decomposed into two separate glyphs, which
are reordered in the glyph buffer. Other vowels do not need this prior
mapping and keep their initial direct mapping from their codepoints in
fonts. So this has to do with the way the ZWNJ handling looks for the
glyphs of the vowels in the glyph buffer and not in the initial codepoint
buffer: there is some desynchronization, and more probably an uninitialized
data field (for the lookup made in handling ZWNJ) if no vowel decomposition
was done. The same data field is correctly initialized when it is the first
consonant that takes an alternate form before a virama, as in most Indic
consonant clusters, because a glyph buffer is then created.
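
As a rough illustration of the "vowels in two parts" distinction, the sketch
below uses Unicode canonical decomposition (NFD) as a proxy. This is an
assumption: a shaping engine splits glyphs using font and script rules, not
NFD, and the helper name is_two_part_vowel is made up for this example. The
sample vowels are the ones named in the quoted analysis, plus two one-part
vowel signs for contrast.

```python
import unicodedata


def is_two_part_vowel(ch: str) -> bool:
    """Return True if the code point canonically decomposes into two code points.

    Proxy only: real renderers decide this from script and font data.
    """
    return len(unicodedata.normalize("NFD", ch)) == 2


samples = [
    "\u09CB",  # BENGALI VOWEL SIGN O
    "\u09CC",  # BENGALI VOWEL SIGN AU
    "\u0C48",  # TELUGU VOWEL SIGN AI
    "\u0C3E",  # TELUGU VOWEL SIGN AA
    "\u094B",  # DEVANAGARI VOWEL SIGN O
]

for ch in samples:
    parts = " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch))
    print(f"{unicodedata.name(ch)}: two-part={is_two_part_vowel(ch)} ({parts})")
```

With this proxy, the three vowels reported as not crashing are the ones that
decompose, while the other two map to a single code point.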

Now we have some hints about why the bug does not occur in Kannada or
Khmer: a glyph buffer is always created there, but some shortcut was made
in Devanagari, Bengali, and Telugu to process clusters faster by working
directly on the codepoint streams, without always having to create a glyph
buffer (which allows reordering glyphs before positioning them).

So it seems related to the fact that OpenType fonts do not need to include
rules for glyph substitution: the phala forms are represented without
any glyph substitution, by mapping them directly in a separate
table for the consonants. Because there has been no code-to-glyph
substitution, the glyph buffer is not created, but then when processing the
ZWNJ, the renderer looks for data in a glyph buffer that has not yet been
initialized (and this is specific to the renderers implemented by Apple in
iOS and macOS). This bug does not occur if another text rendering engine is
used (e.g. in non-Apple web browsers).
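
To make the hypothesis concrete, here is a toy model of it. Everything in it
is hypothetical (the Cluster class, the shape function, both find_zwnj
helpers); it is not Apple's code and only illustrates how a lookup can fail
when a lazily created buffer was never created. Python's None stands in for
the uninitialized field.

```python
from dataclasses import dataclass
from typing import List, Optional

ZWNJ = "\u200C"


@dataclass
class Cluster:
    codepoints: str
    # Created lazily: stays None unless a substitution step runs.
    glyphs: Optional[List[str]] = None


def shape(cluster: Cluster, needs_substitution: bool) -> None:
    """Hypothetical fast path: skip the glyph buffer when nothing is substituted."""
    if needs_substitution:
        cluster.glyphs = list(cluster.codepoints)


def find_zwnj_unsafe(cluster: Cluster) -> int:
    """Assumes the glyph buffer exists; raises AttributeError on the fast path."""
    return cluster.glyphs.index(ZWNJ)


def find_zwnj_safe(cluster: Cluster) -> int:
    """Falls back to the codepoint buffer when no glyph buffer was created."""
    if cluster.glyphs is not None:
        return cluster.glyphs.index(ZWNJ)
    return cluster.codepoints.index(ZWNJ)
```

In this model, find_zwnj_unsafe works whenever shape() created the buffer and
fails only on the shortcut path, which matches the pattern described above.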

2018-02-16 19:44 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:

> FWIW I dissected the crashing strings, it's basically all <consonant,
> virama, consonant, zwnj, vowel> sequences in Telugu, Bengali, Devanagari
> where the consonant is suffix-joining (ra in Devanagari, jo and ro in
> Bengali, and all Telugu consonants), the vowel is not Bengali au or o /
> Telugu ai, and if the second consonant is ra/ro the first one is not also
> ra/ro (or ro-with-line-through-it).
> https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
> -Manish
> On Thu, Feb 15, 2018 at 10:58 AM, Philippe Verdy via Unicode <
> unicode at unicode.org> wrote:
>> That's probably not a bug in Unicode but in the MacOS/iOS text renderers
>> with some fonts using advanced composition features.
>> Similar bugs could as well affect the new advanced features added in Windows or
>> Android to support multicolored emojis, variable fonts, contextual glyph
>> transforms, style variants, or more font formats (not just OpenType); the
>> bug may also be in the graphic renderer (incorrect clipping when drawing
>> the glyph into the glyph cache, with buffer overflows possibly caused by
>> incorrectly computed splines), and it could be in the display driver (or in
>> the hardware accelerator having some limitations on the complexity of
>> multipolygons to fill and to antialias), causing some infinite recursion
>> loop, or too deep recursion exhausting the stack limit;
>> Finally the bug could be in the OpenType hinting engine moving some
>> points outside the clipping area (the math theory may say that such
>> placement of a point outside the clipping area may be impossible, but
>> various mathematical simplifications and shortcuts are used to simplify or
>> accelerate the rendering, at the price of some quirks). Even the SVG
>> standard (in constant evolution) could be affected as well in its
>> implementation.
>> There are tons of possible bugs here.
>> 2018-02-15 18:21 GMT+01:00 James Kass via Unicode <unicode at unicode.org>:
>>> This article:
>>> https://techcrunch.com/2018/02/15/iphone-text-bomb-ios-mac-crash-apple/?ncid=mobilenavtrend
>>> The single Unicode symbol referred to in the article results from a
>>> string of Telugu characters.  The article doesn't list or display the
>>> characters, so Mac users can visit the above link.  A link in one of
>>> the comments leads to a page which does display the characters.
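
The sequence shape given in Manish's quoted message (consonant, virama,
consonant, ZWNJ, vowel) can be checked mechanically. The sketch below covers
only the Telugu case, where the message says all consonants qualify and the
vowel must not be Telugu ai. The code point ranges are my reading of the
Telugu block, the Devanagari and Bengali conditions are left out, and the
name matches_reported_shape is made up for this example.

```python
import re

# Telugu-only approximation of the reported shape.
TELUGU_SHAPE = re.compile(
    "[\u0C15-\u0C39]"  # first consonant (KA..HA)
    "\u0C4D"           # virama
    "[\u0C15-\u0C39]"  # second consonant
    "\u200C"           # ZWNJ
    "[\u0C3E-\u0C44\u0C46\u0C47\u0C4A-\u0C4C]"  # vowel sign, excluding AI (U+0C48)
)


def matches_reported_shape(text: str) -> bool:
    """Return True if text contains a Telugu sequence of the reported shape."""
    return TELUGU_SHAPE.search(text) is not None


# JA, virama, NYA, ZWNJ, vowel sign AA
sample = "\u0C1C\u0C4D\u0C1E\u200C\u0C3E"
print(matches_reported_shape(sample))
```

A filter built this way only flags candidate strings; it says nothing about
whether a given renderer actually mishandles them.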