Unicode of Death 2.0

Philippe Verdy via Unicode unicode at unicode.org
Sat Feb 17 18:40:26 CST 2018

An interesting read:


2018-02-18 1:30 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> My opinion about this bug is that Apple's text renderer dynamically
> allocates a glyphs buffer only when needed (lazily), but a test is missing
> for the lazy construction of this buffer (which is not needed for most
> texts not needing glyph substitutions or reordering when a single accessor
> from the code point can find the glyph data directly by lookup in font
> tables) and this is causing a null pointer exception at run time.
> The bug occurs effectively when processing the vowel that occurs after the
> ZWNJ, if the code assumes that there's a glyphs buffer already constructed
> for the cluster, in order to place the vowel over the correct glyph (which
> may have been reordered in that buffer).
> Microsoft's text renderer and other engines do not delay the
> construction of the glyphs buffer, which can be reused for processing the
> rest of the text, provided it is correctly reset after processing a cluster.
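
A rough sketch of the failure mode described above, for readers following
the thread: a glyph buffer that is built lazily, and one code path that
assumes it already exists. This is a hypothetical Python illustration, not
Apple's code; the class name ClusterShaper and all of its methods are
invented for the example.

```python
from typing import List, Optional, Tuple


class ClusterShaper:
    """Hypothetical shaper with a lazily constructed glyph buffer."""

    def __init__(self) -> None:
        # Fast path: no buffer until substitution or reordering needs one.
        self.glyphs: Optional[List[str]] = None

    def build_buffer(self, codepoints: List[int]) -> None:
        # Slow path: map every code point of the cluster to a glyph name.
        self.glyphs = [f"glyph:{cp:04X}" for cp in codepoints]

    def place_vowel_buggy(self, vowel: int) -> Tuple[str, int]:
        # Missing test: assumes the buffer was already constructed.
        return (self.glyphs[-1], vowel)

    def place_vowel_fixed(self, codepoints: List[int], vowel: int) -> Tuple[str, int]:
        # Construct the buffer on demand before reading from it.
        if self.glyphs is None:
            self.build_buffer(codepoints)
        return (self.glyphs[-1], vowel)
```

In this sketch, calling place_vowel_buggy on a fresh ClusterShaper fails
because the buffer is still None, which is the Python analogue of the null
pointer case suggested above. place_vowel_fixed adds the missing test.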
> 2018-02-17 21:54 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>> Heh, I wasn't aware of the word "phala-form", though that seems
>> Bengali-specific?
>> Interesting observation about the vowel glyphs, I'll mention this in the
>> post. Initially I missed this because I hadn't realized that the Bengali o
>> vowel crashed (which made me discount this).
>> Thanks!
>> -Manish
>> On Sat, Feb 17, 2018 at 12:22 PM, Philippe Verdy <verdy_p at wanadoo.fr>
>> wrote:
>>> I would have liked that your invented term of "left-joining consonants"
>>> took the usual name "phala forms" (to represent RA or JA/JO after a virama,
>>> generally named "raphala" or "japhala/jophala").
>>> And why this bug does not occur with some vowels is because these are
>>> vowels in two parts, that are first decomposed into two separate glyphs
>>> reordered in the buffer of glyphs, while other vowels do not need this
>>> prior mapping and keep their initial direct mapping from their codepoints
>>> in fonts, which means that this has to do with the way the ZWNJ looks for the
>>> glyphs of the vowels in the glyphs buffer and not in the initial codepoints
>>> buffer: there's some desynchronization, and more probably an uninitialized
>>> data field (for the lookup made in handling ZWNJ) if no vowel decomposition
>>> was done (the same data field is correctly initialized when it is the first
>>> consonant which takes an alternate form before a virama, like in most
>>> Indic consonant clusters, because then a glyph buffer is created).
>>> Now we have some hints about why the bug does not occur in Kannada or
>>> Khmer: a glyph buffer is always created, but there was some shortcut made
>>> in Devanagari, Bengali, and Telugu to allow processing clusters faster
>>> without always having to create a glyphs buffer (to allow reordering glyphs
>>> before positioning them), and working directly on the codepoints streams.
>>> So it seems related to the fact that OpenType fonts do not need to
>>> include rules for glyph substitution, but the PHALA forms are represented
>>> without any glyph substitution, by mapping directly the phala forms in a
>>> separate table for the consonants. Because there's been no code-to-glyph
>>> substitution, the glyph buffer is not created, but then when processing the
>>> ZWNJ, it looks for data in a glyph buffer that has still not been initialized
>>> (and this is specific to the renderers implemented by Apple in iOS and
>>> MacOS). This bug does not occur if another text rendering engine is used
>>> (e.g. in non-Apple web browsers).
>>> 2018-02-16 19:44 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>>>> FWIW I dissected the crashing strings, it's basically all <consonant,
>>>> virama, consonant, zwnj, vowel> sequences in Telugu, Bengali, Devanagari
>>>> where the consonant is suffix-joining (ra in Devanagari, jo and ro in
>>>> Bengali, and all Telugu consonants), the vowel is not Bengali au or o /
>>>> Telugu ai, and if the second consonant is ra/ro the first one is not also
>>>> ra/ro (or ro-with-line-through-it).
>>>> https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
>>>> -Manish
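
The pattern described in the message above can be turned into a small
checker. The following Python sketch is only an approximation under stated
assumptions: "vowel" is taken to mean a dependent vowel sign, the code
point ranges roughly cover the consonant and vowel-sign blocks of each
script (including a few unassigned positions), the ra/ro restriction is
applied only to Devanagari and Bengali, and the names SCRIPTS and
matches_reported_pattern are invented here. It does not predict whether
any given system actually crashes.

```python
ZWNJ = 0x200C

# Per-script data; ranges are approximate block slices, not exact sets.
SCRIPTS = {
    "devanagari": {
        "consonants": range(0x0915, 0x093A),
        "virama": 0x094D,
        "vowel_signs": range(0x093E, 0x094D),
        "suffix_joining": {0x0930},          # ra
        "excluded_vowels": set(),
        "ra_like": {0x0930},
    },
    "bengali": {
        "consonants": range(0x0995, 0x09BA),
        "virama": 0x09CD,
        "vowel_signs": range(0x09BE, 0x09CD),
        "suffix_joining": {0x09AF, 0x09B0},  # jo, ro
        "excluded_vowels": {0x09CB, 0x09CC}, # o, au
        "ra_like": {0x09B0, 0x09F0},         # ro, ro-with-line-through-it
    },
    "telugu": {
        "consonants": range(0x0C15, 0x0C3A),
        "virama": 0x0C4D,
        "vowel_signs": range(0x0C3E, 0x0C4D),
        "suffix_joining": None,              # all consonants qualify
        "excluded_vowels": {0x0C48},         # ai
        "ra_like": set(),
    },
}


def matches_reported_pattern(s: str) -> bool:
    """True if s is <consonant, virama, consonant, ZWNJ, vowel> as described."""
    if len(s) != 5:
        return False
    c1, virama, c2, joiner, vowel = (ord(ch) for ch in s)
    if joiner != ZWNJ:
        return False
    for data in SCRIPTS.values():
        if c1 not in data["consonants"] or c2 not in data["consonants"]:
            continue
        if virama != data["virama"]:
            continue
        if data["suffix_joining"] is not None and c2 not in data["suffix_joining"]:
            continue
        if vowel not in data["vowel_signs"] or vowel in data["excluded_vowels"]:
            continue
        if c2 in data["ra_like"] and c1 in data["ra_like"]:
            continue
        return True
    return False
```

For example, matches_reported_pattern("\u0C1C\u0C4D\u0C1E\u200C\u0C3E")
checks a Telugu sequence of ja, virama, nya, ZWNJ and vowel sign aa, which
has the shape described above.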
>>>> On Thu, Feb 15, 2018 at 10:58 AM, Philippe Verdy via Unicode <
>>>> unicode at unicode.org> wrote:
>>>>> That's probably not a bug of Unicode but of MacOS/iOS text renderers
>>>>> with some fonts using advanced composition features.
>>>>> Similar bugs could as well affect the new advanced features added in Windows
>>>>> or Android to support multicolored emojis, variable fonts, contextual glyph
>>>>> transforms, style variants, or more font formats (not just OpenType); the
>>>>> bug may also be in the graphic renderer (incorrect clipping when drawing
>>>>> the glyph into the glyph cache, with buffer overflows possibly caused by
>>>>> incorrectly computed splines), and it could be in the display driver (or in
>>>>> the hardware accelerator having some limitations on the complexity of
>>>>> multipolygons to fill and to antialias), causing some infinite recursion
>>>>> loop, or too deep recursion exhausting the stack limit.
>>>>> Finally the bug could be in the OpenType hinting engine moving some
>>>>> points outside the clipping area (the math theory may say that such
>>>>> placement of a point outside the clipping area may be impossible, but
>>>>> various mathematical simplifications and shortcuts are used to simplify or
>>>>> accelerate the rendering, at the price of some quirks). Even the SVG
>>>>> standard (in constant evolution) could be affected as well in its
>>>>> implementation.
>>>>> There are tons of possible bugs here.
>>>>> 2018-02-15 18:21 GMT+01:00 James Kass via Unicode <unicode at unicode.org>:
>>>>>> This article:
>>>>>> https://techcrunch.com/2018/02/15/iphone-text-bomb-ios-mac-crash-apple/?ncid=mobilenavtrend
>>>>>> The single Unicode symbol referred to in the article results from a
>>>>>> string of Telugu characters.  The article doesn't list or display the
>>>>>> characters, so Mac users can visit the above link.  A link in one of
>>>>>> the comments leads to a page which does display the characters.