Unicode of Death 2.0

Manish Goregaokar via Unicode unicode at unicode.org
Sun Feb 18 02:01:53 CST 2018

Oh, also vatu.

Seems like that ordering algorithm is indeed relevant.


On Sat, Feb 17, 2018 at 11:57 PM, Manish Goregaokar <manish at mozilla.com> wrote:

> Ah, looking at that, the OpenType `pstf` feature seems relevant, though I
> cannot get it to crash with Gurmukhi (where the consonant ya is a postform).
> -Manish
> On Sat, Feb 17, 2018 at 4:40 PM, Philippe Verdy <verdy_p at wanadoo.fr>
> wrote:
>> An interesting read:
>> https://docs.microsoft.com/fr-fr/typography/script-development/bengali#reor
>> 2018-02-18 1:30 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:
>>> My opinion about this bug is that Apple's text renderer allocates a glyph
>>> buffer dynamically, only when needed (lazily), but that a test is missing
>>> for the lazy construction of this buffer, and this causes a null pointer
>>> exception at run time. The buffer is not needed for most texts: when no
>>> glyph substitution or reordering is required, a single accessor can find
>>> the glyph data for a code point directly by lookup in the font tables.
>>> The bug actually occurs when processing the vowel that follows the ZWNJ,
>>> if the code assumes that a glyph buffer has already been constructed for
>>> the cluster, in order to place the vowel over the correct glyph (which may
>>> have been reordered in that buffer).
>>> Microsoft's text renderer and other engines do not delay the
>>> construction of the glyph buffer, which can be reused for processing the
>>> rest of the text, provided it is correctly reset after processing a cluster.
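
To make the conjecture above concrete, here is a minimal Python sketch of a
lazily allocated glyph buffer with a missing existence check. It illustrates
the hypothesised class of bug only and is not Apple's code: the class
`ClusterShaper`, its methods, and the glyph names are all hypothetical and
invented for this example.

```python
from typing import List, Optional, Tuple


class ClusterShaper:
    """Toy shaper for one cluster; the glyph buffer is created lazily."""

    def __init__(self) -> None:
        # No buffer yet: the fast path maps code points to glyphs directly.
        self.glyphs: Optional[List[str]] = None

    def _ensure_buffer(self) -> List[str]:
        # Lazy construction: allocate the buffer on first use.
        if self.glyphs is None:
            self.glyphs = []
        return self.glyphs

    def substitute(self, glyph: str) -> None:
        # Substitution or reordering is what triggers buffer creation.
        self._ensure_buffer().append(glyph)

    def attach_vowel_unchecked(self, vowel: str) -> Tuple[str, str]:
        # Hypothesised bug: assumes the buffer already exists.
        # Raises TypeError when self.glyphs is still None.
        base = self.glyphs[-1]
        return (base, vowel)

    def attach_vowel_checked(self, vowel: str, fallback_base: str) -> Tuple[str, str]:
        # Same step with the missing test added.
        buf = self._ensure_buffer()
        base = buf[-1] if buf else fallback_base
        return (base, vowel)
```

A cluster that went through `substitute` has a buffer and works with either
method; a cluster that stayed on the direct-lookup path fails only in the
unchecked variant.
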
>>> 2018-02-17 21:54 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>>>> Heh, I wasn't aware of the word "phala-form", though that seems
>>>> Bengali-specific?
>>>> Interesting observation about the vowel glyphs, I'll mention this in
>>>> the post. Initially I missed this because I hadn't realized that the
>>>> Bengali o vowel crashed (which made me discount this).
>>>> Thanks!
>>>> -Manish
>>>> On Sat, Feb 17, 2018 at 12:22 PM, Philippe Verdy <verdy_p at wanadoo.fr>
>>>> wrote:
>>>>> I would have liked your invented term "left-joining consonants" to
>>>>> take the usual name "phala forms" (which represent RA or JA/JO after a
>>>>> virama, generally named "raphala" or "japhala/jophala").
>>>>> The reason this bug does not occur with some vowels is that these are
>>>>> two-part vowels, which are first decomposed into two separate glyphs
>>>>> that are reordered in the glyph buffer, while other vowels do not need
>>>>> this prior mapping and keep their initial direct mapping from their code
>>>>> points in fonts. This means the bug has to do with the way the ZWNJ
>>>>> handling looks for the glyphs of the vowels in the glyph buffer and not
>>>>> in the initial code point buffer: there's some desynchronization, and
>>>>> more probably an uninitialized data field (for the lookup made in
>>>>> handling ZWNJ) if no vowel decomposition was done. The same data field
>>>>> is correctly initialized when it is the first consonant that takes an
>>>>> alternate form before a virama, as in most Indic consonant clusters,
>>>>> because a glyph buffer is then created.
>>>>> Now we have some hints about why the bug does not occur in Kannada or
>>>>> Khmer: a glyph buffer is always created there, but some shortcut was made
>>>>> in Devanagari, Bengali, and Telugu to process clusters faster, working
>>>>> directly on the code point stream without always having to create a glyph
>>>>> buffer (which allows reordering glyphs before positioning them).
>>>>> So it seems related to the fact that OpenType fonts do not need to
>>>>> include rules for glyph substitution: the phala forms can be represented
>>>>> without any glyph substitution, by mapping them directly in a separate
>>>>> table for the consonants. Because there has been no code-to-glyph
>>>>> substitution, the glyph buffer is not created, but then when processing
>>>>> the ZWNJ, the code looks for data in a glyph buffer that has still not
>>>>> been initialized (and this is specific to the renderers implemented by
>>>>> Apple in iOS and MacOS). This bug does not occur if another text
>>>>> rendering engine is used (e.g. in non-Apple web browsers).
>>>>> 2018-02-16 19:44 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>>>>>> FWIW I dissected the crashing strings, it's basically all <consonant,
>>>>>> virama, consonant, zwnj, vowel> sequences in Telugu, Bengali, Devanagari
>>>>>> where the second consonant is suffix-joining (ra in Devanagari, jo and ro in
>>>>>> Bengali, and all Telugu consonants), the vowel is not Bengali au or o /
>>>>>> Telugu ai, and if the second consonant is ra/ro the first one is not also
>>>>>> ra/ro (or ro-with-line-through-it).
>>>>>> https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
>>>>>> -Manish
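
As a rough illustration, the sequence shape described above can be written as
a small Python check. This is a sketch under assumptions, not a definitive
detector: the code point ranges for consonants and vowel signs are
approximations read off the Unicode code charts, "vowel" is assumed to mean a
dependent vowel sign, "jo" is assumed to be U+09AF BENGALI LETTER YA and
"ro-with-line-through-it" to be U+09F0, the ra/ro exception is applied only to
Devanagari and Bengali, and the names `Script`, `SCRIPTS`, `cps` and
`find_reported_pattern` are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional

ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER


class Script(NamedTuple):
    consonants: FrozenSet[int]
    virama: int
    vowel_signs: FrozenSet[int]
    suffix_joining: Optional[FrozenSet[int]]  # None: every consonant qualifies
    excluded_vowels: FrozenSet[int]           # vowels reported not to crash
    ra_second: FrozenSet[int]                 # ra/ro as second consonant
    ra_first: FrozenSet[int]                  # first consonants exempt in that case


def cps(start: int, end: int, *extra: int) -> FrozenSet[int]:
    """Inclusive code point range plus any extra code points."""
    return frozenset(range(start, end + 1)) | frozenset(extra)


# Approximate ranges (assumption); some code points inside them are unassigned.
SCRIPTS = {
    "Devanagari": Script(cps(0x0915, 0x0939), 0x094D, cps(0x093E, 0x094C),
                         frozenset({0x0930}), frozenset(),
                         frozenset({0x0930}), frozenset({0x0930})),
    "Bengali": Script(cps(0x0995, 0x09B9, 0x09F0), 0x09CD, cps(0x09BE, 0x09CC),
                      frozenset({0x09AF, 0x09B0}), frozenset({0x09CB, 0x09CC}),
                      frozenset({0x09B0}), frozenset({0x09B0, 0x09F0})),
    "Telugu": Script(cps(0x0C15, 0x0C39), 0x0C4D, cps(0x0C3E, 0x0C4C),
                     None, frozenset({0x0C48}), frozenset(), frozenset()),
}


def find_reported_pattern(text: str) -> Optional[str]:
    """Return the script name if text contains the described sequence, else None."""
    points = [ord(ch) for ch in text]
    for i in range(len(points) - 4):
        c1, virama, c2, joiner, vowel = points[i:i + 5]
        if joiner != ZWNJ:
            continue
        for name, sc in SCRIPTS.items():
            if c1 not in sc.consonants or virama != sc.virama or c2 not in sc.consonants:
                continue
            if sc.suffix_joining is not None and c2 not in sc.suffix_joining:
                continue
            if vowel not in sc.vowel_signs or vowel in sc.excluded_vowels:
                continue
            if c2 in sc.ra_second and c1 in sc.ra_first:
                continue  # ra/ro after ra/ro is reported not to crash
            return name
    return None
```

For example, `find_reported_pattern("\u0c1c\u0c4d\u0c1e\u200c\u0c3e")` returns
`"Telugu"`, while the same string with the ZWNJ removed returns `None`.
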
>>>>>> On Thu, Feb 15, 2018 at 10:58 AM, Philippe Verdy via Unicode <
>>>>>> unicode at unicode.org> wrote:
>>>>>>> That's probably not a bug in Unicode but in the MacOS/iOS text renderers
>>>>>>> with some fonts using advanced composition features.
>>>>>>> Similar bugs could just as well affect the new advanced features added in
>>>>>>> Windows or Android to support multicolored emojis, variable fonts,
>>>>>>> contextual glyph transforms, style variants, or more font formats (not just
>>>>>>> OpenType). The bug may also be in the graphic renderer (incorrect clipping
>>>>>>> when drawing the glyph into the glyph cache, with buffer overflows possibly
>>>>>>> caused by incorrectly computed splines), or it could be in the display
>>>>>>> driver (or in the hardware accelerator having some limitations on the
>>>>>>> complexity of multipolygons to fill and to antialias), causing some
>>>>>>> infinite recursion loop, or too deep a recursion exhausting the stack limit.
>>>>>>> Finally, the bug could be in the OpenType hinting engine moving some
>>>>>>> points outside the clipping area (the math theory may say that such
>>>>>>> placement of a point outside the clipping area is impossible, but
>>>>>>> various mathematical simplifications and shortcuts are used to simplify or
>>>>>>> accelerate the rendering, at the price of some quirks). Even the SVG
>>>>>>> standard (in constant evolution) could be affected as well in its
>>>>>>> implementation.
>>>>>>> There are tons of possible bugs here.
>>>>>>> 2018-02-15 18:21 GMT+01:00 James Kass via Unicode <
>>>>>>> unicode at unicode.org>:
>>>>>>>> This article:
>>>>>>>> https://techcrunch.com/2018/02/15/iphone-text-bomb-ios-mac-crash-apple/?ncid=mobilenavtrend
>>>>>>>> The single Unicode symbol referred to in the article results from a
>>>>>>>> string of Telugu characters.  The article doesn't list or display the
>>>>>>>> characters, so Mac users can visit the above link.  A link in one of
>>>>>>>> the comments leads to a page which does display the characters.