Unicode of Death 2.0
Philippe Verdy via Unicode
unicode at unicode.org
Sun Feb 18 06:04:16 CST 2018
Yes, I found other possible crashes, all caused by the glyph reordering. It
really seems that Apple implemented some unsafe shortcuts by not creating a
glyphs buffer in all cases (using lazy instantiation only when needed), but
forgot some cases: the code assumes that the glyphs buffer has been
initialized, and then it probably throws a null pointer exception or
similar.
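
To make the hypothesis concrete, here is a minimal sketch in Python of what
such a lazy-allocation bug could look like. This is only an illustration of
the idea, not Apple's code: the class and method names (ClusterShaper,
ensure_glyphs, attach_vowel_unsafe, attach_vowel_safe) are invented for the
example, and Python's TypeError stands in for the null pointer dereference.

```python
class ClusterShaper:
    """Toy model of a shaper that allocates its glyph buffer lazily.

    Hypothetical: names and structure are assumptions for illustration only.
    """

    def __init__(self, codepoints):
        self.codepoints = list(codepoints)
        # Not allocated up front; only built when some step asks for it.
        self.glyphs = None

    def ensure_glyphs(self):
        # Lazy construction: build the buffer on first use, then reuse it.
        if self.glyphs is None:
            self.glyphs = [f"glyph(U+{cp:04X})" for cp in self.codepoints]
        return self.glyphs

    def attach_vowel_unsafe(self, index):
        # Assumes an earlier step already built the buffer.
        # If no substitution or reordering happened, self.glyphs is still None.
        return self.glyphs[index]

    def attach_vowel_safe(self, index):
        # Always goes through the lazy accessor, so the buffer exists.
        return self.ensure_glyphs()[index]
```

Calling attach_vowel_unsafe() on a fresh ClusterShaper fails because nothing
has built the buffer yet, while attach_vowel_safe() works in every case. An
engine that always creates the buffer avoids the problem entirely, at the
cost of an allocation that most clusters do not need.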
2018-02-18 9:01 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
> Oh, also vatu.
>
> Seems like that ordering algorithm is indeed relevant.
>
> -Manish
>
> On Sat, Feb 17, 2018 at 11:57 PM, Manish Goregaokar <manish at mozilla.com>
> wrote:
>
>> Ah, looking at that, the OpenType `pstf` feature seems relevant, though I
>> cannot get it to crash with Gurmukhi (where the consonant ya is a postform).
>>
>> -Manish
>>
>> On Sat, Feb 17, 2018 at 4:40 PM, Philippe Verdy <verdy_p at wanadoo.fr>
>> wrote:
>>
>>> An interesting read:
>>>
>>> https://docs.microsoft.com/fr-fr/typography/script-development/bengali#reor
>>>
>>>
>>> 2018-02-18 1:30 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:
>>>
>>>> My opinion about this bug is that Apple's text renderer dynamically
>>>> allocates a glyphs buffer only when needed (lazily), but a check is
>>>> missing for the lazy construction of this buffer, and this causes a null
>>>> pointer exception at run time. The buffer is not needed for most texts:
>>>> without glyph substitutions or reordering, a single accessor can find the
>>>> glyph data for a code point directly by lookup in the font tables.
>>>>
>>>> The bug actually occurs when processing the vowel that follows the ZWNJ,
>>>> if the code assumes that a glyphs buffer has already been constructed
>>>> for the cluster, in order to place the vowel over the correct glyph
>>>> (which may have been reordered in that buffer).
>>>>
>>>> Microsoft's text renderer and other engines do not delay the
>>>> construction of the glyphs buffer, which can be reused for processing the
>>>> rest of the text, provided it is correctly reset after processing a cluster.
>>>>
>>>>
>>>> 2018-02-17 21:54 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>>>>
>>>>> Heh, I wasn't aware of the word "phala-form", though that seems
>>>>> Bengali-specific?
>>>>>
>>>>> Interesting observation about the vowel glyphs, I'll mention this in
>>>>> the post. Initially I missed this because I hadn't realized that the
>>>>> Bengali o vowel crashed (which made me discount this).
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -Manish
>>>>>
>>>>> On Sat, Feb 17, 2018 at 12:22 PM, Philippe Verdy <verdy_p at wanadoo.fr>
>>>>> wrote:
>>>>>
>>>>>> I would have preferred that your invented term "left-joining
>>>>>> consonants" used the usual name "phala forms" (representing RA or JA/JO
>>>>>> after a virama, generally named "raphala" or "japhala/jophala").
>>>>>>
>>>>>> The reason this bug does not occur with some vowels is that these are
>>>>>> two-part vowels, which are first decomposed into two separate glyphs
>>>>>> reordered in the glyph buffer, while other vowels do not need this
>>>>>> prior mapping and keep their initial direct mapping from their code
>>>>>> points in fonts. This suggests the problem lies in the way the ZWNJ
>>>>>> handling looks for the glyphs of the vowels in the glyph buffer rather
>>>>>> than in the initial code point buffer: there's some desynchronization,
>>>>>> and more probably an uninitialized data field (for the lookup made in
>>>>>> handling ZWNJ) if no vowel decomposition was done. The same data field
>>>>>> is correctly initialized when it is the first consonant that takes an
>>>>>> alternate form before a virama, as in most Indic consonant clusters,
>>>>>> because a glyph buffer is then created.
>>>>>>
>>>>>> Now we have some hints about why the bug does not occur in Kannada or
>>>>>> Khmer: a glyph buffer is always created there, but some shortcut was
>>>>>> made in Devanagari, Bengali, and Telugu to process clusters faster by
>>>>>> working directly on the code point stream, without always having to
>>>>>> create a glyph buffer (which allows reordering glyphs before
>>>>>> positioning them).
>>>>>>
>>>>>> So it seems related to the fact that OpenType fonts do not need to
>>>>>> include rules for glyph substitution: the phala forms are represented
>>>>>> without any glyph substitution, by mapping them directly in a
>>>>>> separate table for the consonants. Because there has been no
>>>>>> code-to-glyph substitution, the glyph buffer is not created, but then
>>>>>> when processing the ZWNJ, the code looks for data in a glyph buffer
>>>>>> that has still not been initialized (and this is specific to the
>>>>>> renderers implemented by Apple in iOS and macOS). This bug does not
>>>>>> occur if another text rendering engine is used (e.g. in non-Apple web
>>>>>> browsers).
>>>>>>
>>>>>>
>>>>>> 2018-02-16 19:44 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>>>>>>
>>>>>>> FWIW I dissected the crashing strings, it's basically all
>>>>>>> <consonant, virama, consonant, zwnj, vowel> sequences in Telugu, Bengali,
>>>>>>> Devanagari where the consonant is suffix-joining (ra in Devanagari, jo and
>>>>>>> ro in Bengali, and all Telugu consonants), the vowel is not Bengali au or o
>>>>>>> / Telugu ai, and if the second consonant is ra/ro the first one is not also
>>>>>>> ra/ro (or ro-with-line-through-it).
>>>>>>>
>>>>>>> https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
>>>>>>>
>>>>>>> -Manish
>>>>>>>
>>>>>>> On Thu, Feb 15, 2018 at 10:58 AM, Philippe Verdy via Unicode <
>>>>>>> unicode at unicode.org> wrote:
>>>>>>>
>>>>>>>> That's probably not a bug in Unicode but in the macOS/iOS text
>>>>>>>> renderers with some fonts using advanced composition features.
>>>>>>>>
>>>>>>>> Similar bugs could just as well affect the new advanced features
>>>>>>>> added in Windows or Android to support multicolored emojis, variable
>>>>>>>> fonts, contextual glyph transforms, style variants, or more font
>>>>>>>> formats (not just OpenType). The bug may also be in the graphic
>>>>>>>> renderer (incorrect clipping when drawing the glyph into the glyph
>>>>>>>> cache, with buffer overflows possibly caused by incorrectly computed
>>>>>>>> splines), or it could be in the display driver (or in the hardware
>>>>>>>> accelerator having some limitations on the complexity of
>>>>>>>> multipolygons to fill and to antialias), causing some infinite
>>>>>>>> recursion loop, or too deep a recursion exhausting the stack limit.
>>>>>>>>
>>>>>>>> Finally, the bug could be in the OpenType hinting engine moving some
>>>>>>>> points outside the clipping area (the math theory may say that such
>>>>>>>> placement of a point outside the clipping area is impossible, but
>>>>>>>> various mathematical simplifications and shortcuts are used to
>>>>>>>> simplify or accelerate the rendering, at the price of some quirks).
>>>>>>>> Even implementations of the SVG standard (in constant evolution)
>>>>>>>> could be affected as well.
>>>>>>>>
>>>>>>>> There are tons of possible bugs here.
>>>>>>>>
>>>>>>>> 2018-02-15 18:21 GMT+01:00 James Kass via Unicode <
>>>>>>>> unicode at unicode.org>:
>>>>>>>>
>>>>>>>>> This article:
>>>>>>>>> https://techcrunch.com/2018/02/15/iphone-text-bomb-ios-mac-crash-apple/?ncid=mobilenavtrend
>>>>>>>>>
>>>>>>>>> The single Unicode symbol referred to in the article results from a
>>>>>>>>> string of Telugu characters. The article doesn't list or display the
>>>>>>>>> characters, so Mac users can visit the above link. A link in one of
>>>>>>>>> the comments leads to a page which does display the characters.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
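
For reference, the sequence shape Manish describes in the quoted thread
(<consonant, virama, consonant, zwnj, vowel>) can be written down as a small
check. The Python sketch below covers Telugu only and is an approximation: the
function names are invented for the example, the consonant test is a plain
code point range (U+0C15..U+0C39), and it only tests the shape of a string.
It says nothing about whether a given renderer handles that string correctly.

```python
import unicodedata

VIRAMA = "\u0c4d"         # TELUGU SIGN VIRAMA
ZWNJ = "\u200c"           # ZERO WIDTH NON-JOINER
VOWEL_SIGN_AI = "\u0c48"  # excluded in the quoted description


def is_telugu_consonant(ch):
    # Approximation: Telugu consonant letters KA..HA sit in U+0C15..U+0C39.
    return 0x0C15 <= ord(ch) <= 0x0C39


def is_telugu_vowel_sign(ch):
    # Dependent vowel signs are named "TELUGU VOWEL SIGN ..." in the UCD.
    return unicodedata.name(ch, "").startswith("TELUGU VOWEL SIGN")


def matches_described_pattern(text):
    """True if text is exactly <consonant, virama, consonant, ZWNJ, vowel sign>,
    with the vowel sign other than Telugu AI."""
    if len(text) != 5:
        return False
    c1, virama, c2, zwnj, vowel = text
    return (
        is_telugu_consonant(c1)
        and virama == VIRAMA
        and is_telugu_consonant(c2)
        and zwnj == ZWNJ
        and is_telugu_vowel_sign(vowel)
        and vowel != VOWEL_SIGN_AI
    )
```

For example, matches_described_pattern("\u0c1c\u0c4d\u0c1e\u200c\u0c3e")
is true (JA, virama, NYA, ZWNJ, vowel sign AA), while the same string with
vowel sign AI, or without the ZWNJ, does not match.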