Unicode of Death 2.0

Philippe Verdy via Unicode unicode at unicode.org
Sun Feb 18 07:05:42 CST 2018


Now, here is what I suspect about Apple's implementation.

The OpenType specification details the steps to parse strings, find
cluster boundaries, and identify the various character types (joining,
associativity, decomposable characters...).

Apple first parses the clusters and marks those that may require
reordering: it can detect the possible presence of reph forms,
before-base consonants, and vowels with multiple components.
If any of these is detected, it takes a "slow path" that uses the complex
algorithm requiring the preparation of a glyph buffer. Otherwise it uses a
"fast path" and can work directly at the code point level.

Here the bug shows up in the handling of ZWNJ + vowel: that code assumes
it only ever runs on the "slow path" (where a glyph buffer has been
prepared), but in this case we are on the "fast path", which was selected
solely from the conditions set during cluster parsing.
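
A minimal sketch of this suspected failure mode, in Python. Everything here
is hypothetical (the ToyShaper class, its method names, and the rule used by
the pre-scan); it is not Apple's code, only an illustration of a lazily
allocated buffer that one code path reads without checking that it exists.

```python
from typing import List, Optional

ZWNJ = "\u200c"
# Bengali vowel signs O and AU: two-part vowels (illustrative subset only).
TWO_PART_VOWELS = {"\u09cb", "\u09cc"}


class ToyShaper:
    """Hypothetical shaper whose glyph buffer exists only on the slow path."""

    def __init__(self) -> None:
        self.glyphs: Optional[List[str]] = None

    def prescan_needs_reordering(self, cluster: str) -> bool:
        # Assumed pre-scan: here only two-part vowels select the slow path.
        return any(ch in TWO_PART_VOWELS for ch in cluster)

    def ensure_glyphs(self, cluster: str) -> List[str]:
        # Lazy allocation: build the buffer on first use.
        if self.glyphs is None:
            self.glyphs = list(cluster)
        return self.glyphs

    def shape(self, cluster: str, guarded: bool) -> List[str]:
        self.glyphs = None  # reset between clusters
        if self.prescan_needs_reordering(cluster):
            self.ensure_glyphs(cluster)  # slow path
        if ZWNJ not in cluster:
            return self.glyphs if self.glyphs is not None else list(cluster)
        # ZWNJ handling reads the glyph buffer. Without the guard it trusts
        # that the slow path already ran, which the pre-scan never promised.
        glyphs = self.ensure_glyphs(cluster) if guarded else self.glyphs
        position = glyphs.index(ZWNJ)  # AttributeError if glyphs is None
        return glyphs[:position] + glyphs[position + 1:]
```

With guarded=False, a Telugu cluster such as U+0C1C U+0C4D U+0C1E U+200C
U+0C3E (consonant, virama, consonant, ZWNJ, vowel) fails in this toy model
because no buffer was ever built, while the same pattern ending in the
Bengali two-part vowel U+09CB succeeds because the pre-scan already took the
slow path. With guarded=True both succeed.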

The glyph buffer may also still be prepared lazily when a complex GSUB
(i.e. not a 1-to-1 mapping) is applied in some of the features. I don't
think that Apple has a bug there: this still allows switching dynamically
from the "fast path" to the "slow path" on demand, depending on the
features implemented in fonts.

But any operation in OpenType that requires reordering requires a glyph
buffer. This could even apply to Latin if Microsoft really intends to
support normalization (i.e. canonical equivalences) in its own USE engine
(for now it does not), because it would also require a glyph buffer to
allow correct reordering of glyphs (according to their properties, notably
for "beforebase", or for special placement of some diacritics such as the
cedilla that moves from "belowbase" to "abovebase" when the base is the
letter "g").
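
As a rough analogy at the code point level (not glyphs, and not how USE or
any shaping engine is implemented), the Unicode canonical ordering of
combining marks already shows why reordering needs a mutable buffer: marks
have to be collected and stably sorted by combining class before they can be
emitted. The sketch below uses only Python's standard unicodedata module;
the function name canonical_reorder is made up for this example.

```python
import unicodedata


def canonical_reorder(text: str) -> str:
    """Stably sort each run of combining marks by canonical combining class."""
    buffer = list(text)  # reordering needs random access, hence a buffer
    start = 0
    while start < len(buffer):
        if unicodedata.combining(buffer[start]) == 0:
            start += 1  # a starter: nothing to reorder
            continue
        end = start
        while end < len(buffer) and unicodedata.combining(buffer[end]) != 0:
            end += 1
        # sorted() is stable, so marks with equal classes keep their order.
        buffer[start:end] = sorted(buffer[start:end], key=unicodedata.combining)
        start = end
    return "".join(buffer)


# "q" + COMBINING DOT ABOVE (class 230) + COMBINING DOT BELOW (class 220)
sample = "q\u0307\u0323"
reordered = canonical_reorder(sample)
```

For this sample the dot below ends up before the dot above, which matches
what unicodedata.normalize("NFD", sample) returns. A pure streaming pass
over the code points cannot produce that order without buffering the run.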

Unfortunately, the OpenType specification is not very clear and is still
a mess to read. In addition, it has been moved repeatedly on Microsoft's
website (broken URLs all the time): this specification, hosted by
Microsoft, would be better placed on a separate, stable website, not
necessarily linked to Microsoft. These repeated moves (and content
conversions when Microsoft decides to change the site layout for its own
online "developer network" center) are a problem: the conversion has once
again broken part of the documentation (see the missing images for
illustrations or for showing some glyphs...).

If OpenType is supposed to be interoperable, Microsoft should host it
somewhere more stable, outside MSDN. I suggest GitHub: Microsoft already
moves many of its open-source or cooperative projects there, and GitHub
still allows integration from Microsoft's website, including for commenting
on the Microsoft documentation for Windows, Office or Xbox apps, which is
now on GitHub. GitHub also still permits Microsoft to link back to its own
website with tools on the sidebar without breaking the local content in
GitHub projects. This move would also allow cleaner versioning than what is
on MSDN.

--- side comment:
In fact, even the public Windows/Office/Xbox developer documentation could
also be moved to GitHub. MSDN is completely broken now: it mixes all
versions, with too many "Page not found" errors everywhere, and it is
extremely difficult to make stable references to the documentation in
development projects for Windows, because it changes at each major Windows
release or when a new version is in preparation. MSDN only focuses on the
most recent version; documentation for older versions is completely
forgotten and too frequently broken. This is also a problem for the support
sites of many third-party developers, as well as within Microsoft's own
solution centers and forums, where the solutions are hard to evaluate and
unstable. Microsoft still does not want to honor a strong recommendation
made by the W3C and the IETF: URLs must be stable. Microsoft's idea of
using its own GUIDs or article IDs to reference the contents via an
indirection is not a solution, because Microsoft frequently forgets to
maintain the targets of these redirects when the content is moved
"elsewhere".



2018-02-18 13:04 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> Yes, I found other possible crashes, all caused by the glyph reordering. It
> really seems that Apple implemented some unsafe shortcuts by not creating a
> glyph buffer in all cases (using lazy instantiation only when needed), but
> forgot some cases: the code assumes that the glyph buffer has been
> initialized, and then it probably throws a null pointer exception or
> similar.
>
> 2018-02-18 9:01 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>
>> Oh, also vatu.
>>
>> Seems like that ordering algorithm is indeed relevant.
>>
>> -Manish
>>
>> On Sat, Feb 17, 2018 at 11:57 PM, Manish Goregaokar <manish at mozilla.com>
>> wrote:
>>
>>> Ah, looking at that the OpenType `pstf` feature seems relevant, though I
>>> cannot get it to crash with Gurmukhi (where the consonant ya is a postform)
>>>
>>> -Manish
>>>
>>> On Sat, Feb 17, 2018 at 4:40 PM, Philippe Verdy <verdy_p at wanadoo.fr>
>>> wrote:
>>>
>>>> An interesting read:
>>>>
>>>> https://docs.microsoft.com/fr-fr/typography/script-development/bengali#reor
>>>>
>>>>
>>>> 2018-02-18 1:30 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:
>>>>
>>>>> My opinion about this bug is that Apple's text renderer dynamically
>>>>> allocates a glyph buffer only when needed (lazily), but a test is missing
>>>>> for the lazy construction of this buffer, and this is causing a null
>>>>> pointer exception at run time. (The buffer is not needed for most texts,
>>>>> which require no glyph substitution or reordering: a single accessor
>>>>> can find the glyph data directly from the code point by lookup in font
>>>>> tables.)
>>>>>
>>>>> The bug actually occurs when processing the vowel that comes after
>>>>> the ZWNJ, if the code assumes that a glyph buffer has already been
>>>>> constructed for the cluster, in order to place the vowel over the correct
>>>>> glyph (which may have been reordered in that buffer).
>>>>>
>>>>> Microsoft's text renderer and other engines do not delay the
>>>>> construction of the glyph buffer, which can be reused for processing the
>>>>> rest of the text, provided it is correctly reset after processing a cluster.
>>>>>
>>>>>
>>>>> 2018-02-17 21:54 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>>>>>
>>>>>> Heh, I wasn't aware of the word "phala-form", though that seems
>>>>>> Bengali-specific?
>>>>>>
>>>>>> Interesting observation about the vowel glyphs, I'll mention this in
>>>>>> the post. Initially I missed this because I hadn't realized that the
>>>>>> Bengali o vowel crashed (which made me discount this).
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Manish
>>>>>>
>>>>>> On Sat, Feb 17, 2018 at 12:22 PM, Philippe Verdy <verdy_p at wanadoo.fr>
>>>>>> wrote:
>>>>>>
>>>>>>> I would have liked your invented term "left-joining consonants" to
>>>>>>> use the usual name "phala forms" (which represent RA or JA/JO
>>>>>>> after a virama, generally named "raphala" or "japhala/jophala").
>>>>>>>
>>>>>>> The reason this bug does not occur with some vowels is that these
>>>>>>> are two-part vowels, which are first decomposed into two separate glyphs
>>>>>>> reordered in the glyph buffer, while other vowels do not need this
>>>>>>> prior mapping and keep their initial direct mapping from their code
>>>>>>> points in fonts. This means that the bug has to do with the way the ZWNJ
>>>>>>> looks for the glyphs of the vowels in the glyph buffer and not in the
>>>>>>> initial code point buffer: there is some desynchronization, and more
>>>>>>> probably an uninitialized data field (for the lookup made in handling
>>>>>>> ZWNJ) if no vowel decomposition was done. The same data field is
>>>>>>> correctly initialized when it is the first consonant that takes an
>>>>>>> alternate form before a virama, as in most Indic consonant clusters,
>>>>>>> because a glyph buffer is then created.
>>>>>>>
>>>>>>> Now we have some hints about why the bug does not occur in Kannada
>>>>>>> or Khmer: a glyph buffer is always created there, but a shortcut was
>>>>>>> made in Devanagari, Bengali, and Telugu to process clusters faster
>>>>>>> without always having to create a glyph buffer (needed to reorder
>>>>>>> glyphs before positioning them), working directly on the code point
>>>>>>> streams.
>>>>>>>
>>>>>>> So it seems related to the fact that OpenType fonts do not need to
>>>>>>> include rules for glyph substitution, and the phala forms are represented
>>>>>>> without any glyph substitution, by mapping them directly in a
>>>>>>> separate table for the consonants. Because there has been no
>>>>>>> code-to-glyph substitution, the glyph buffer is not created, but then
>>>>>>> when processing the ZWNJ, the code looks for data in a glyph buffer that
>>>>>>> has still not been initialized (and this is specific to the renderers
>>>>>>> implemented by Apple in iOS and macOS). This bug does not occur if
>>>>>>> another text rendering engine is used (e.g. in non-Apple web browsers).
>>>>>>>
>>>>>>>
>>>>>>> 2018-02-16 19:44 GMT+01:00 Manish Goregaokar <manish at mozilla.com>:
>>>>>>>
>>>>>>>> FWIW I dissected the crashing strings, it's basically all
>>>>>>>> <consonant, virama, consonant, zwnj, vowel> sequences in Telugu, Bengali,
>>>>>>>> Devanagari where the consonant is suffix-joining (ra in Devanagari, jo and
>>>>>>>> ro in Bengali, and all Telugu consonants), the vowel is not Bengali au or o
>>>>>>>> / Telugu ai, and if the second consonant is ra/ro the first one is not also
>>>>>>>> ra/ro (or ro-with-line-through-it).
>>>>>>>>
>>>>>>>> https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
>>>>>>>>
>>>>>>>> -Manish
>>>>>>>>
>>>>>>>> On Thu, Feb 15, 2018 at 10:58 AM, Philippe Verdy via Unicode <
>>>>>>>> unicode at unicode.org> wrote:
>>>>>>>>
>>>>>>>>> That's probably not a bug in Unicode but in the macOS/iOS text
>>>>>>>>> renderers with some fonts using advanced composition features.
>>>>>>>>>
>>>>>>>>> Similar bugs could just as well affect the new advanced features added
>>>>>>>>> in Windows or Android to support multicolored emojis, variable fonts,
>>>>>>>>> contextual glyph transforms, style variants, or more font formats (not
>>>>>>>>> just OpenType). The bug may also be in the graphics renderer (incorrect
>>>>>>>>> clipping when drawing the glyph into the glyph cache, with buffer
>>>>>>>>> overflows possibly caused by incorrectly computed splines), or it could
>>>>>>>>> be in the display driver (or in the hardware accelerator having some
>>>>>>>>> limitations on the complexity of multipolygons to fill and to
>>>>>>>>> antialias), causing some infinite recursion loop, or too deep a
>>>>>>>>> recursion exhausting the stack limit.
>>>>>>>>>
>>>>>>>>> Finally, the bug could be in the OpenType hinting engine moving
>>>>>>>>> some points outside the clipping area (the math theory may say that such
>>>>>>>>> placement of a point outside the clipping area is impossible, but
>>>>>>>>> various mathematical simplifications and shortcuts are used to simplify
>>>>>>>>> or accelerate the rendering, at the price of some quirks). Even the SVG
>>>>>>>>> standard (in constant evolution) could be affected as well in its
>>>>>>>>> implementation.
>>>>>>>>>
>>>>>>>>> There are tons of possible bugs here.
>>>>>>>>>
>>>>>>>>> 2018-02-15 18:21 GMT+01:00 James Kass via Unicode <
>>>>>>>>> unicode at unicode.org>:
>>>>>>>>>
>>>>>>>>>> This article:
>>>>>>>>>> https://techcrunch.com/2018/02/15/iphone-text-bomb-ios-mac-crash-apple/?ncid=mobilenavtrend
>>>>>>>>>>
>>>>>>>>>> The single Unicode symbol referred to in the article results from
>>>>>>>>>> a string of Telugu characters.  The article doesn't list or display
>>>>>>>>>> the characters, so Mac users can visit the above link.  A link in one
>>>>>>>>>> of the comments leads to a page which does display the characters.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>