Line-breaking algorithm: Unexpected break in multiple consecutive numeric prefixes

Ophir Lifshitz me at ophir.li
Fri Apr 1 09:39:31 CDT 2022


Hello again,

I hope it's all right to re-ask a question I sent a while back.

Thanks!

On Sun, Sep 19, 2021 at 5:13 AM Ophir Lifshitz <me at ophir.li> wrote:

> I have a question about the line-breaking algorithm. Apologies if it
> is uninformed or if this is the wrong venue.
>
> I recently experienced an unexpected line break[1] after the first
> character in the following sequence[2]:
>
> − 2212 MINUS SIGN  (line-breaking class PR)
> $ 0024 DOLLAR SIGN (line-breaking class PR)
> 4 0034 DIGIT FOUR  (line-breaking class NU)
> 5 0035 DIGIT FIVE  (line-breaking class NU)
>
> (However, if the first character is replaced by 002B PLUS SIGN (also
> class PR), a line break does not occur.)
>
> I also noticed that there is no "PR × PR" rule in (e.g.) LB25.
>
> Is this intended, is it an oversight, or is it left to implementation
> discretion (i.e. "tailored")?
>
> If it is an oversight, what is the process for correcting it or filing
> a bug? It is hard to find that information on the Unicode website.
>
> Thank you.
>
>
> [1] The line break appeared in Chrome 93 and Safari 13.1 on macOS 10.13,
> but not in Firefox 85.
> I tested by navigating in my browser to the following data URIs:
>
> data:text/html;charset=utf-8,<p%20style="width:1px;">%E2%88%92$45</p>
> data:text/html;charset=utf-8,<p%20style="width:1px;">%2B$45</p>
>
> [2] This sequence is intended to behave as a single unit (word), and
> refers to a price discount in the original text.
>
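
To make the "no PR × PR rule" observation concrete, here is a minimal
Python sketch of how I read the pair-based form of LB25. It is a
simplification and not a conforming UAX #14 implementation: it models
only LB25 plus the LB31 fallback ("break everywhere else"), and the
names LB_CLASS, LB25_NO_BREAK and break_opportunities are my own, made
up for illustration. The classes are hard-coded for just the characters
in the sequence above.

```python
# Line-breaking classes for the characters discussed in the message.
# Hard-coded for illustration; a real implementation reads LineBreak.txt.
LB_CLASS = {
    "\u2212": "PR",  # MINUS SIGN
    "+": "PR",       # PLUS SIGN
    "$": "PR",       # DOLLAR SIGN
    "4": "NU",       # DIGIT FOUR
    "5": "NU",       # DIGIT FIVE
}

# (before, after) class pairs that the pair-based LB25 keeps together.
# Note that ("PR", "PR") is absent.
LB25_NO_BREAK = {
    ("CL", "PO"), ("CP", "PO"), ("CL", "PR"), ("CP", "PR"),
    ("NU", "PO"), ("NU", "PR"), ("PO", "OP"), ("PO", "NU"),
    ("PR", "OP"), ("PR", "NU"), ("HY", "NU"), ("IS", "NU"),
    ("NU", "NU"), ("SY", "NU"),
}


def break_opportunities(text):
    """Return each index i where a break is allowed before text[i].

    Only LB25 and the LB31 fallback are modelled (an assumption of this
    sketch), so any pair not listed in LB25_NO_BREAK is a break.
    """
    classes = [LB_CLASS[ch] for ch in text]
    return [
        i
        for i in range(1, len(classes))
        if (classes[i - 1], classes[i]) not in LB25_NO_BREAK
    ]


print(break_opportunities("\u2212$45"))  # [1]
print(break_opportunities("+$45"))       # [1]
```

Under this reduced model both sequences get a break opportunity after
the first character, since MINUS SIGN and PLUS SIGN share class PR. So
the class pairs alone would not seem to account for the difference I
saw between the two in Chrome and Safari.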