Line-breaking algorithm: Unexpected break in multiple consecutive numeric prefixes

Ophir Lifshitz me at ophir.li
Sun Sep 19 04:13:20 CDT 2021


I have a question about the line-breaking algorithm. Apologies if it
is uninformed or if this is the wrong venue.

I recently experienced an unexpected line break[1] after the first
character in the following sequence[2]:

− 2212 MINUS SIGN  (line-breaking class PR)
$ 0024 DOLLAR SIGN (line-breaking class PR)
4 0034 DIGIT FOUR  (line-breaking class NU)
5 0035 DIGIT FIVE  (line-breaking class NU)

(However, if the first character is replaced by 002B PLUS SIGN (also
class PR), a line break does not occur.)

I also noticed that there is no "PR × PR" rule, for example in LB25.
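A minimal Python sketch of this reading follows. It models only the LB25 pairs (the default, untailored form as listed in UAX #14 at the time of writing) and lets every other pair fall through to LB31 ("break everywhere else"). The class table is a hand-written subset covering just the characters in this message, not a real LineBreak.txt lookup, and the names LINE_BREAK_CLASS, LB25_NO_BREAK and break_opportunities are illustrative. It also assumes that none of the other rules apply to these particular pairs.

```python
# Minimal sketch, NOT a full UAX #14 implementation: only the LB25 pairs are
# modeled, and any pair they do not cover falls through to LB31 (break allowed).

# Hypothetical hand-written class table covering only the characters discussed.
LINE_BREAK_CLASS = {
    "\u2212": "PR",  # MINUS SIGN
    "+": "PR",       # PLUS SIGN
    "$": "PR",       # DOLLAR SIGN
    "4": "NU",       # DIGIT FOUR
    "5": "NU",       # DIGIT FIVE
}

# LB25 pairs (default, untailored form): no break between these classes.
# Note that ("PR", "PR") is absent.
LB25_NO_BREAK = {
    ("CL", "PO"), ("CP", "PO"), ("CL", "PR"), ("CP", "PR"),
    ("NU", "PO"), ("NU", "PR"), ("PO", "OP"), ("PO", "NU"),
    ("PR", "OP"), ("PR", "NU"), ("HY", "NU"), ("IS", "NU"),
    ("NU", "NU"), ("SY", "NU"),
}


def break_opportunities(text: str) -> list:
    """Return each index i where a break is allowed before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        pair = (LINE_BREAK_CLASS[text[i - 1]], LINE_BREAK_CLASS[text[i]])
        if pair not in LB25_NO_BREAK:
            breaks.append(i)  # LB31: break everywhere else
    return breaks


print(break_opportunities("\u2212$45"))  # → [1]
print(break_opportunities("+$45"))       # → [1]
```

Under this reading, the sketch reports a break opportunity after the first character for both the MINUS SIGN and the PLUS SIGN versions, so the default rules alone would not explain why the browsers treat the two differently.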

Is this intended, an oversight, or left to implementation discretion
(i.e. "tailored")?

If it is an oversight, what is the process for correcting it or filing
a bug? It is hard to find that information on the Unicode website.

Thank you.


[1] The line break appeared in Chrome 93 and Safari 13.1 on macOS 10.13,
but not in Firefox 85.
I tested by navigating in my browser to the following data URIs:

data:text/html;charset=utf-8,<p%20style="width:1px;">%E2%88%92$45</p>
data:text/html;charset=utf-8,<p%20style="width:1px;">%2B$45</p>
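The same test pages can be generated for other leading characters with a small Python helper built on the standard urllib.parse.quote. The helper name make_test_uri and the safe-character set are illustrative choices, picked so that the output matches the two URIs above character for character.

```python
from urllib.parse import quote


def make_test_uri(prefix: str) -> str:
    """Build a data: URI that renders prefix + "$45" in a 1px-wide paragraph."""
    html = f'<p style="width:1px;">{prefix}$45</p>'
    # Leave the HTML punctuation and "$" readable; quote() still encodes the
    # space as %20 and characters such as "+" or U+2212 as UTF-8 %XX escapes.
    return "data:text/html;charset=utf-8," + quote(html, safe='<>="/:;$')


print(make_test_uri("\u2212"))  # the MINUS SIGN test page
print(make_test_uri("+"))       # the PLUS SIGN test page
```

make_test_uri("\u2212") returns the first URI above and make_test_uri("+") the second.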

[2] This sequence is intended to behave as a single unit (word), and
refers to a price discount in the original text.
