NNBSP (was: A last missing link for interoperable representation)

Marcel Schneider via Unicode unicode at unicode.org
Thu Jan 17 07:57:23 CST 2019


On 17/01/2019 14:36, I wrote:
> […]
> The only thing that searches have brought up

It was actually the best thing. Here’s an even more surprising hit:

                B. In the rules, allow these characters to bridge both alphabetic and numeric words, with:

                  * Replace MidLetter by (MidLetter | MidNumLet)
                  * Replace MidNum by (MidNum | MidNumLet)


                -------------------------

                4. In addition, the following are also sometimes used, or could be used, as numeric separators (we don't give much guidance as to the best choice in the standard):

                |0020 <http://unicode.org/cldr/utility/character.jsp?a=0020>|( ) SPACE
                |00A0 <http://unicode.org/cldr/utility/character.jsp?a=00A0>|(   ) NO-BREAK SPACE
                |2007 <http://unicode.org/cldr/utility/character.jsp?a=2007>|(   ) FIGURE SPACE
                |2008 <http://unicode.org/cldr/utility/character.jsp?a=2008>|(   ) PUNCTUATION SPACE
                |2009 <http://unicode.org/cldr/utility/character.jsp?a=2009>|(   ) THIN SPACE
                |202F <http://unicode.org/cldr/utility/character.jsp?a=202F>|(   ) NARROW NO-BREAK SPACE

                If we had good reason to believe that if one of these only really occurred between digits in a single number, then we could add it. I don't have enough information to feel like a proposal for that is warranted, but others may. Short of that, we should at least document in the notes that some implementations may want to tailor MidNum to add some of these.
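
To make the quoted suggestion about tailoring MidNum concrete, here is a minimal sketch in Python. It is not the UAX #29 algorithm: the regular expression only imitates the "digits, separator, digits" rule, `DEFAULT_MID_NUM` is a simplified stand-in rather than the real property values, and the function name `numeric_words` is hypothetical.

```python
import re

# Simplified stand-in for the characters allowed between digits of one number.
# This is an assumption for illustration, not the actual MidNum property set.
DEFAULT_MID_NUM = ",."


def numeric_words(text, extra_mid_num=""):
    """Return the numeric tokens of `text`.

    A separator keeps two digit runs together only when it sits directly
    between digits; `extra_mid_num` models an implementation-specific
    tailoring that adds further separators.
    """
    seps = re.escape(DEFAULT_MID_NUM + extra_mid_num)
    pattern = re.compile(rf"[0-9]+(?:[{seps}][0-9]+)*")
    return pattern.findall(text)


sample = "1\u202f234\u202f567 km"
untailored = numeric_words(sample)            # the number falls apart
tailored = numeric_words(sample, "\u202f")    # NNBSP added by tailoring
```

Without the tailoring the number is split into three tokens; once U+202F is added as a separator it stays a single token.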


I fail to understand what hack is going on. Why didn’t Unicode wish to sort out which one of these is the group separator?

 1. SPACE: is breakable, hence exit.
 2. NO-BREAK SPACE: is justifying, hence exit.
 3. FIGURE SPACE: has the full width of a digit, too wide, hence exit.
 4. PUNCTUATION SPACE: has been left breakable against all reason and evidence and consistency, hence exit…
 5. THIN SPACE: is part of the breakable spaces series, hence exit.
 6. NARROW NO-BREAK SPACE: is okay.
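
The breakable versus no-break part of this elimination can be checked roughly with Python's standard `unicodedata` module. This is only a sketch: it uses the `<noBreak>` compatibility decomposition tag as a proxy, whereas the authoritative data is the Line_Break property of UAX #14, which `unicodedata` does not expose. The justification and width criteria are typographic and are not recorded in this character data at all, so they do not appear in the code. The helper names are hypothetical.

```python
import unicodedata

# The six candidates from the list above, in the same order.
CANDIDATES = ["\u0020", "\u00a0", "\u2007", "\u2008", "\u2009", "\u202f"]


def is_no_break(ch):
    """True if the character's decomposition carries the <noBreak> tag."""
    return unicodedata.decomposition(ch).startswith("<noBreak>")


def describe(ch):
    """One-line summary: code point, character name, breaking behaviour."""
    kind = "no-break" if is_no_break(ch) else "breakable"
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {kind}"


# Candidates that survive the "must not break" criterion.
no_break = [ch for ch in CANDIDATES if is_no_break(ch)]

for ch in CANDIDATES:
    print(describe(ch))
```

Under this proxy only NO-BREAK SPACE, FIGURE SPACE and NARROW NO-BREAK SPACE remain; the first two are then ruled out by the justification and width arguments given above.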

CLDR agreed to fix this for French in release 34. In the current survey for release 35, all of it is being questioned again: it must be reassessed and may impact implementations, while all other locales using a space as the group separator are still affected by the poor display that results from NO-BREAK SPACE.
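
For illustration, here is a minimal sketch of what grouping digits with NARROW NO-BREAK SPACE looks like in Python. It does not use CLDR data; the function name `group_digits` is hypothetical, and it simply swaps the comma produced by Python's built-in grouping format for U+202F.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def group_digits(n, sep=NNBSP):
    """Format an integer in groups of three digits separated by `sep`."""
    # The "," format spec inserts a comma every three digits; replace it
    # with the desired group separator.
    return f"{n:,}".replace(",", sep)


formatted = group_digits(1234567)
```

Passing `sep="\u00a0"` gives the NO-BREAK SPACE variant, which is handy for comparing the two renderings side by side.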

I know we have another public mailing list for that, but I feel it’s important to submit this to a larger community for consideration and, possibly, for feedback.

Thanks.

Regards,

Marcel

P.S. For completeness:

http://unicode.org/L2/L2007/07370-punct.html

And also wrt my previous post:

https://www.unicode.org/L2/L2007/07209-whistler-uax14.txt






