NNBSP

Marcel Schneider via Unicode unicode at unicode.org
Fri Jan 18 16:05:21 CST 2019


On 18/01/2019 20:09, Asmus Freytag via Unicode wrote:
>
> Marcel,
>
> about your many detailed *technical* questions about the history of character properties, I am afraid I have no specific recollection.
>
Other list members are welcome to join in; many of them are aware of how things happened. My questions are meant to be rather simple. Summing up the main ones:

 1. Why does UTC ignore the need for a non-breakable thin space?
 2. Why did UTC not declare PUNCTUATION SPACE non-breakable?

A less important piece of information would be how extensively typewriters with proportional advance width were used to write books ready for print.

Another question you do answer below:

> French is not the only language that uses a space to group figures. In fact, I grew up with thousands separators being spaces, but in much of the existing publications or documents there was certainly a full (ordinary) space being used. Not surprisingly, because in those years documents were typewritten and even many books were simply reproduced from typescript.
>
> When it comes to figures, there are two different types of spaces.
>
> One is a space that has the same width as a digit and is used in the layout of lists. For example, if you have a leading currency symbol, you may want to have that lined up on the left and leave the digits representing the amounts "ragged". You would fill the intervening spaces with this "lining" space character and everything lines up.
>
That is exactly how I understood hot-metal typesetting of tables. What surprises me is that computerized layout still works the same way instead of using tabs and appropriate tab stops (left, right, centered, decimal [with all decimal separators lining up vertically]).
>
> In lists like that, you can get away with not using a narrow thousands separator, because the overall context of the list indicates which digits belong together and form a number. Having a narrow space may still look nicer, but complicates the space fill between the symbol and the digits.
>
It does not, provided that all numbers have thousands separators, even if filling with spaces. It looks nicer because it’s more legible.
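
To make the point concrete, here is a minimal Python sketch of such a fill. It assumes non-negative integers, U+202F as the group separator, and a font in which U+2007 FIGURE SPACE is as wide as a digit; the names group and pad_column are hypothetical helpers made up for this illustration. Every missing digit is replaced by a FIGURE SPACE and every missing separator by a narrow space, so each row carries the same sequence of widths.

```python
FIGURE_SPACE = "\u2007"  # digit-wide space (assumes the font honors this)
NNBSP = "\u202f"         # NARROW NO-BREAK SPACE, used as group separator

def group(n: int) -> str:
    """Group the digits of a non-negative integer in threes with NNBSP."""
    return format(n, ",").replace(",", NNBSP)

def pad_column(amounts):
    """Left-pad grouped amounts so that digits and separators line up."""
    grouped = [group(n) for n in amounts]
    widest = max(grouped, key=len)
    rows = []
    for g in grouped:
        # Leading part of the widest entry that this entry lacks.
        missing = widest[: len(widest) - len(g)]
        # Mirror it: narrow space for a separator, figure space for a digit.
        filler = "".join(NNBSP if ch == NNBSP else FIGURE_SPACE for ch in missing)
        rows.append(filler + g)
    return rows

for row in pad_column([1234567, 45678, 89]):
    print("EUR" + FIGURE_SPACE + row)
```

Whether the columns actually line up on screen or paper still depends on the font's advance widths for both space characters.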
>
> Now for numbers in running text using an ordinary space has multiple drawbacks. It's definitely less readable and, in digital representation, if you use 0020 you don't communicate that this is part of a single number that's best not broken across lines.
>
Right.
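
A small Python sketch of the difference, under stated assumptions: group_digits is a hypothetical helper, and the standard textwrap module is not an implementation of the Unicode line breaking algorithm. It merely happens to break only at ASCII whitespace, which is enough to show a number grouped with U+0020 being split across lines while the same number grouped with U+202F stays in one piece.

```python
import textwrap

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

def group_digits(n: int, sep: str = NNBSP) -> str:
    """Group the digits of an integer in threes, joined by `sep`."""
    return format(n, ",").replace(",", sep)

with_space = "about " + group_digits(1234567, " ") + " units"
with_nnbsp = "about " + group_digits(1234567) + " units"

# textwrap only treats ASCII whitespace as a break opportunity.
print(textwrap.wrap(with_space, width=12))
print(textwrap.wrap(with_nnbsp, width=12))
```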
>
> The problem Unicode had is that it did not properly understand which of the two types of "numeric" spaces was represented by "figure space". (I remember that we had discussions on that during the early years, but that they were not really resolved and that we moved on to other issues, of which many were demanding attention).
>
You were discussing whether the thousands separator should have the width of a digit or the width of a period? Consistent with many other choices, the solution would have been to encode them both as non-breakable, all the more since both were at hand, leaving the choice to the end-user.

Current practice in electronic publishing was to use a non-breakable thin space, Philippe Verdy reports. Did that information come in somehow?

ISO 31-0 was published in 1992, perhaps too late for Unicode. It is normally understood that the thousands separator should not have the width of a digit. The alleged reason is security. Though on a typewriter, as you state, there is scarcely any other option. At that time, all computerized text was fixed width, Philippe Verdy reports. On-screen, I gather, not in book print.
>
> If you want to do the right thing you need:
>
> (1) have a solution that works as intended for ALL languages using some form of blank as a thousands separator - solving only the French issue is not enough. We should not do this a language at a time.
>
That is how CLDR works. But as soon as that was set up, I started lobbying for support of all relevant locales at once:

https://unicode.org/cldr/trac/ticket/11423

https://unicode.org/pipermail/cldr-users/2018-September/000842.html

https://unicode.org/pipermail/cldr-users/2018-September/000843.html
and
https://unicode.org/cldr/trac/ticket/11423#comment:2

> Do you have colleagues in Germany and other countries that can confirm whether their practice matches the French usage in all details, or whether there are differences? (Including different acceptability of fallback renderings...).
>
No, I don’t, but people may wish to read the German Wikipedia:

https://de.wikipedia.org/wiki/Zifferngruppierung#Mit_dem_Tausendertrennzeichen

Shared in ticket #11423:
https://unicode.org/cldr/trac/ticket/11423#comment:15

> (2) have a solution that works for lining figures as well as separators.
>
> (3) have a solution that understands ALL uses of spaces that are narrower than normal space. Once a character exists in Unicode, people will use it on the basis of "closest fit" to make it do (approximately) what they want. Your proposal needs to address any issues that would be caused by reinterpreting a character more narrowly than it has been used. Only by comprehensively identifying ALL uses of comparable spaces in various languages and scripts can you hope to develop a solution that doesn't simply break all non-French text in favor of supporting French typography.
>
There is no such problem except that NNBSP has never worked properly in Mongolian. It was an encoding error, and that is the reason why, to date, all font developers unanimously request the Mongolian Suffix Connector. That leaves the NNBSP for what it is consistently used for outside Mongolian: a non-breakable thin space, kind of a belated avatar of what PUNCTUATION SPACE should have been since the beginning.
>
> Perhaps you see why this issue has languished for so long: getting it right is not a simple matter.
>
Still, it is as simple as not skipping PUNCTUATION SPACE when FIGURE SPACE was made non-breakable. Now we have ended up with a mutated Mongolian Space that does not work properly for Mongolian, but does for French and other languages using the Latin script. It would work even better if TUS were blunter, urging all foundries to update their whole catalogue soon.
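
The asymmetry can be checked in the Unicode Character Database. A minimal Python sketch, assuming the standard unicodedata module: it does not expose the Line_Break property that actually governs line breaking (UAX #14), so the `<noBreak>` compatibility decomposition tag from UnicodeData.txt is used here as a stand-in, and is_tagged_no_break is a hypothetical helper name.

```python
import unicodedata

def is_tagged_no_break(ch: str) -> bool:
    """True if the character's decomposition carries the <noBreak> tag."""
    return unicodedata.decomposition(ch).startswith("<noBreak>")

# NO-BREAK SPACE, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE, NNBSP
for ch in ("\u00a0", "\u2007", "\u2008", "\u2009", "\u202f"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<22} "
          f"noBreak={is_tagged_no_break(ch)}")
```

FIGURE SPACE and NARROW NO-BREAK SPACE carry the tag; PUNCTUATION SPACE and THIN SPACE are tagged `<compat>` instead.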

Marcel