A last missing link for interoperable representation

Marcel Schneider via Unicode unicode at unicode.org
Tue Jan 15 05:24:44 CST 2019

On 15/01/2019 10:24, Philippe Verdy via Unicode wrote:
> On Mon, 14 Jan 2019 at 20:25, Marcel Schneider via Unicode <unicode at unicode.org> wrote:
>     On 14/01/2019 06:08, James Kass via Unicode wrote:
>     >
>     > Marcel Schneider wrote,
>     >
>     >> There is a crazy typeface out there, misleadingly called 'Courier
>     >> New', as if the foundry didn’t anticipate that at some point it
>     >> would be better called "Courier Obsolete". ...
>     >
>     > ������ �������������� seems a bit ��������<i>é</i> nowadays, as well.
>     >
>     > (Had to use mark-up for that “span” of a single letter in order to
>     > indicate the proper letter form.  But the plain-text display looks
>     > crazy with that HTML jive in it.)
>     >
>     I apologize for seeming to question the font name ������ ���� while targeting only
>     the fact that this typeface is not updated to support the <NNBSP>. It just
>     looks like the grand name is now misused to make people believe that if
>     **this** great font is unsupporting <NNBSP>, it has a good reason to do so,
>     and we should keep people off using that “exotic whitespace” otherwise than
>     “intended,” ie for Mongolian. Since fortunately TUS started backing its use
>     in French (2014)
> This is not for Mongolian; French has wanted this space for a long time, and it has been used for centuries in fine typography, even in English.
> So no, NNBSP is definitely NOT "exotic whitespace". It was simply forgotten in the early stages of computing with legacy 8-bit encodings, but it should have been in Unicode since the beginning, as its existence is attested long before the computing age (before ASCII, or even before Baudot and telegraphic systems). It has always been used by typographers and has centuries of tradition in publishing. It has always been recommended for French, and still is today, for all publishers of books and papers.
Many thanks for making this point. So the case is even worse: Unicode deliberately skipped the non-breaking thin space while taking care to encode the whole range of other typographic spaces, even encoding the en and em spaces twice, and not forgetting the old-fashioned tabular spaces and dash: figure space, figure dash, and punctuation space. In this particular context, and against that background of historical practice, what else than malice (supposedly inspired by an unlawful and exuberant DTP vendor) could have driven people not to define the line-breaking property value of U+2008 PUNCTUATION SPACE as "GL", while they did define it so for U+2007 FIGURE SPACE?
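
For readers who want to inspect these characters themselves, here is a minimal sketch, assuming Python's standard unicodedata module. That module does not expose the Line_Break property, so the sketch prints the compatibility decomposition instead; the <noBreak> tag there is only a proxy for the "GL" class discussed above, not the UAX #14 data itself. The names SPACES and describe are made up for this illustration.

```python
import unicodedata

# Space characters discussed in this thread (the selection is illustrative).
SPACES = ["\u00A0", "\u2007", "\u2008", "\u2009", "\u202F"]


def describe(ch):
    """Return (code point, character name, decomposition) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
    )


if __name__ == "__main__":
    # The decomposition tag is a proxy only: Line_Break values live in
    # LineBreak.txt, which the unicodedata module does not provide.
    for ch in SPACES:
        print(*describe(ch), sep="  ")
```

In the Unicode Character Database as I know it, U+2007 and U+202F carry the <noBreak> tag while U+2008 carries <compat>, which mirrors the asymmetry pointed out above.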

Here, too, is the still-outdated wording of UAX #14 regarding NNBSP, Mongolian and French:

                            […] NARROW NO-BREAK SPACE is used in Mongolian. The MONGOLIAN VOWEL SEPARATOR acts like a NARROW NO-BREAK SPACE in its line breaking behavior. It additionally affects the shaping of certain vowel characters as described in Section 13.5, Mongolian, of [Unicode <http://www.unicode.org/reports/tr41/tr41-23.html#Unicode>].

                            NARROW NO-BREAK SPACE is a narrow version of NO-BREAK SPACE, which has exactly the same line breaking behavior, but with a narrow display width. It is regularly used in Mongolian in certain grammatical contexts (before a particle), where it also influences the shaping of the glyphs for the particle. In Mongolian text, the NARROW NO-BREAK SPACE is typically displayed with one third the width of a normal space character.

                            When NARROW NO-BREAK SPACE occurs in French text, it should be interpreted as an “espace fine insécable”.

“When […] it should be interpreted as […]” is a pure insult. NARROW NO-BREAK SPACE *is*, at the very least, the French "espace fine insécable" *and* the Mongolian whatever-it-is-called-in-Mongolian *and* the group separator, aka triad separator, in *all* locales following the SI and ISO recommendation to group digits with spaces, not with any punctuation.
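
To make the group-separator use concrete, here is a small sketch in Python, assuming nothing beyond the standard string formatting. The helper name group_digits and the constant NNBSP are hypothetical, chosen for this example only; real software would normally take the separator from locale data rather than hard-code it.

```python
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE


def group_digits(n: int) -> str:
    """Group the digits of an integer in threes, separated by NNBSP."""
    # Format with the built-in comma grouping, then swap each comma
    # for the narrow no-break space.
    return f"{n:,}".replace(",", NNBSP)


if __name__ == "__main__":
    # repr() makes the otherwise invisible separator visible as \u202f.
    print(repr(group_digits(1234567)))
```

Because the separator is non-breaking, a line breaker that honors its "GL" class should never split the number across two lines.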

Since that misleading section will hopefully be edited, here’s the link to the quoted version:

I would also like, or rather I need, to kindly ask knowledgeable list members to correct the following statement *if* it is wrong:

                If the Unicode Standard had been set up in an unbiased way, U+2008 PUNCTUATION SPACE would have been given the line break property value "GL".

Perhaps the following would also be true:

                If the Unicode Standard had been set up in an unbiased way, there would be a NARROW NO-BREAK SPACE encoded in the range U+2000..U+200F.

Thanks in advance to Philippe Verdy and any other knowledgeable list members for staying or getting in touch and for continuing to post feedback.

I am neither editing the subject line nor spinning off a new thread, given that when I launched this one I sincerely believed that the issues with NARROW NO-BREAK SPACE, and with preformatted superscript abbreviation indicators for the interoperable representation of French and numerous other languages (some of which use not only the former as group separator, but also the latter as ordinal indicators), were about to be settled for good. It turns out they are not. Hopefully, as this thread goes on, the sometimes extremely aggressive anti-NNBSP lobbying (and also the more lenient anti-preformatted-superscript lobbying) will come to an end, clearing the way for a truly interoperable Unicode digital representation of all of the world’s languages.

Best regards,
