NNBSP (was: A last missing link for interoperable representation)

Marcel Schneider via Unicode unicode at unicode.org
Wed Jan 16 21:51:57 CST 2019


On 16/01/2019 21:53, Richard Wordingham via Unicode wrote:
>
> On Tue, 15 Jan 2019 13:25:06 +0100
> Philippe Verdy via Unicode <unicode at unicode.org> wrote:
> 
>> If your fonts behave incorrectly on your system because it does not
>> map any glyph for NNBSP, don't blame the font or Unicode about this
>> problem, blame the renderer (or the application or OS using it; maybe
>> they are very outdated and were not aware of these features; they
>> are probably based on old versions of Unicode when NNBSP was still
>> not present even if it was requested since very long at least for
>> French and even English, before even Unicode, and long before
>> Mongolian was then encoded, only in Unicode and not in any known
>> supported legacy charset: Mongolian was specified by borrowing the
>> same NNBSP already designed for Latin, because the Mongolian space
>> had no known specific behavior: the encoded whitespaces in Unicode
>> are completely script-neutral, they are generic, and are even
>> BiDi-neutral, they are all usable with any script).
> 
> The concept of this codepoint started for Mongolian, but was generalised
> before the character was approved.

Indeed it was proposed as MONGOLIAN SPACE <MSP> at block start, which was
consistent with the need of a MONGOLIAN COMMA, MONGOLIAN FULL STOP and much
more. When Unicode argued in favor of a unification with <NBSP>, this was
pointed out as impracticable, and the need for a specific Mongolian space
for the purpose of appending suffixes was underscored. Only in London in
September 1998 was it agreed that “The Mongolian Space is retained but
moved to the general punctuation block and renamed ‘Narrow No Break Space’ ”.

However, unlike for the Mongolian Combination Symbols sequencing a question
and exclamation mark both ways, a concrete rationale as to how useful the
<NNBSP> could be in other scripts does not seem to have been put on the
table when the move to General Punctuation was decided.

> 
> Now, I understand that all claims about character properties that cannot
> be captured in the UCD should be dismissed as baseless, but if we
> believed the text of TUS we would find that NNBSP has some interesting
> properties with application only to Mongolian:

As a side-note: The relevant text of TUS doesn’t predate version 11 (2018).

> 
> 1) It has a shaping effect on following character.
> 2) It has zero width at the start of a line.
> 3) When the line-breaking algorithm does not provide enough
> line-breaking opportunities, it changes its line-breaking property
> from GL to BB.

I don’t believe that these additions to TUS are in any way able to fix
the many issues with <NNBSP> in Mongolian causing so much headache and
ending up in a unanimous desire to replace <NNBSP> with a *new*
*MONGOLIAN SUFFIX CONNECTOR. Indeed some suffixes are as long as 7 letters,
e.g. “ ᠲᠠᠶᠢᠭᠠᠨ <U+202F><U+1832><U+1820><U+1836><U+1822><U+182D><U+1820><U+1828>”

https://lists.w3.org/Archives/Public/public-i18n-mongolian/2015JulSep/att-0036/DS05_Mongolian_NNBSP_Connected_Suffixes.pdf
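
As a rough illustration (my own sketch, not taken from the document linked
above), the sequence quoted above can be inspected with Python's standard
unicodedata module. The names SUFFIX and describe are mine and purely
illustrative; the character names printed depend on the Unicode version
bundled with the Python build in use.

```python
import unicodedata

# The suffix as spelled out above: NNBSP followed by seven Mongolian letters.
SUFFIX = "\u202f\u1832\u1820\u1836\u1822\u182d\u1820\u1828"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"),
        )
        for ch in text
    ]


if __name__ == "__main__":
    for row in describe(SUFFIX):
        print(*row)
```

The first row shows the general-purpose space (category Zs) that currently
has to double as the suffix connector; the remaining seven rows are the
letters of the suffix itself.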

> 
> Or is property (3) appropriate for French?

No, it isn’t. It only introduces new flaws for a character that,
despite being encoded for Mongolian with specific handling intended,
was readily ripped off for use in French, as Philippe Verdy reported,
to such an extent that it is actually an encoding error in Mongolian
that brought the long-missing non-breakable thin space into
the UCS, in the block where it really belongs, and where it would
have been encoded from the beginning had there been no desire to
keep it proprietary.

That is the hidden (almost occult) fact behind statements like “The
NNBSP can be used to represent the narrow space occurring around
punctuation characters in French typography, which is called an
‘espace fine insécable.’ ” (TUS) and “When NARROW NO-BREAK SPACE
occurs in French text, it should be interpreted as an ‘espace fine
insécable’.” (UAX #14). The underlying meaning,
as I understand it now, is something like: “The non-breakable thin space is
usually a vendor-specific layout control in DTP applications; it’s
also available via a TeX command. However, if you are interested
in an interoperable representation, here’s a Unicode character you
can use instead.”

Due to the way <NNBSP> made its delayed way into Unicode, font
support was reported to be extremely scarce as recently as almost
exactly two years ago, as this analysis of the first 47 fonts on
Windows 10 shows:

https://www.unicode.org/L2/L2017/17036-mongolian-suffix.pdf

Surprisingly for me, Courier New has NNBSP. We must have been
using old copies. I’m really glad that this famous and widely
used typeface has been updated. Please disregard my previous
posting about Courier New not supporting NNBSP. I’ll need to use
a font manager to output a complete list wrt NNBSP support.

I’m utterly worried about the fate of the non-breaking thin
space in Unicode, and I wonder why the French and Canadian
French people present at setup (either on the Unicode side or
on the JTC1/SC2/WG2 side) didn’t get this character encoded in
the initial rush. Did they really sell themselves and their
locales to DTP lobbyists? Or were they tricked?

Also, at least one French typographer was extremely upset
about Unicode not gathering feedback from typographers.
That blame is partly wrong since at least one typographer
was and still is present in WG2 and, although not a
Frenchman (but knowing French), as an Anglophone he might
have been aware of the most outstanding use case of NNBSP
with English (both British and American) quotation marks
when a nested quotation starts or ends a quotation, where
_‘ ”_ or _“ ’_ and _’ ”_ or _” ’_ are preferred over the
unspaced compounds (_‘”_ or _“’_ and _’”_ or _”’_), at
least with proportional fonts. Not to forget the SI-conformant
(and later ISO 80000 conformant) use of a thin space
(non-breakable, of course) for grouping digits into triads,
both before *and after* the decimal separator.
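
To make the digit-grouping use case concrete, here is a minimal Python
sketch, assuming the number arrives as a plain digit string with an
optional sign and a single decimal separator. The function name
group_digits and its parameters are hypothetical and not part of any
library; locale-aware formatting is deliberately left out.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def group_digits(number: str, decimal_sep: str = ".") -> str:
    """Group digits into triads on both sides of the decimal separator."""
    sign = ""
    if number and number[0] in "+-":
        sign, number = number[0], number[1:]
    integer, sep, fraction = number.partition(decimal_sep)

    # Integer part: triads are counted from the right, so a short
    # leading group (one or two digits) may come first.
    head = len(integer) % 3
    int_groups = ([integer[:head]] if head else []) + [
        integer[i:i + 3] for i in range(head, len(integer), 3)
    ]

    # Fractional part: triads are counted from the left, so a short
    # group (one or two digits) may come last.
    frac_groups = [fraction[i:i + 3] for i in range(0, len(fraction), 3)]

    return sign + NNBSP.join(int_groups) + sep + NNBSP.join(frac_groups)
```

For example, group_digits("12345,6789", ",") yields "12 345,678 9" with
U+202F between the groups, so a line break can never fall inside the
number.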


Thanks to Richard Wordingham for catching this.

It’s a very good point.

Best regards,

Marcel

