NBSP supposed to stretch, right?

Shriramana Sharma via Unicode unicode at unicode.org
Mon Dec 16 18:50:39 CST 2019


Hello. I've just tested LibreOffice, Google Docs and MS Office on
Linux, Android and Windows, and it seems that NBSP doesn't get
stretched like the normal space character when justified alignment
requires it.

Let me explain. I'm creating a document with the following text
typeset in 12 pt Lohit Tamil with justified alignment on an A5 page
with 0.5" margin all around:

ஶ்ரீமத் மஹாபாரதம் என்பது நமது தேசத்தின் பெரும் இதிஹாஸமாகும். இதனை
இயற்றியவர் ஶ்ரீ வேத வ்யாஸர். அவரால் அனுக்ரஹிக்கப்பட்டவையான நூல்கள் பல.

The screenshot https://sites.google.com/site/jamadagni/files/temp/nbsp-not-expanding.png
may be useful to illustrate the situation. Readers may try similar
sentences in any software/platform of their choice and report what
happens.

Here the problem arises with the phrase ஶ்ரீ வேத வ்யாஸர். The word
ஶ்ரீ is an honorific applying to the following name of the sage வேத
வ்யாஸர், and it would look unsightly to the reader if it were left at
the end of the previous line, so I insert an NBSP between it and the
name. (Isn't there such a stylistic convention in English where Mr
doesn't stand at the end of a line? I don't know.)

However, the phrase is shortly followed by a long word
அனுக்ரஹிக்கப்பட்டவையான, which is too long to fit on the same line and
hence goes to the next line, thereby significantly increasing the
inter-word spacing on the line before it. But the NBSP after the
honorific doesn't stretch, making the word layout unsightly.

IIUC, no-break space is just that: a space that doesn't permit a line
break. This says nothing about it being fixed width.
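
To make that reading concrete, here is a minimal sketch in Python of a
line breaker that treats NBSP purely as "a space with no break
opportunity". It is a toy, not what any of the applications above
actually does: the name wrap is hypothetical, width is counted in code
points rather than glyph advances, and a Latin stand-in replaces the
Tamil phrase.

```python
NBSP = "\u00a0"

def wrap(text, width):
    """Greedy line breaking that breaks at U+0020 only, never at U+00A0."""
    lines, current = [], ""
    # Splitting on U+0020 alone keeps NBSP-joined words together as one unit.
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

# With NBSP the honorific moves to the next line together with the name.
print(wrap("composed by Sri" + NBSP + "Veda Vyasa", 16))
# With a plain space the honorific is stranded at the end of the first line.
print(wrap("composed by Sri Veda Vyasa", 16))
```

Nothing in this model gives NBSP a width of its own; the only thing
that changes is where a break may occur.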

Unicode 12.0 §2.3 on p 27 (55 of PDF) says:

“Other compatibility decomposable characters are widely used
characters serving essential functions. U+00A0 no-break space is one
example. In these and similar cases, such as fixed-width space
characters,….”

To my understanding this itself says that NBSP isn't fixed-width.
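
As a side check, the character data bundled with Python's unicodedata
module shows how U+00A0 is classified next to U+0020 and next to a
fixed-width space such as U+2002 EN SPACE. The helper describe is only
for illustration, and the values come from whichever Unicode version
the Python build ships. The decomposition tags say nothing directly
about stretching; they only show that NBSP carries the <noBreak> tag
while EN SPACE carries the generic <compat> tag.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, decomposition mapping) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )

for ch in ("\u0020", "\u00a0", "\u2002"):
    # U+0020 has no decomposition; U+00A0 and U+2002 both map to 0020,
    # but with different compatibility tags.
    print("U+%04X" % ord(ch), describe(ch))
```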

ibid §6.2 on p 265 (293 of PDF) specifically talking about spacing
characters says:

“No-Break Space. U+00A0 no-break space (NBSP) is the nonbreaking counterpart of
U+0020 space. It has the same width, but behaves differently for line
breaking. For more information, see Unicode Standard Annex #14,
“Unicode Line Breaking Algorithm.””

The wording “but behaves differently for line breaking” seems to
vindicate my understanding that the only difference is in line
breaking behaviour. However, the wording “has the same width” doesn't
clearly say anything about the stretching behaviour, only about the
nominal advance width given as part of font data.

I would have gone and filed this as a LibreOffice bug since that's the
software I use most, but when I found this is a cross-software
problem, I thought it would be best to have this discussed and
documented here (and in a future version of the standard).

My expectation is that, since NBSP is not intended to be a fixed-width
space and the only intended difference between it and the normal
U+0020 SP is in line breaking, NBSP should be treated the same as
U+0020 for the purpose of stretching in justified alignment.

Only then can text such as the above be formatted naturally and easily.
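
The difference between the observed and the expected behaviour can be
sketched as follows. This is only a model of how the slack in a
justified line might be shared out, not the actual algorithm of
LibreOffice, Google Docs or MS Office; the function gap_widths, its
parameters and the numbers (in arbitrary units) are all hypothetical,
and a Latin stand-in replaces the Tamil line.

```python
SP, NBSP = "\u0020", "\u00a0"

def gap_widths(line, natural_width, target_width, space_width, stretch_nbsp):
    """Return (kind, width) for each space-like character in a justified line.

    The slack (target_width - natural_width) is divided equally among the
    characters considered stretchable; the others keep their nominal width.
    """
    stretchable = {SP, NBSP} if stretch_nbsp else {SP}
    gaps = [ch for ch in line if ch in (SP, NBSP)]
    count = sum(1 for ch in gaps if ch in stretchable)
    extra = (target_width - natural_width) / count if count else 0.0
    return [
        ("NBSP" if ch == NBSP else "SP",
         space_width + (extra if ch in stretchable else 0.0))
        for ch in gaps
    ]

line = "composed by Sri" + NBSP + "Veda Vyasa."
# Observed behaviour: only U+0020 stretches, so the NBSP gap stays narrow.
print(gap_widths(line, 100, 112, 4, stretch_nbsp=False))
# Expected behaviour: NBSP shares the slack like any other space.
print(gap_widths(line, 100, 112, 4, stretch_nbsp=True))
```

In the first call the three ordinary spaces grow to 8 units each while
the NBSP stays at 4; in the second all four gaps come out at 7 units,
which is the even spacing I would expect.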

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा


