WORD JOINER vs ZWNBSP

Richard Wordingham richard.wordingham at ntlworld.com
Fri Jun 26 13:28:34 CDT 2015


On Fri, 26 Jun 2015 12:48:39 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> To do traditional French typography on the PC, a justifying no-break
> space is needed along with the colon, because this punctuation must
> be placed in the middle between the word it belongs to and the
> following word. According to the Standard, page 799 (§ 23.2), such a
> space is obtained by bracketing a white space with word joiners:
> U+2060 U+0020 U+2060. To make this colon readily available on
> keyboard, I should therefore program the sequence:
> {VK_OEM_2 /*T34 B09*/ ,3 ,0x2060 ,' ' ,0x2060 ,':' ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE }

For readability, I strongly recommend 0x0020 over ' ' in this context.

What is the behavioural difference between <U+2060, U+0020, U+2060> and
U+00A0?

However, if you reread the section, you will see that the sequence
they have in mind is <U+2060, U+2009, U+2060>.
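
To make the code points under discussion concrete, here is a minimal
Python sketch that lists the name and general category of each
character in the three candidates.  The variable and function names are
hypothetical, chosen only for illustration.  Python's standard
unicodedata module does not expose the Line_Break property, so this
shows what the sequences contain, not how a line breaker or justifier
will treat them.

```python
import unicodedata

# Hypothetical names for the sequences discussed above.
WJ_SPACE_WJ = "\u2060\u0020\u2060"  # sequence proposed in the quoted message
WJ_THIN_WJ = "\u2060\u2009\u2060"   # sequence the section appears to have in mind
NBSP = "\u00A0"                     # plain NO-BREAK SPACE, for comparison


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


for label, seq in [
    ("WJ SPACE WJ", WJ_SPACE_WJ),
    ("WJ THIN SPACE WJ", WJ_THIN_WJ),
    ("NBSP", NBSP),
]:
    print(label)
    for code_point, category, name in describe(seq):
        print("   ", code_point, category, name)
```

The word joiners are format characters (Cf) and the spaces are space
separators (Zs); any difference in justification behaviour depends on
the rendering software rather than on these properties alone.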

> Still in French, the letter apostrophe, when used as current
> apostrophe, prevents the following word from being identified as a
> word because of the missing word boundary and, subsequently, prevents
> the autoexpand from working. This can be fixed by adding a word
> joiner after the apostrophe, thanks to an autocorrect entry that
> replaces U+02BC inserted by default in typographic mode, with U+02BC
> U+2060.

No, this doesn't work.  While the primary purpose of U+2060 is to
prevent line breaks, it is also used to overrule word boundary
detectors in scriptio continua.  (It works quite well for
spell-checking Thai in LibreOffice.)  Its name implies to me that it is
intended to prevent a word boundary being deduced, through the strong
correlation between word boundaries and line break opportunities.

There doesn't seem to be a code for 'zero-width word boundary at which
lines should not normally be broken'.
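
The autocorrect entry from the quoted text can be sketched in Python as
below, simply to show what the replacement produces.  The function name
and sample word are hypothetical.  The sketch does not run any
word-boundary algorithm; it only shows that the result is a letter
(general category Lm) followed by a format character (Cf).  The default
word segmentation rules in UAX #29 generally skip format characters
(rule WB4), which is consistent with the joiner not introducing a
boundary, though individual applications may behave differently.

```python
import unicodedata

APOSTROPHE = "\u02BC"   # MODIFIER LETTER APOSTROPHE
WORD_JOINER = "\u2060"  # WORD JOINER


def autocorrect(text):
    """Hypothetical autocorrect entry: replace U+02BC with U+02BC U+2060."""
    return text.replace(APOSTROPHE, APOSTROPHE + WORD_JOINER)


sample = "l" + APOSTROPHE + "arbre"  # illustrative French elision
fixed = autocorrect(sample)

# Show each character of the corrected text with its general category.
for ch in fixed:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```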

Richard.


