WORD JOINER vs ZWNBSP

Marcel Schneider charupdate at orange.fr
Sat Jun 27 10:48:41 CDT 2015


On Fri, Jun 26, Richard Wordingham wrote:

> On Fri, 26 Jun 2015 12:48:39 +0200 (CEST) Marcel Schneider wrote:
>> To do traditional French typography on the PC, a justifying no-break
>> space is needed along with the colon, because this punctuation must
>> be placed in the middle between the word it belongs to and the
>> following word. According to the Standard, page 799 (§ 23.2), such a
>> space is obtained by bracketing a white space with word joiners:
>> U+2060 U+0020 U+2060. To make this colon readily available on
>> keyboard, I should therefore program the sequence:
>> {VK_OEM_2 /*T34 B09*/ ,3 ,0x2060 ,' ' ,0x2060 ,':' ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE }
> For readability, I strongly recommend 0x0020 over ' ' in this context.

I pasted the line from the C source, where all ASCII characters, including 0x20, are written as literals. To ensure readability, I inserted a line break before this line, but that line break must have been deleted. I don't write 0x0020 in C when it's not necessary. However, I take note of your recommendation.

> What is the behavioural difference between the sequence U+2060 U+0020 U+2060 and U+00A0?

The difference appears in word processing, where justification works with U+0020, while all other spaces, including U+00A0, are not justified.
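
To make the comparison concrete, here is a small Python sketch. It is purely illustrative and not part of the keyboard driver source, and the variable names are my own. It lists the characters the key is meant to emit and shows how the Unicode Character Database relates U+00A0 to U+0020. It says nothing about how a particular word processor justifies either one.

```python
import unicodedata

WORD_JOINER = "\u2060"

# The four characters the colon key is meant to emit, in order:
# word joiner, ordinary space, word joiner, colon.
colon_sequence = WORD_JOINER + " " + WORD_JOINER + ":"

for ch in colon_sequence:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# U+00A0 is a distinct character whose compatibility decomposition
# is U+0020 tagged <noBreak>.
print(unicodedata.decomposition("\u00a0"))
```

The bracketed sequence still contains a real U+0020, whereas U+00A0 is a separate character, so an application is free to treat the two differently.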

> However, if you reread the section, you will see that the sequence they have in mind is .

The section I cited reads as follows:
 
| The word joiner can be used to prevent line breaking with other characters that do not have nonbreaking variants, such as U+2009 thin space or U+2015 horizontal bar, by bracketing the character.
 
I believe that U+2009 is mentioned there as a mere example, not as the only target character. IMHO you can bracket whatever character you need with U+2060s.
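
A minimal sketch of that reading, in Python for illustration only: the helper name bracket_with_word_joiners is hypothetical, and the code only builds the strings. Whether a given renderer then suppresses the line break is up to the renderer.

```python
WORD_JOINER = "\u2060"


def bracket_with_word_joiners(text: str) -> str:
    """Return text with a word joiner placed before and after it."""
    return f"{WORD_JOINER}{text}{WORD_JOINER}"


# The two examples named in the Standard, plus the ordinary space.
for ch in ("\u2009", "\u2015", "\u0020"):
    bracketed = bracket_with_word_joiners(ch)
    print(" ".join(f"U+{ord(c):04X}" for c in bracketed))
```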

>> Still in French, the letter apostrophe, when used as current
>> apostrophe, prevents the following word from being identified as a
>> word because of the missing word boundary and, subsequently, prevents
>> the autoexpand from working. This can be fixed by adding a word
>> joiner after the apostrophe, thanks to an autocorrect entry that
>> replaces U+02BC inserted by default in typographic mode, with U+02BC
>> U+2060.
> No, this doesn't work. While the primary purpose of U+2060 is to prevent line breaks, it is also used to overrule word boundary detectors in scriptio continua. (It works quite well for spell-checking Thai in LibreOffice.) Its name implies to me that it is intended to prevent a word boundary being deduced, through the strong correlation between word boundaries and line break opportunities. There doesn't seem to be a code for 'zero-width word boundary at which lines should not normally be broken'.
 
Well, I extrapolated from U+FEFF, which works fine for me, even in this particular context. The fact that U+2060 does not work is another reason not to use it, and all the more reason for me to agree with Microsoft, which did not implement U+2060 in Windows 7. Do you have any news about whether U+2060 is included in at least one font on Windows 8?
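
For reference, the replacement that such an autocorrect entry performs can be sketched as follows. This is Python for illustration, not the word processor's actual mechanism; the function name append_joiner and the sample word are assumptions of mine. It only performs the substitution. Whether the application then sees a word boundary after the apostrophe depends on the application, as discussed above.

```python
import re

LETTER_APOSTROPHE = "\u02bc"
WORD_JOINER = "\u2060"
ZWNBSP = "\ufeff"


def append_joiner(text: str, joiner: str = ZWNBSP) -> str:
    """Insert joiner after each U+02BC not already followed by it."""
    pattern = re.escape(LETTER_APOSTROPHE) + "(?!" + re.escape(joiner) + ")"
    return re.sub(pattern, lambda m: m.group(0) + joiner, text)


sample = "l" + LETTER_APOSTROPHE + "arbre"
for joiner in (ZWNBSP, WORD_JOINER):
    result = append_joiner(sample, joiner)
    print(" ".join(f"U+{ord(c):04X}" for c in result))
```

The negative lookahead keeps the replacement from stacking a second joiner when the entry is applied to text that was already corrected.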
 
Marcel Schneider
 