WORD JOINER vs ZWNBSP

Marcel Schneider charupdate at orange.fr
Fri Jun 26 05:48:39 CDT 2015


Iʼve got a problem with the word joiner and would like to ask whether things could be changed. After two examples, Iʼll outline the issue.

To do traditional French typography on the PC, a justifying no-break space is needed before the colon, because this punctuation mark must be placed midway between the word it belongs to and the following word. According to the Standard, page 799 (§ 23.2), such a space is obtained by bracketing a space with word joiners: U+2060 U+0020 U+2060. To make this colon readily available on the keyboard, I should therefore program the sequence:
{VK_OEM_2 /*T34 B09*/ ,3 ,0x2060 ,' ' ,0x2060 ,':' ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE }
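For readers who want to check the sequence outside a keyboard driver, here is a minimal Python sketch of the same four characters. The function names are hypothetical and only illustrate the bracketing described above; this is not how the layout itself is compiled.

```python
# Sketch only: builds the character sequence discussed above.
# All names here are illustrative assumptions, not part of any keyboard API.

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER
SPACE = "\u0020"        # U+0020 SPACE


def justifying_no_break_space():
    """Return a space bracketed by word joiners: U+2060 U+0020 U+2060."""
    return WORD_JOINER + SPACE + WORD_JOINER


def french_colon():
    """Return the justifying no-break space followed by a colon."""
    return justifying_no_break_space() + ":"


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return ["U+{:04X}".format(ord(ch)) for ch in text]


if __name__ == "__main__":
    print(code_points(french_colon()))
```

Listing the code points makes it easy to verify that what the key actually emits matches the intended sequence.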

Still in French, the letter apostrophe, when used as the ordinary apostrophe, prevents the following word from being identified as a word because of the missing word boundary and, consequently, prevents autoexpand from working. This can be fixed by adding a word joiner after the apostrophe, through an autocorrect entry that replaces the U+02BC inserted by default in typographic mode with U+02BC U+2060. (About why to use U+02BC, even in French, please refer to the preceding thread ‘A new take on the English Apostrophe in Unicode’. Iʼll just add now that without disambiguating apostrophes and closing quotes, any search for quotations, e.g. to mark them up, using the wildcard * bracketed like ‘*’, must fail because results are cut at the next apostrophe instead of extending to the closing quote.)
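Both points can be tried out with a short Python sketch. It assumes that a non-greedy regular expression behaves like the bracketed wildcard search, and the function names and sample sentences are made up for illustration; it does not reproduce any particular word processorʼs autocorrect.

```python
import re

# Sketch only: names and sample text are illustrative assumptions.

LETTER_APOSTROPHE = "\u02BC"  # U+02BC MODIFIER LETTER APOSTROPHE
WORD_JOINER = "\u2060"        # U+2060 WORD JOINER

# Shortest run between U+2018 and U+2019, standing in for the search '*'
# bracketed by single quotation marks.
QUOTATION = re.compile("\u2018(.*?)\u2019")

# A letter apostrophe that is not already followed by a word joiner.
BARE_APOSTROPHE = re.compile(LETTER_APOSTROPHE + "(?!" + WORD_JOINER + ")")


def autocorrect(text):
    """Replace U+02BC with U+02BC U+2060, leaving already fixed ones alone."""
    return BARE_APOSTROPHE.sub(LETTER_APOSTROPHE + WORD_JOINER, text)


def find_quotations(text):
    """Return the text found between single quotation marks."""
    return QUOTATION.findall(text)


if __name__ == "__main__":
    ambiguous = "Il dit \u2018l\u2019ami arrive\u2019."
    disambiguated = "Il dit \u2018l\u02BCami arrive\u2019."
    print(find_quotations(ambiguous))      # cut at the apostrophe
    print(find_quotations(disambiguated))  # extends to the closing quote
```

With U+2019 doing double duty, the first search stops after the single letter before the apostrophe; with U+02BC as apostrophe, the match runs to the real closing quote.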

However, despite the word joiner having been encoded and recommended since version 3.2 of the Standard, it is still not implemented on Windows 7. Therefore I must use the traditional zero width no-break space U+FEFF instead.

In TUS, sections 23.2 (page 799) and 23.8 (pages 821 ff.), we are taught that for the semantics of word joining, U+2060 is strongly preferred, but U+FEFF must still be supported for backward compatibility. It also follows from § 23.8 that in careful text processing, U+FEFF used as a byte order mark occurs only at the very beginning of text files (page 822), while applications in which Unicode has been carefully implemented are expected to always declare the charset and the transformation format their files are written in, and donʼt need U+FEFF as a BOM. Therefore, it seems that U+FEFF can still be used as a ZWNBSP in *new* text files, despite its use being strongly discouraged and U+2060 being preferred.
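The distinction drawn here, U+FEFF at the very beginning as a BOM versus U+FEFF anywhere else as a ZWNBSP, can be sketched in Python as follows. The helper names are hypothetical, and the fallback function merely illustrates the workaround described above; it is not a recommendation from the Standard.

```python
# Sketch only: helper names are illustrative assumptions.

ZWNBSP = "\uFEFF"       # U+FEFF ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER


def split_bom(text):
    """Return (has_bom, body), treating only a leading U+FEFF as a BOM."""
    if text.startswith(ZWNBSP):
        return True, text[1:]
    return False, text


def interior_zwnbsp_positions(text):
    """Return the indices of every U+FEFF that is not at the very start."""
    has_bom, body = split_bom(text)
    offset = 1 if has_bom else 0
    return [i + offset for i, ch in enumerate(body) if ch == ZWNBSP]


def fallback_to_zwnbsp(text):
    """Replace every U+2060 with U+FEFF where WJ is not supported."""
    return text.replace(WORD_JOINER, ZWNBSP)


if __name__ == "__main__":
    sample = "\uFEFFmot\u2060 \u2060: suite"
    print(split_bom(sample)[0])
    print(interior_zwnbsp_positions(fallback_to_zwnbsp(sample)))
```

One caveat of the fallback: if a word joiner happened to be the first character of a file, replacing it would produce something indistinguishable from a BOM, which is exactly the ambiguity U+2060 was meant to remove.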

Supposing that Microsoft chose not to implement U+2060 WJ because abandoning U+FEFF ZWNBSP appeared needless and would have caused much trouble for little or no benefit, may I ask whether Unicode could follow Microsoft once again and withdraw the recommendation of U+2060? Most people just *canʼt* use this character, and keyboard implementations *must* avoid it.


Best regards,
Marcel Schneider