WORD JOINER vs ZWNBSP

Marcel Schneider charupdate at orange.fr
Mon Jul 6 06:36:31 CDT 2015


On Sat, Jul 04, 2015, Richard Wordingham wrote:

> I will also note that people are reluctant to type
> invisible characters if they don't have immediate benefits.

This might be the reason why U+2060 was not properly implemented right away in word processors, whose users were not expected to use it.  As has already been pointed out, in my version of Word, U+2060 is font-dependent, which it should not be, and the fallback isn't set up well (nor is it for U+205D TRICOLON, BTW).  Meanwhile, in typography, where the value of a word joiner is obvious, other software is used.  By contrast, later versions of word processing applications, from whichever software house, have presumably undergone in-depth changes, including tailoring of text segmentation.

> The Thai and Cambodian implementations are far from perfect, even when
> applied to the Thai and Cambodian languages.  Using a dictionary for
> the national languages on text of other languages naturally has even
> worse performance.  A quick experiment suggests that for whole word
> search in Thai, LibreOffice simply ignores any boundaries between Thai
> word characters.  Double click and ctrl/arrow use different rules.

When Doug Ewell wrote on Tue Jun 30, 2015 that clicking on either part of 'one\u2060two' selects the whole, I didn't check on my version, taking that as a matter of fact.  Now I have checked, and I'm astonished to see only *one* part selected.  Consequently, between Word 97 (the full version on which Word 2010 Starter is based, if I remember correctly what I've read somewhere) and Word 2010, even the rules for double click and ctrl/arrow must have been changed to better meet users' needs and expectations.  From this, and from some of the bugs having been fixed before Word 2013 (as I've been told on Microsoft Community), I extrapolate, without hasty generalization, that Word 2016 could turn out to be the high-performing version I have been expecting ever since I started doing word processing.
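
To make the two behaviors concrete, here is a minimal Python sketch.  It is not the algorithm of any version of Word; `is_word_char` and `select_word` are hypothetical helpers, and the segmenter is deliberately simplistic.  It assumes that a double click extends the selection over letters, digits and, optionally, format characters (general category Cf, which includes U+2060), loosely in the spirit of the UAX #29 rule that format characters are ignored when looking for word boundaries.

```python
import unicodedata

WORD_JOINER = "\u2060"

def is_word_char(ch, join_format=True):
    """Hypothetical predicate: letters, digits, underscore, and
    (optionally) format characters such as U+2060 count as word parts."""
    if ch.isalnum() or ch == "_":
        return True
    return join_format and unicodedata.category(ch) == "Cf"

def select_word(text, index, join_format=True):
    """Simulate a double click at `index`: grow the selection in both
    directions while the neighbouring characters are word characters."""
    if not is_word_char(text[index], join_format):
        return text[index]
    start = index
    while start > 0 and is_word_char(text[start - 1], join_format):
        start -= 1
    end = index + 1
    while end < len(text) and is_word_char(text[end], join_format):
        end += 1
    return text[start:end]

sample = "say one" + WORD_JOINER + "two now"

# Clicking on the "n" of "one" (index 5):
joined = select_word(sample, 5, join_format=True)    # U+2060 joins the parts
split = select_word(sample, 5, join_format=False)    # U+2060 acts as a boundary
```

With `join_format=True` the selection covers 'one\u2060two' as a whole, which matches what Doug Ewell reported; with `join_format=False` only "one" is selected, which matches what I see on my version.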

> It's quite possible that we are misinterpreting the results of whole
> word searches.  One way of implementing whole word search is to do a
> general search and then check whether the word found is part of a
> larger word.  To do that, one might simply ask whether the
> characters before and after the string found are permitted in words.
> One might easily set things up so that by omission U+2060 is not
> considered part of a word - the code could have been written before
> U+2060 was assigned and not updated since.

Indeed, perhaps we are dealing with obsolete behavior.  I wonder whether Word 2010, which already passes over U+2060 in word selection and quick cursor movement, does the same in whole word search.  Personally I'd prefer that it did not, because I don't believe that would be useful.  So I agree with OpenOffice/LibreOffice (tested version of the latter: 4.2.4.2), which don't.  Nor does Adobe Reader.  By deduction, I now suppose that Microsoft Word actually doesn't either.
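
The approach Richard describes (a general search followed by a check of the neighboring characters) can be sketched in Python as follows.  This is only an illustration under stated assumptions, not the code of any of the applications mentioned; `is_word_char` and `whole_word_find` are hypothetical names, and the flag `treat_format_as_word` stands in for whether the implementation was ever updated to count U+2060 as part of a word.

```python
import unicodedata

WORD_JOINER = "\u2060"

def is_word_char(ch, treat_format_as_word):
    """Hypothetical test for 'permitted in words'.  By default U+2060
    (general category Cf) is left out, as in code never updated for it."""
    if ch.isalnum() or ch == "_":
        return True
    return treat_format_as_word and unicodedata.category(ch) == "Cf"

def whole_word_find(text, needle, treat_format_as_word=False):
    """Return the start offsets where `needle` occurs as a whole word:
    do a plain search, then reject hits embedded in a larger word."""
    hits = []
    pos = text.find(needle)
    while pos != -1:
        end = pos + len(needle)
        before_ok = pos == 0 or not is_word_char(
            text[pos - 1], treat_format_as_word)
        after_ok = end == len(text) or not is_word_char(
            text[end], treat_format_as_word)
        if before_ok and after_ok:
            hits.append(pos)
        pos = text.find(needle, pos + 1)
    return hits

text = "one" + WORD_JOINER + "two and onetwo"

by_omission = whole_word_find(text, "one")                            # U+2060 not a word char
updated = whole_word_find(text, "one", treat_format_as_word=True)     # U+2060 is a word char
```

In the first call "one" is found as a whole word at offset 0, because the U+2060 that follows it is not counted as a word character; in the second call nothing is found.  The "one" inside "onetwo" is rejected in both cases.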

Thank you for the information about the Thai and Cambodian implementations.  I think it would be right to prioritize updates for those implementations which "are far from perfect", given that they still exist(!), so that everybody on earth can benefit from really high-performing tools.

Marcel