NNBSP and Word Boundaries
richard.wordingham at ntlworld.com
Thu Oct 1 11:26:33 CDT 2015
The background document for PRI #308 (Property Change for NNBSP),
http://www.unicode.org/review/pri308/pri308-background.html , says,
"The only other widely noted use for U+202F NNBSP is for representation
of the thin non-breaking space (espace fine insécable) regularly seen
next to certain punctuation marks in French style typography. However,
the word segmentation change for U+202F should have no impact in that
context, as ExtendNumLet is explicitly for preventing breaks between
letters, but does not prevent the identification of word boundaries
next to punctuation marks."
Unfortunately, this isn't quite true. In the text fragment
" dit<NNBSP>: ", there would be word boundaries before 'd' and before
and after ':', but none between 't' and NNBSP, so the word isolated
would be the four characters "dit<NNBSP>". One solution would be to
replace NNBSP by U+2009 THIN SPACE, for with untailored line-breaking
there would be no line break between it and the 't' or colon, but
there would be a word break between the 't' and the thin space.
The problem is that characters with property ExtendNumLet can be the
first or last character of a word as well as a character strictly
within a word. In this respect, ExtendNumLet differs from MidNumLet.
The problem with using MidNumLet instead is that its characters, such
as FULL STOP, may stand within a word between two letters or between
two digits, but not between a letter and a digit. The problem then
arises with the Mongolian analogue of '4th' etc. - it is written digit,
NNBSP, letters, and is a single word.
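
To make the two cases concrete, here is a rough Python sketch of the
relevant UAX #29 word-break rules (WB5 to WB13b only). It is not a
conforming implementation: the function names are my own, the
character classes are crude stand-ins rather than real Word_Break
property data (for instance, the colon is simply treated as Other),
Extend and Format handling is omitted, and Latin '4th' stands in for
the Mongolian digit-NNBSP-letters sequence.

```python
NNBSP = "\u202f"
THIN_SPACE = "\u2009"


def wb_class(ch, nnbsp_class):
    """Crude stand-in for Word_Break (an assumption, not UCD data)."""
    if ch == NNBSP:
        return nnbsp_class          # the class under discussion
    if ch == "_":
        return "ExtendNumLet"
    if ch == ".":
        return "MidNumLet"
    if ch.isdigit():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    return "Other"                  # spaces, THIN SPACE, ':' in this sketch


def no_break(a, b, c, d):
    """True if no word boundary falls between b and c.

    a precedes b and d follows c; either may be None at the text edges.
    """
    AL, NUM, MNL, ENL = "ALetter", "Numeric", "MidNumLet", "ExtendNumLet"
    return (
        (b == AL and c == AL)                      # WB5
        or (b == AL and c == MNL and d == AL)      # WB6
        or (a == AL and b == MNL and c == AL)      # WB7
        or (b == NUM and c == NUM)                 # WB8
        or (b == AL and c == NUM)                  # WB9
        or (b == NUM and c == AL)                  # WB10
        or (a == NUM and b == MNL and c == NUM)    # WB11
        or (b == NUM and c == MNL and d == NUM)    # WB12
        or (b in (AL, NUM, ENL) and c == ENL)      # WB13a
        or (b == ENL and c in (AL, NUM))           # WB13b
    )


def segments(text, nnbsp_class="ExtendNumLet"):
    """Split text at the word boundaries found by the simplified rules."""
    if not text:
        return []
    cls = [wb_class(ch, nnbsp_class) for ch in text]
    out, start = [], 0
    for i in range(1, len(text)):
        a = cls[i - 2] if i >= 2 else None
        d = cls[i + 1] if i + 1 < len(text) else None
        if not no_break(a, cls[i - 1], cls[i], d):
            out.append(text[start:i])
            start = i
    out.append(text[start:])
    return out


if __name__ == "__main__":
    print(segments(" dit" + NNBSP + ": "))           # NNBSP joins 'dit'
    print(segments(" dit" + THIN_SPACE + ": "))      # 'dit' stands alone
    print(segments("4" + NNBSP + "th"))              # one word
    print(segments("4" + NNBSP + "th", "MidNumLet")) # three pieces
```

Under these assumptions, ExtendNumLet keeps '4<NNBSP>th' together but
also attaches the NNBSP to 'dit', whereas MidNumLet would split the
ordinal at both sides of the NNBSP.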