NNBSP and Word Boundaries

Richard Wordingham richard.wordingham at ntlworld.com
Sun Oct 4 17:54:30 CDT 2015

On Fri, 2 Oct 2015 09:25:01 +0200
Mark Davis ☕️ <mark at macchiato.com> wrote:

> We add:
> WB13c Mongolian_Letter × NNBSP
> WB13d NNBSP × Mongolian_Letter
> *If* we want to also change behavior on the other side of the NNBSP,
> whenever the Mongolian_Letter and NNBSP occur in sequence, we add 2
> additional rules (with the appropriate values for ..., like Numeric)
> WB13c Mongolian_Letter NNBSP × (...)
> WB13d (...) × NNBSP Mongolian_Letter

I'll assume the last two are meant to be WB13e and WB13f.

We can achieve the effects down to the first WB13d simply by changing
NNBSP from XX to MidNumLet.  This would also provide a proper "espace
fine" for French use within numbers
( https://www.druide.com/enquetes/pour-des-espaces-ins%C3%A9cables-impeccables
) to separate groups of 3 digits.  This needs *no* extra rules.
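
To illustrate why no extra rules are needed, here is a minimal Python sketch of WB5-WB12 alone. It is not a UAX #29 implementation. The classifier is a toy assumption (str.isalpha for ALetter, str.isdigit for Numeric, '.' as MidNumLet) rather than the real Word_Break property data, and the names word_break_class, no_break and split_words are invented for this sketch. The nnbsp_class argument switches NNBSP between XX and MidNumLet.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

LETTER = {"ALetter"}                  # stands in for AHLetter
MID_LET = {"MidLetter", "MidNumLet"}  # MidLetter | MidNumLetQ, Single_Quote omitted
MID_NUM = {"MidNum", "MidNumLet"}     # MidNum | MidNumLetQ, Single_Quote omitted


def word_break_class(ch, nnbsp_class):
    """Toy classifier (assumption): NOT the real Word_Break property."""
    if ch == NNBSP:
        return nnbsp_class
    if ch.isdigit():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    if ch == ".":
        return "MidNumLet"
    return "XX"


def no_break(p2, p1, n1, n2):
    """True if a rule forbids a break between p1 and n1.

    p2, p1 are the classes before the candidate boundary and n1, n2 the
    classes after it; None marks start or end of text.
    """
    if p1 in LETTER and n1 in LETTER:                          # WB5
        return True
    if p1 in LETTER and n1 in MID_LET and n2 in LETTER:        # WB6
        return True
    if p2 in LETTER and p1 in MID_LET and n1 in LETTER:        # WB7
        return True
    if p1 == "Numeric" and n1 == "Numeric":                    # WB8
        return True
    if p1 in LETTER and n1 == "Numeric":                       # WB9
        return True
    if p1 == "Numeric" and n1 in LETTER:                       # WB10
        return True
    if p2 == "Numeric" and p1 in MID_NUM and n1 == "Numeric":  # WB11
        return True
    if p1 == "Numeric" and n1 in MID_NUM and n2 == "Numeric":  # WB12
        return True
    return False                                               # WB999: break


def split_words(text, nnbsp_class="XX"):
    """Split text at every position where no rule above forbids a break."""
    cls = [word_break_class(c, nnbsp_class) for c in text]

    def at(i):
        return cls[i] if 0 <= i < len(cls) else None

    words, start = [], 0
    for i in range(1, len(text)):
        if not no_break(at(i - 2), at(i - 1), at(i), at(i + 1)):
            words.append(text[start:i])
            start = i
    if text:
        words.append(text[start:])
    return words
```

Under these assumptions, split_words("1\u202f234") gives three segments, while split_words("1\u202f234", "MidNumLet") gives one. The same holds for two Mongolian letters on either side of NNBSP, e.g. "\u182e\u1823\u202f\u1824\u1828", where WB6 and WB7 do the work.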

Now for combined numbers and letters, we might consider adding these two rules:

WB12a Numeric MidNumLet × AHLetter
WB12b Numeric × MidNumLet AHLetter

I think we should go the whole hog, and instead have

WB12c (Numeric|AHLetter) MidNumLetQ × (Numeric|AHLetter)
WB12d (Numeric|AHLetter) × MidNumLetQ (Numeric|AHLetter)

Perhaps there are good reasons against them - I'm not aware of any.  (I
don't think it is wrong to treat "no.2" as a single word.)  These rules
would make the abbreviated names of a good many Thai forms (e.g. คร.๒, a
marriage certificate) into a single word.

WB12c and WB12d overlap with WB6, WB7, WB11 and WB12, so those four
rules could then be slightly simplified.
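
A rough Python sketch of the effect of WB12c and WB12d follows. It uses toy classes (str.isalpha for AHLetter, str.isdigit for Numeric, '.' as MidNumLet, ',' as MidNum, ':' as MidLetter) that are assumptions and not the real property data, and it leaves Single_Quote out of MidNumLetQ. The names classify, joined and split_words are invented here. With proposed=True, WB6/WB7 are cut down to MidLetter and WB11/WB12 to MidNum, which is one possible reading of the simplification.

```python
WORD = {"ALetter", "Numeric"}  # stands in for (Numeric|AHLetter)


def classify(ch):
    """Toy classifier (assumption): NOT the real Word_Break property."""
    if ch.isdigit():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    return {".": "MidNumLet", ",": "MidNum", ":": "MidLetter"}.get(ch, "XX")


def joined(p2, p1, n1, n2, proposed):
    """True if a rule forbids a break between p1 and n1."""
    if p1 in WORD and n1 in WORD:                          # WB5, WB8, WB9, WB10
        return True
    if proposed:
        # MidNumLet is handled only by the proposed rules.
        mid_let, mid_num = {"MidLetter"}, {"MidNum"}
        if p2 in WORD and p1 == "MidNumLet" and n1 in WORD:  # WB12c
            return True
        if p1 in WORD and n1 == "MidNumLet" and n2 in WORD:  # WB12d
            return True
    else:
        mid_let = {"MidLetter", "MidNumLet"}
        mid_num = {"MidNum", "MidNumLet"}
    if p1 == "ALetter" and n1 in mid_let and n2 == "ALetter":  # WB6
        return True
    if p2 == "ALetter" and p1 in mid_let and n1 == "ALetter":  # WB7
        return True
    if p2 == "Numeric" and p1 in mid_num and n1 == "Numeric":  # WB11
        return True
    if p1 == "Numeric" and n1 in mid_num and n2 == "Numeric":  # WB12
        return True
    return False                                               # WB999: break


def split_words(text, proposed=False):
    """Split text at every position where no rule above forbids a break."""
    cls = [classify(c) for c in text]

    def at(i):
        return cls[i] if 0 <= i < len(cls) else None

    words, start = [], 0
    for i in range(1, len(text)):
        if not joined(at(i - 2), at(i - 1), at(i), at(i + 1), proposed):
            words.append(text[start:i])
            start = i
    if text:
        words.append(text[start:])
    return words
```

In this sketch "no.2" splits into "no", "." and "2" under the current rules and stays whole with proposed=True, while "3.14", "e.g", "1,5" and "a:b" stay whole and "a,b" still splits in both modes.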

