Word break question
Richard Wordingham via CLDR-Users
cldr-users at unicode.org
Sat Apr 29 20:12:36 CDT 2017
On Sat, 29 Apr 2017 17:42:03 -0700
Cameron Dutro via CLDR-Users <cldr-users at unicode.org> wrote:
> Hey CLDR users,
>
> I have a question regarding the word break rules from CLDR v31.
> Consider the following word break test:
>
> ÷ 0001 × 0308 ÷ 0041 ÷
>
> I believe rule #5 should apply between 0308 and 0041, which looks
> like this:
>
> $AHLetter × $AHLetter
>
> 0308 has a word break property of "Extend" which $AHLetter matches,
> and 0041 has a word break property of ALetter which $AHLetter also
> matches. The thing is, rule #5 indicates no break should occur
> between these characters. Furthermore, there are only two rules in
> which a break is indicated (3.1 and 3.2), both of which don't apply
> in this case. What am I missing?
You're missing the shape of the brackets in "<variable
id="$ALetter">($ALetter $FEZ*)</variable>". The brackets are round,
not square, so they group a sequence rather than denote a set.
<U+0308> on its own therefore does not match $ALetter, as it is not a
string starting with a character for which Word_Break=ALetter.
Obviously <U+0001, U+0308> does not match either.
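
To make the round-versus-square point concrete, here is a minimal
sketch in Python. It is not the CLDR or ICU implementation: the
WORD_BREAK table and the helper names are made up for illustration, and
each character is reduced to a one-letter class code so that ordinary
regular expressions can stand in for the rule syntax. Only Extend is
modelled out of the Format/Extend/ZWJ group.

```python
import re

# Hypothetical, hand-written property table covering only the characters
# in this thread; real values come from WordBreakProperty.txt in the UCD.
WORD_BREAK = {
    "\u0001": "X",  # Other
    "\u0308": "E",  # Extend
    "\u0041": "A",  # ALetter
}

def classes(text):
    """Map each character to a one-letter code for its Word_Break class."""
    return "".join(WORD_BREAK[ch] for ch in text)

# Round brackets: a sequence, i.e. one ALetter followed by any number of
# Extend characters (standing in for $FEZ*).
ROUND = re.compile(r"AE*")

# Square brackets: a set, i.e. any single character that is ALetter or
# Extend. This is the reading the question assumed, not what the rule says.
SQUARE = re.compile(r"[AE]")

def matches_round(text):
    return ROUND.fullmatch(classes(text)) is not None

def matches_square(text):
    return SQUARE.fullmatch(classes(text)) is not None
```

In this model matches_round("\u0308") and matches_round("\u0001\u0308")
are both False, while matches_round("\u0041\u0308") is True. Only the
square-bracket reading, matches_square("\u0308"), accepts a lone U+0308.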
Secondly, if you read
http://unicode.org/reports/tr35/tr35-general.html#Segmentations, you
will see that the final rule of "Any ÷ Any" is implicit, so a break
occurs wherever no earlier rule matches.
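
The effect of the implicit final rule can be sketched the same way.
This is a toy evaluator in Python, not the real rule engine: the
one-letter class codes, the NO_BREAK_RULES table and the function names
are assumptions for illustration, and only simplified stand-ins for
rules 4 and 5 are included (rules 3.1 and 3.2 and everything else are
left out).

```python
import re

# Hypothetical one-letter codes for Word_Break classes; only the three
# characters from the test line are covered.
WORD_BREAK = {"\u0001": "X", "\u0308": "E", "\u0041": "A"}

# Simplified stand-ins for two rules, as (id, left regex, right regex).
# The left regex must match at the end of the text before the boundary,
# the right regex at the start of the text after it.
NO_BREAK_RULES = [
    ("4", r"[XEA]", r"E"),  # do not break before Extend
    ("5", r"AE*", r"A"),    # $AHLetter × $AHLetter, with $ALetter as a sequence
]

def rule_for_boundary(codes, i):
    """Return the id of the first rule matching between codes[:i] and codes[i:]."""
    left, right = codes[:i], codes[i:]
    for rule_id, before, after in NO_BREAK_RULES:
        if re.search(before + r"\Z", left) and re.match(after, right):
            return rule_id
    return "implicit"  # nothing matched, so the final "Any ÷ Any" applies

def segment(text):
    """Render text in the test-file notation: ÷ for break, × for no break."""
    codes = "".join(WORD_BREAK[ch] for ch in text)
    parts = ["÷"]  # a break at start of text is always implied
    for i, ch in enumerate(text):
        parts.append(f"{ord(ch):04X}")
        at_end = i + 1 == len(text)
        is_break = at_end or rule_for_boundary(codes, i + 1) == "implicit"
        parts.append("÷" if is_break else "×")
    return " ".join(parts)
```

Here segment("\u0001\u0308\u0041") gives "÷ 0001 × 0308 ÷ 0041 ÷", the
test line from the question: between 0308 and 0041 the text on the left
contains no ALetter, so the stand-in for rule 5 fails and the implicit
rule supplies the break. By contrast, segment("\u0041\u0308\u0041") gives
"÷ 0041 × 0308 × 0041 ÷", because there the left side does match the
sequence.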
Richard.