Word break question
Richard Wordingham via CLDR-Users
cldr-users at unicode.org
Sun Apr 30 16:01:49 CDT 2017
On Sun, 30 Apr 2017 12:28:51 -0700
Cameron Dutro via CLDR-Users <cldr-users at unicode.org> wrote:
> Richard, thanks again for clarifying the notation in use by the
> segmentation rules - I now understand the left- and right-hand sides
> to be regular expressions. It's still not clear to me how to interpret
> parentheses *inside* character classes however. Consider the following
> generalized case:
>
> [(abc d*)]
You should not be getting parentheses inside character classes.
Ignoring Hebrew letters as a distraction, the rule is
$ALetter × $ALetter
and $ALetter has the value
(\p{Word_Break=ALetter} $FEZ*)
This is a regular expression; it is not defined by a single character
class (or Unicode set).
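At each point for which no break decision has been made, there shall
be no break if the text immediately before the point ends with a match
for the left-hand side and the text immediately after the point begins
with a match for the right-hand side. Here both sides are $ALetter.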
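
As a rough illustration of that reading, here is a minimal Python sketch.
It is not how CLDR or ICU implement the rules. Python's re module has no
\p{Word_Break=...} support, so the two character classes below are
stand-ins chosen purely for illustration: ASCII letters in place of
\p{Word_Break=ALetter}, and a few combining marks plus SOFT HYPHEN and
ZWJ in place of $FEZ (assumed here to cover Format, Extend and ZWJ
characters). The function name no_break_at is likewise hypothetical.

```python
import re

# Stand-ins (assumptions, not the real Unicode properties):
ALETTER_CLASS = r"[A-Za-z]"                  # for \p{Word_Break=ALetter}
FEZ_CLASS = r"[\u0300-\u036F\u00AD\u200D]"   # for $FEZ

# $ALetter is a regular expression, not a single character class:
# one letter followed by any number of $FEZ characters.
ALETTER = rf"(?:{ALETTER_CLASS}{FEZ_CLASS}*)"

LEFT = re.compile(ALETTER + r"\Z")  # must match text ending at the position
RIGHT = re.compile(ALETTER)         # must match text starting at the position


def no_break_at(text: str, pos: int) -> bool:
    """Return True if `$ALetter × $ALetter` forbids a break at offset pos."""
    before_ok = LEFT.search(text[:pos]) is not None
    after_ok = RIGHT.match(text, pos) is not None
    return before_ok and after_ok
```

With these stand-ins, no_break_at("a\u0301b", 2) is true, because the
combining acute accent is absorbed by the $FEZ* on the left-hand side,
while no_break_at("a b", 1) is false, because the space does not start
a match for the right-hand side.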
Richard.