Re: ◌ in LB28a in UAX14 of Unicode 15.1.0
Markus Scherer
markus.icu at gmail.com
Mon Sep 4 12:53:01 CDT 2023
On Mon, Sep 4, 2023 at 6:36 AM Daniel Bünzli via Unicode <
unicode at corp.unicode.org> wrote:
> Also it would be nicer for certain implementations if that was somehow
> integrated as a character class in the rules, like e.g. ZWJ is.
>
It didn't seem worth adding a class for a one-off, especially now that we no
longer partition the code space with exactly one property value per code point.
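A minimal sketch of the point above, assuming an implementation that drives its pair rules from per-character class strings. Because ◌ (U+25CC DOTTED CIRCLE) is named in LB28a by code point rather than by a Line_Break value of its own, the code checks the code point first and only then falls back to the character's Line_Break value. The synthetic `DC` token and the function names are hypothetical. The Line_Break values are assumed to come from elsewhere (e.g. LineBreak.txt). The four LB28a patterns are my transcription of UAX #14 for Unicode 15.1.0 and should be checked against the published rule. The sketch ignores combining-mark handling (LB9) and the ordering of the other rules.

```python
from typing import Sequence

DOTTED_CIRCLE = 0x25CC
# Hypothetical synthetic token for U+25CC; not a real Line_Break value.
DC = "DC"

# The operand set "(AK | ◌ | AS)" used throughout LB28a.
BASE = {"AK", DC, "AS"}


def effective_class(cp: int, lb: str) -> str:
    """Return DC for U+25CC, otherwise the given Line_Break value.

    The code point check wins over the property value, because ◌ is
    matched by identity, not by a class of its own.
    """
    return DC if cp == DOTTED_CIRCLE else lb


def lb28a_no_break(classes: Sequence[str], i: int) -> bool:
    """True if the LB28a patterns forbid a break between classes[i-1] and classes[i].

    `classes` holds effective classes (see effective_class);
    requires 1 <= i < len(classes).
    """
    before, after = classes[i - 1], classes[i]
    prev2 = classes[i - 2] if i >= 2 else None
    nxt = classes[i + 1] if i + 1 < len(classes) else None

    # AP × (AK | ◌ | AS)
    if before == "AP" and after in BASE:
        return True
    # (AK | ◌ | AS) × (VF | VI)
    if before in BASE and after in {"VF", "VI"}:
        return True
    # (AK | ◌ | AS) VI × (AK | ◌)
    if prev2 in BASE and before == "VI" and after in {"AK", DC}:
        return True
    # (AK | ◌ | AS) × (AK | ◌ | AS) VF
    if before in BASE and after in BASE and nxt == "VF":
        return True
    # LB28a does not apply; the decision falls to the remaining rules.
    return False
```

For example, with `classes = [DC, "VI", "AK"]`, both `lb28a_no_break(classes, 1)` and `lb28a_no_break(classes, 2)` return `True`. Replace `DC` with `"AL"` and both return `False`, leaving the decision to later rules.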
> Is there a machine readable version of the rules for all the Unicode
> segmentation standards?
>
There is not an official version like that.
Unofficially, we have such a version in the tools code that generates the
test data:
https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/resources/org/unicode/tools/SegmenterDefault.txt
for the UAX #14/#29 default behavior
https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/resources/org/unicode/tools/SegmenterCldr.txt
for CLDR/ICU root locale tailorings, if any
markus