Re: ◌ in LB28a in UAX14 of Unicode 15.1.0

Markus Scherer markus.icu at gmail.com
Mon Sep 4 12:53:01 CDT 2023


On Mon, Sep 4, 2023 at 6:36 AM Daniel Bünzli via Unicode <
unicode at corp.unicode.org> wrote:

> Also it would be nicer for certain implementations if that was somehow
> integrated as a character class in the rules, like e.g. ZWJ is.
>

It didn't seem worth it for a one-off, especially now that we no longer
partition the code space with exactly one property value per code point.
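
For illustration only, here is a minimal Python sketch (an assumption, not code taken from ICU or the Unicode tools) of how an implementation can match the (AK | ◌ | AS) alternative of LB28a without a dedicated class. ◌ (U+25CC DOTTED CIRCLE) is tested by code point next to the ordinary Line_Break class test. The helper names are hypothetical, and only the sub-rule (AK | ◌ | AS) × (VF | VI) is shown.

```python
# Sketch: match LB28a's (AK | ◌ | AS) alternative without a new class.
# Assumes the caller already knows each code point's Line_Break class
# (UAX #14 short alias, e.g. "AK"); helper names are hypothetical.

DOTTED_CIRCLE = 0x25CC  # ◌, matched by code point rather than by class


def is_aksara_base(cp: int, lb_class: str) -> bool:
    """True if (cp, lb_class) matches (AK | ◌ | AS)."""
    return lb_class in ("AK", "AS") or cp == DOTTED_CIRCLE


def lb28a_joins_virama(before: tuple[int, str], after: tuple[int, str]) -> bool:
    """True if the sub-rule (AK | ◌ | AS) × (VF | VI) forbids a break here."""
    before_cp, before_class = before
    _after_cp, after_class = after
    return is_aksara_base(before_cp, before_class) and after_class in ("VF", "VI")
```

Whatever Line_Break class the data assigns to ◌, the extra condition is a single code point comparison, which is why a separate class buys little for a one-off.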

> Is there a machine-readable version of the rules for all the Unicode
> segmentation standards?
>

There is not an official version like that.

Unofficially, we have such a version in the tools code that generates the
test data:

https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/resources/org/unicode/tools/SegmenterDefault.txt
for the UAX #14/#29 default behavior

https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/resources/org/unicode/tools/SegmenterCldr.txt
for CLDR/ICU root locale tailorings, if any
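
As a rough illustration of what "rules as data" can mean, the sketch below evaluates an ordered rule list where the first matching rule decides the outcome at a position. It is a hypothetical representation and does NOT reproduce the syntax of SegmenterDefault.txt. It covers only a few rules (LB4 to LB7 and the LB31 fallback) that can be expressed by looking at one class on each side. Many real rules (e.g. LB8 with ZW SP*) need wider context.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    before: Callable[[str], bool]  # test on the class left of the position
    after: Callable[[str], bool]   # test on the class right of the position
    action: str                    # "!" mandatory, "×" no break, "÷" break allowed


def _any(_cls: str) -> bool:
    return True


# A small, pairwise-only subset of UAX #14, kept in rule order.
RULES = [
    Rule("LB4", lambda c: c == "BK", _any, "!"),
    Rule("LB5", lambda c: c == "CR", lambda c: c == "LF", "×"),
    Rule("LB5", lambda c: c in ("CR", "LF", "NL"), _any, "!"),
    Rule("LB6", _any, lambda c: c in ("BK", "CR", "LF", "NL"), "×"),
    Rule("LB7", _any, lambda c: c in ("SP", "ZW"), "×"),
    Rule("LB31", _any, _any, "÷"),
]


def decide(before: str, after: str) -> tuple[str, str]:
    """Return (rule name, action) of the first rule matching this pair."""
    for rule in RULES:
        if rule.before(before) and rule.after(after):
            return rule.name, rule.action
    raise ValueError("LB31 matches every pair, so this is unreachable")
```

For example, decide("CR", "LF") returns ("LB5", "×") because "CR × LF" is listed before "CR !", and decide("ID", "ID") falls through to LB31.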

markus