Collation / Fractional UCA / Implicit Weights Questions

Markus Scherer via CLDR-Users cldr-users at unicode.org
Sun Nov 26 23:50:06 CST 2017


On Sun, Nov 26, 2017 at 6:02 PM, Kip Cole <kipcole9 at gmail.com> wrote:

> So now I understand better about the application of the radical data and I
> need to decide where to place them. You note: "For ICU, I move the
> implicit-weight lead bytes much higher, to make more room for large Han
> tailorings. You can choose your implicit-weight allocation freely"
>
> Where do you place them? (I know, I should read the code and I will but
> the learning curve is steep!)
>

I have a piece of code in the ICU "genuca" tool (not one of the installed
ICU tools) that takes the number of Han characters for which we need
implicit primaries (from one of the early lines in FractionalUCA.txt) and
calculates the number of lead bytes for 3-byte weights with a certain gap
size (for tailoring between Han characters). Given the current gap size, it
uses three lead bytes FB..FD. FE is for 4-byte unassigned-implicit
primaries, and FF is for "trailing weights", of which there are currently
only a couple, including those for U+FFFD and U+FFFF.
See https://sites.google.com/site/icusite/design/collation/bytes
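
As a rough illustration of the lead-byte calculation described above, here
is a minimal Python sketch. It is not the genuca code: the function name, the
figure of 254 usable values per trailing byte, and the sample inputs are all
assumptions made for the example, and the real tool may reserve additional
byte values.

```python
def han_lead_bytes(han_count, gap, usable_per_byte=254):
    """Estimate how many lead bytes are needed so that every Han character
    gets a 3-byte implicit primary with `gap` unused weights after it.

    Assumption (not taken from genuca): each of the two trailing bytes
    has `usable_per_byte` usable values.
    """
    if han_count < 0 or gap < 0:
        raise ValueError("han_count and gap must be non-negative")
    # Each character occupies its own weight plus `gap` spare weights
    # that stay free for tailoring between Han characters.
    slots_per_char = gap + 1
    # One lead byte covers every combination of the two trailing bytes.
    slots_per_lead_byte = usable_per_byte ** 2
    chars_per_lead_byte = slots_per_lead_byte // slots_per_char
    # Integer ceiling division.
    return -(-han_count // chars_per_lead_byte)


# Illustrative inputs only; these are not values from FractionalUCA.txt.
print(han_lead_bytes(90000, gap=1))
print(han_lead_bytes(90000, gap=0))
```

A larger gap leaves more room for tailoring between Han characters but costs
more lead bytes, which is one reason the allocation may move over time.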

These may move in the future if there are more Han characters, if we decide
on a different gap size, if we want to leave more room for trailing weights,
etc.

The primary lead bytes from somewhere near 80 to currently FA are used for
large CJK tailorings, so that we get a decent number of two-byte weights.
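
The lead-byte layout described in this message can be summarized in a small
Python sketch. The function name and the category labels are made up for
illustration, and the lower bound of 0x80 for the tailoring range is only an
assumption standing in for "somewhere near 80"; the actual boundary depends
on the data.

```python
def classify_lead_byte(lead, tailoring_start=0x80):
    """Map a primary lead byte to the range it falls in, per the layout
    described in this message (as of late 2017; it may change).

    `tailoring_start` is an assumed approximation, not an exact ICU value.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("lead byte must be in 0x00..0xFF")
    if lead == 0xFF:
        return "trailing weights"
    if lead == 0xFE:
        return "unassigned-implicit (4-byte primaries)"
    if 0xFB <= lead <= 0xFD:
        return "Han implicit (3-byte primaries)"
    if tailoring_start <= lead <= 0xFA:
        return "available for large CJK tailorings"
    return "other"


for lead in (0x90, 0xFA, 0xFB, 0xFE, 0xFF):
    print(f"{lead:02X}: {classify_lead_byte(lead)}")
```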

markus