Collation / Fractional UCA / Implicit Weights Questions

Kip Cole via CLDR-Users cldr-users at unicode.org
Sat Nov 25 19:07:35 CST 2017


As part of my efforts to implement CLDR support for the Elixir language, I’ve now started work on collation and am working my way through TR10 and the relevant parts of TR35. I have some questions on implicit weight calculation that I’m unable to resolve, and I would appreciate any help or pointers:

(1) Unified Ideograph vs Radical

Is there a preferred or intended strategy here: should an implementation use the Unified Ideograph definitions or the radical definitions?

(2) Calculating implicit weights for radical definitions

TR10 and TR35 seem quiet on the topic. My working assumption is to use the [fixed first implicit byte E0] and [fixed last implicit byte E4] entries in FractionalUCA.txt to generate implicit weights that respect the radical order (left to right, top to bottom). Is that a reasonable working principle?

(3) Implicit weight calculations in general

The algorithm in TR10 at https://www.unicode.org/reports/tr10/#Implicit_Weights generates weights with a top byte of 0xFB, which would seem to conflict with the [fixed first implicit byte E0] and [fixed last implicit byte E4] indicators. My working assumption is to use the algorithm in TR10 to calculate implicit weights, except for radical definitions, which would use the [fixed first] and [fixed last] values.

This would seem to align with TR35 which says:

"Note: The particular primary lead bytes for Hani vs. IMPLICIT vs. TRAILING are only an example"

I read this as suggesting that Hani weights are calculated with the 0xFB leading byte per TR10, and that the [fixed first implicit] value can be used to generate weights for radicals (and other unspecified code points).

Thanks in advance,
Kip