implicit weight base for U+2CEA2

Ken Whistler via Unicode unicode at unicode.org
Wed Sep 27 16:29:38 CDT 2017



On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote:
> On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode
> <unicode at unicode.org> wrote:
>
>     I recently updated pyuca[1], my pure Python implementation of the
>     Unicode Collation Algorithm, to work with 8.0.0, 9.0.0, and 10.0.0,
>     but to get all the tests to pass, I had to special-case the
>     implicit weight base for U+2CEA2. The spec seems to suggest the
>     base should be FB80, but I had to override just that code point to
>     have a base of FBC0 for the tests to pass.
>
>     Is this a known issue with the spec or something I've missed?
>
>
> 2CEA2..2CEAF are unassigned code points for which the UCA+DUCET uses a 
> base of FBC0.
>
> markus

You may also have a range error for Extension E, which would account
for the test failure.

The relevant section of CollationTest_SHIFTED_SHORT.txt has tests that 
will pass only if:

2B735 < 2B81E < 2CEA2 < 2EBE1 < 2FFFE
Ext C < Ext D < Ext E < Ext F < non-character

The first four are *unassigned* code points just past the assigned
ranges but still inside the blocks of those CJK extensions. So if your
range of assigned characters for Extension E is off (for example, if it
runs to the end of the block and so includes 2CEA2), you'd get a
failure at that point in the test cases.
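
To make the range issue concrete, here is a minimal sketch in Python of
how the base could be chosen for code points in the Extension C..F
blocks. The names ASSIGNED_EXT_RANGES, implicit_base and
implicit_primary are hypothetical (they are not pyuca's API), and the
range endpoints are assumed to be the Unicode 10.0 assigned ranges
implied by the test code points above. The AAAA/BBBB computation follows
the implicit weight formula in UTS #10. The sketch deliberately ignores
every other block (the main CJK block, Tangut, and so on), which use
other bases.

```python
# Assumed assigned ranges (Unicode 10.0) inside the CJK Extension C..F
# blocks. Each range ends one code point before the "just past" test
# values quoted in this message.
ASSIGNED_EXT_RANGES = [
    (0x2A700, 0x2B734),  # Extension C
    (0x2B740, 0x2B81D),  # Extension D
    (0x2B820, 0x2CEA1),  # Extension E (not to the end of the block)
    (0x2CEB0, 0x2EBE0),  # Extension F
]


def implicit_base(cp, ranges=ASSIGNED_EXT_RANGES):
    """Return FB80 for an assigned ideograph in Ext C..F, else FBC0.

    Only meaningful for code points in or around those blocks; other
    blocks are out of scope for this sketch.
    """
    for first, last in ranges:
        if first <= cp <= last:
            return 0xFB80
    return 0xFBC0


def implicit_primary(cp, ranges=ASSIGNED_EXT_RANGES):
    """Return the (AAAA, BBBB) pair of implicit primary weights."""
    aaaa = implicit_base(cp, ranges) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return (aaaa, bbbb)


# The code points from the relevant test section, in expected order.
TEST_POINTS = [0x2B735, 0x2B81E, 0x2CEA2, 0x2EBE1, 0x2FFFE]
weights = [implicit_primary(cp) for cp in TEST_POINTS]

# A deliberately wrong table that extends Extension E to 2CEAF,
# illustrating the kind of range error described above.
BAD_RANGES = [r for r in ASSIGNED_EXT_RANGES if r[0] != 0x2B820]
BAD_RANGES.append((0x2B820, 0x2CEAF))
bad_weights = [implicit_primary(cp, BAD_RANGES) for cp in TEST_POINTS]
```

With the assumed ranges, all five code points get base FBC0 and the
weights come out in ascending order. With the wrong Extension E range,
2CEA2 gets base FB80 and sorts before 2B81E, which is the kind of
failure the test file would expose.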

--Ken
