implicit weight base for U+2CEA2

James Tauber via Unicode unicode at unicode.org
Wed Sep 27 18:07:00 CDT 2017


Ah yes, I was just going by membership in the CJK Unified Ideographs
Extension E block, not actual assignment.

So the lack of assignment means it should fail the Unified_Ideograph
membership test in http://unicode.org/reports/tr10/#Values_For_Base_Table
and fall through to the FBC0 base.
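For anyone hitting the same thing, here is a minimal sketch of the base selection as I now understand it. It is not pyuca's actual code: the function names are made up for illustration, the range table is my reading of the Unified_Ideograph property for Unicode 10.0 (check PropList.txt for the version you target), and the Tangut/Nushu bases are left out to keep it short.

```python
# Unified_Ideograph=True ranges, assumed to match Unicode 10.0 PropList.txt.
# Note these are *assigned* ranges, not whole blocks.
UNIFIED_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DB5),    # Extension A
    (0x4E00, 0x9FEA),    # CJK Unified Ideographs
    (0xFA0E, 0xFA0F), (0xFA11, 0xFA11), (0xFA13, 0xFA14),
    (0xFA1F, 0xFA1F), (0xFA21, 0xFA21), (0xFA23, 0xFA24),
    (0xFA27, 0xFA29),    # unified ideographs in the compatibility block
    (0x20000, 0x2A6D6),  # Extension B
    (0x2A700, 0x2B734),  # Extension C
    (0x2B740, 0x2B81D),  # Extension D
    (0x2B820, 0x2CEA1),  # Extension E
    (0x2CEB0, 0x2EBE0),  # Extension F
]


def is_unified_ideograph(cp):
    """True if cp falls in one of the assigned ranges listed above."""
    return any(lo <= cp <= hi for lo, hi in UNIFIED_IDEOGRAPH_RANGES)


def implicit_base(cp):
    """Pick the implicit weight base (ideograph cases only; hypothetical helper)."""
    if is_unified_ideograph(cp):
        in_core_block = 0x4E00 <= cp <= 0x9FFF    # CJK Unified Ideographs
        in_compat_block = 0xF900 <= cp <= 0xFAFF  # CJK Compatibility Ideographs
        return 0xFB40 if (in_core_block or in_compat_block) else 0xFB80
    # Unassigned code points land here, even inside a CJK extension block.
    return 0xFBC0


def implicit_weights(cp):
    """Return the (AAAA, BBBB) primary weights derived from the base."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb
```

With this table, U+2CEA1 (the last assigned Extension E ideograph in 10.0) gets FB80 while U+2CEA2 gets FBC0. Testing block membership instead would give U+2CEA2 a lead weight of FB85, which sorts before the FBC5 of U+2B81E and breaks the expected order.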

Got it! Thanks

James


On Wed, Sep 27, 2017 at 5:29 PM, Ken Whistler via Unicode <
unicode at unicode.org> wrote:

>
>
> On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote:
>
> On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode <
> unicode at unicode.org> wrote:
>
>> I recently updated pyuca[1], my pure Python implementation of the Unicode
>> Collation Algorithm, to work with 8.0.0, 9.0.0, and 10.0.0, but to get all
>> the tests to pass, I had to special-case the implicit weight base for
>> U+2CEA2. The spec seems to suggest the base should be FB80 but I had to
>> override just that code point to have a base of FBC0 for the tests to pass.
>>
>> Is this a known issue with the spec or something I've missed?
>>
>
> 2CEA2..2CEAF are unassigned code points for which the UCA+DUCET uses a
> base of FBC0.
>
> markus
>
>
> And you may have a range error in Extension E to account for the test
> problem.
>
> The relevant section of CollationTest_SHIFTED_SHORT.txt has tests that
> will pass only if:
>
> 2B735 < 2B81E < 2CEA2 < 2EBE1 < 2FFFE
> Ext C < Ext D < Ext E < Ext F < non-character
>
> Those are *unassigned* characters just past the assigned ranges but still
> in the blocks in each of those CJK extensions. So if you have a range error
> for assigned characters in Extension E, you'd get a failure at that point
> in the test cases.
>
> --Ken
>
>