Normalization test

Philippe Verdy verdy_p at wanadoo.fr
Mon Mar 10 14:32:00 CDT 2014


Sorry, I quoted the wrong line (I typed 05EA instead of 05AE):

05AE          ; 228 # Mn       HEBREW ACCENT ZINOR
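The same property values can be checked programmatically. A minimal sketch, assuming Python 3 and its standard unicodedata module; its data comes from the UCD version bundled with the interpreter rather than the "latest" files linked in the quoted message, although combining class values are stable across versions. The helper name ccc_table is hypothetical, for illustration only.

```python
import unicodedata

# Code points from the test sequence, plus U+05EA (the line quoted by mistake).
CODE_POINTS = [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x05EA, 0x0062]

def ccc_table(code_points):
    """Hypothetical helper: (code point, canonical combining class, name) rows."""
    return [(cp, unicodedata.combining(chr(cp)), unicodedata.name(chr(cp)))
            for cp in code_points]

for cp, ccc, name in ccc_table(CODE_POINTS):
    print(f"{cp:04X}  ccc={ccc:3d}  {name}")
```

This shows 05AE with class 228 and 05EA with class 0, next to 0305 and 0300 (both 230) and 0315 (232).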


You're right: combining class 228 does not block the composition.
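To see what does block it, here is a minimal sketch of the Canonical Composition step of UAX #15, applied to the already reordered sequence from the original question. Assumptions: Python 3 with the standard unicodedata module; the helpers ccc, primary_composite, canonical_compose and fmt are hypothetical names; and primary_composite is a shortcut that asks NFC itself whether a pair maps to one code point, instead of the table lookup a real implementation would use.

```python
import unicodedata

def ccc(cp):
    """Canonical combining class from the UCD bundled with Python."""
    return unicodedata.combining(chr(cp))

def primary_composite(starter, cp):
    """Shortcut (assumption): let NFC decide whether <starter, cp> composes."""
    s = unicodedata.normalize("NFC", chr(starter) + chr(cp))
    return ord(s) if len(s) == 1 else None

def canonical_compose(cps):
    """Canonical Composition over an already canonically ordered sequence."""
    result = []
    starter_pos = None   # index in result of the last starter (ccc 0)
    last_ccc = None      # ccc of the last character kept after that starter
    for cp in cps:
        c = ccc(cp)
        if starter_pos is not None:
            # cp is blocked from the starter if a character kept in between
            # has ccc 0 or a ccc greater than or equal to its own.
            blocked = last_ccc is not None and (last_ccc == 0 or last_ccc >= c)
            if not blocked:
                comp = primary_composite(result[starter_pos], cp)
                if comp is not None:
                    result[starter_pos] = comp
                    continue
        if c == 0:
            starter_pos, last_ccc = len(result), None
        else:
            last_ccc = c
        result.append(cp)
    return result

def fmt(cps):
    return " ".join(f"{cp:04X}" for cp in cps)

# 0300 (ccc 230) follows 0305 (also ccc 230), so it is blocked from 0061.
print(fmt(canonical_compose([0x0061, 0x05AE, 0x0305, 0x0300, 0x0315, 0x0062])))
# With only 05AE (ccc 228) in between, 0300 is not blocked and 00E0 is formed.
print(fmt(canonical_compose([0x0061, 0x05AE, 0x0300])))
```

The first call returns the sequence unchanged (0061 05AE 0305 0300 0315 0062); the second returns 00E0 05AE. The mark that prevents 0061 + 0300 from composing is therefore 0305, because it has the same class (230) as 0300 and sits between it and the starter.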




2014-03-10 20:28 GMT+01:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> toNFC(0061 0305 0315 0300 05AE 0062) ->
>
> From DerivedCombiningClass.txt <http://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedCombiningClass.txt>:
>
>   05D0..05EA    ; 0 # Lo  [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
>
> In other words, 05EA with combining class 0 is blocking the composition and any reordering between
>
>   (0061 0305 0315 0300) on one side, and
>
>   (0062) on the other side (which is also combining class 0).
>
> So you will effectively get the composition of 0061 and 0305 (because it is also not specifically excluded from composition in CompositionExclusions.txt <http://www.unicode.org/Public/UCD/latest/ucd/CompositionExclusions.txt>) in:
>
>   toNFC(0061 0305 0315 0300 05AE 0062),
>
> but NOT in:
>
>   toNFC(0061 05AE 0305 0315 0300 0062).
>
> I think you have mixed the two separate test cases.
>
>
> The first thing to check is to break sequences before every character with
> combining class 0 (even if it is "combining", like here the Hebrew accent
> zinor).
>
> 2014-03-10 19:34 GMT+01:00 Markus Doppelbauer <doppelbauer at gmx.net>:
>
>>  Hello,
>>
>> I am working on a Unicode Normalization implementation. I have a question
>> about a specific toNFC test rule.
>>
>>  toNFC(0061 0305 0315 0300 05AE 0062) =>
>>      (0061 05AE 0305 0300 0315 0062)
>> expected:
>>      (0061 05AE 0305 0300 0315 0062)
>>         \-------------/  =>
>>      (00E0 05AE 0305      0315 0062)
>>
>> Why don't 0061 and 0300 combine to 00E0?
>>
>>  Thanks a lot
>> Markus
>>
>>
>> _______________________________________________
>> Unicode mailing list
>> Unicode at unicode.org
>> http://unicode.org/mailman/listinfo/unicode
>>
>>
>
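The reordering step can also be reproduced. A minimal sketch of the Canonical Ordering Algorithm, assuming Python 3 and its standard unicodedata module; canonical_order, to_str and fmt are hypothetical helper names. Full decomposition is skipped here because none of these code points has a canonical decomposition, and the result is compared with unicodedata.normalize("NFC", ...) for both sequences discussed in this thread.

```python
import unicodedata

def canonical_order(cps):
    """Canonical Ordering: stable-sort every run of non-starters by ccc."""
    out = list(cps)
    i = 0
    while i < len(out):
        if unicodedata.combining(chr(out[i])) == 0:
            i += 1
            continue
        j = i
        while j < len(out) and unicodedata.combining(chr(out[j])) != 0:
            j += 1
        # sorted() is stable, so marks with equal ccc (0305, 0300) keep their order.
        out[i:j] = sorted(out[i:j], key=lambda cp: unicodedata.combining(chr(cp)))
        i = j
    return out

def to_str(cps):
    return "".join(map(chr, cps))

def fmt(s):
    return " ".join(f"{ord(ch):04X}" for ch in s)

SEQUENCES = [
    [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x0062],  # input from the question
    [0x0061, 0x05AE, 0x0305, 0x0315, 0x0300, 0x0062],  # second case in the thread
]

for seq in SEQUENCES:
    ordered = to_str(canonical_order(seq))
    nfc = unicodedata.normalize("NFC", to_str(seq))
    print(fmt(ordered), "|", fmt(nfc))
```

Both inputs reorder to 0061 05AE 0305 0300 0315 0062, and NFC leaves that sequence unchanged, which matches the output reported in the original question.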

