Normalization test
Philippe Verdy
verdy_p at wanadoo.fr
Mon Mar 10 14:28:57 CDT 2014
toNFC(0061 0305 0315 0300 05AE 0062) ->
From DerivedCombiningClass.txt <http://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedCombiningClass.txt>:
05D0..05EA ; 0 # Lo [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
In other words, 05EA with combining class 0 is blocking the
composition and any reordering between
(0061 0305 0315 0300) on one side, and
(0062) on the other side (which is also combining class 0).
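The combining classes involved can be checked directly rather than read off the data file by hand. Below is a minimal sketch using Python's standard `unicodedata` module; the helper name `combining_classes` and the list `CODE_POINTS` are illustrative only, and the values reported depend on the Unicode version bundled with the interpreter. Note that the test sequence contains 05AE (HEBREW ACCENT ZINOR), not 05EA (HEBREW LETTER TAV), so both are listed for comparison.

```python
import unicodedata

# Code points from the test sequence, plus 05EA for comparison (illustrative list).
CODE_POINTS = [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x0062, 0x05EA]

def combining_classes(code_points):
    """Map each code point to its Canonical_Combining_Class value."""
    return {cp: unicodedata.combining(chr(cp)) for cp in code_points}

if __name__ == "__main__":
    # Print one line per code point: hex value, combining class, character name.
    for cp, ccc in combining_classes(CODE_POINTS).items():
        print(f"{cp:04X} ccc={ccc:3d} {unicodedata.name(chr(cp))}")
```

A class of 0 marks a starter; any non-zero class marks a character that takes part in canonical reordering.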
So you will effectively get the composition of 0061 and 0305 (because
it is also not specifically excluded from composition in
CompositionExclusions.txt
<http://www.unicode.org/Public/UCD/latest/ucd/CompositionExclusions.txt>)
in:
toNFC(0061 0305 0315 0300 05AE 0062),
but NOT in:
toNFC(0061 05AE 0305 0315 0300 0062).
I think you have mixed the two separate test cases.
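Both sequences can also be run through an existing implementation and compared against the expected values. This is a small sketch, assuming Python's `unicodedata.normalize` as the reference; `to_nfc` is a hypothetical helper that only converts between the hex notation used in this thread and Python strings.

```python
import unicodedata

def to_nfc(hex_sequence):
    """Normalize a space-separated list of hex code points to NFC."""
    text = "".join(chr(int(h, 16)) for h in hex_sequence.split())
    normalized = unicodedata.normalize("NFC", text)
    # Convert back to the four-digit hex notation used in the thread.
    return " ".join(f"{ord(ch):04X}" for ch in normalized)

if __name__ == "__main__":
    for seq in ("0061 0305 0315 0300 05AE 0062",
                "0061 05AE 0305 0315 0300 0062"):
        print(f"toNFC({seq}) = {to_nfc(seq)}")
```

With the Unicode data bundled in recent CPython releases, both calls should return 0061 05AE 0305 0300 0315 0062, which is the expected value in the quoted test. After reordering, 0305 sits between 0061 and 0300 with the same combining class (230), so 0300 is blocked from composing with 0061.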
The first thing to check is to break sequences before every character with
combining class 0 (even if it is "combining", like here the Hebrew accent
zinor).
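The segmentation step can be sketched as follows, assuming the input is already fully decomposed. The function names `split_before_starters` and `canonical_order` are made up for illustration; combining classes come from Python's `unicodedata.combining`, so which characters count as class 0 is decided by the data, not by whether a character looks like a combining mark.

```python
import unicodedata

def ccc(cp):
    """Canonical_Combining_Class of a code point."""
    return unicodedata.combining(chr(cp))

def split_before_starters(code_points):
    """Split a code point list into runs, breaking before every ccc=0 character."""
    runs = []
    for cp in code_points:
        if ccc(cp) == 0 or not runs:
            runs.append([cp])      # a starter (or the first character) opens a new run
        else:
            runs[-1].append(cp)    # a non-starter joins the current run
    return runs

def canonical_order(run):
    """Stable-sort the non-starters of one run by combining class."""
    head = run[:1] if run and ccc(run[0]) == 0 else []
    # sorted() is stable, so marks with equal classes keep their relative order.
    return head + sorted(run[len(head):], key=ccc)

if __name__ == "__main__":
    seq = [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x0062]
    for run in split_before_starters(seq):
        print(" ".join(f"{cp:04X}" for cp in canonical_order(run)))
```

Composition is then attempted only within each reordered run, which keeps the per-run logic small and easy to test against NormalizationTest.txt.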
2014-03-10 19:34 GMT+01:00 Markus Doppelbauer <doppelbauer at gmx.net>:
> Hello,
>
> I am working on a Unicode Normalization implementation. I have a question
> about a specific toNFC test rule.
>
> toNFC(0061 0305 0315 0300 05AE 0062) =>
> (0061 05AE 0305 0300 0315 0062)
> expected:
> (0061 05AE 0305 0300 0315 0062)
> \-------------/ =>
> (00E0 05AE 0305 0315 0062)
>
> Why don't 0061 and 0300 combine to 00E0 ?
>
> Thanks a lot
> Markus
>
>