Question about a Normalization test

Whistler, Ken ken.whistler at sap.com
Thu Oct 23 13:15:00 CDT 2014


Aaron Cannon asked:

> Hi all, from the latest version of the standard, on line 16977 of the
> normalization tests, I am a bit confused by the NFC form.  It appears
> incorrect to me.  Here's the line, sans comment:
>
> 0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE
> 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300
> 0315 0062;
>
> Just looking at column 2, which according to the comments at the top
> is the NFC form:
>
> 0061 05AE 0305 0300 0315 0062:
>
> This, however, does not appear to be in NFC form.
>
> The first character, and the second or third characters do not
> compose.  However, the first and fourth (0061  and 0300) do, composing
> to 00E0.
>
> Since there are no further compositions, the normalized form should be
> 00E0 05AE 0305 0315 0062
>
> What am I missing?
>



Input is:

Code points: 0061 0305 0315 0300 05AE 0062
Ccc:            0  230  232  230  228    0

Output of canonical reordering is:

Code points: 0061 05AE 0305 0300 0315 0062
Ccc:            0  228  230  230  232    0
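The reordering step above can be reproduced with a minimal Python sketch. The helper names canonical_reorder and hexes are made up for illustration, and the sketch assumes the input is already fully decomposed (true here, since none of these code points has a decomposition). It relies only on unicodedata.combining() for ccc values and on Python's stable sort.

```python
import unicodedata

def canonical_reorder(text):
    """Stable-sort each run of non-starters (ccc != 0) by ccc.

    Illustrative helper: assumes `text` is already fully decomposed.
    """
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

def hexes(text):
    """Format a string as space-separated 4-digit hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text)

source = "\u0061\u0305\u0315\u0300\u05AE\u0062"
reordered = canonical_reorder(source)
print(hexes(reordered))
print([unicodedata.combining(c) for c in reordered])
```

Because the sort is stable, 0305 and 0300 (both ccc=230) keep their relative order, which is what leaves 0305 sitting between 0061 and 0300.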



The next step is to start from the starter 0061 and test each successive
combining mark, looking for composition candidates.

0061 does not compose with 05AE.
0061 does not compose with 0305.
0061 *could* compose with 0300 (00E0 = 0061 + 0300), *but*
0300 is *blocked* from 0061 by the intervening combining
mark 0305, which has the *same* ccc value (230) as 0300. So the
composition does not occur.
0061 does not compose with 0315.
The next character is 0062, ccc=0, a starter, so we are done: the
NFC form is the reordered sequence, unchanged.
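The blocking test in the walkthrough can be sketched in Python as follows. The function is_blocked is a hypothetical helper written for this example, not a library API; it paraphrases D115 (C is blocked from A if ccc(A) = 0 and some character B between them has ccc(B) = 0 or ccc(B) >= ccc(C)). The result is then compared against Python's own unicodedata.normalize().

```python
import unicodedata

def is_blocked(seq, a, c):
    """Hypothetical helper: is seq[c] blocked from seq[a] in the D115 sense?"""
    ccc = unicodedata.combining
    if ccc(seq[a]) != 0:
        return False
    # Blocked if any intervening character is a starter or has ccc >= ccc(C).
    return any(ccc(b) == 0 or ccc(b) >= ccc(seq[c]) for b in seq[a + 1:c])

source = "\u0061\u0305\u0315\u0300\u05AE\u0062"
reordered = "\u0061\u05AE\u0305\u0300\u0315\u0062"

# 0300 is at index 3; 0305 (same ccc, 230) sits between it and 0061.
blocked_0300 = is_blocked(reordered, 0, 3)
nfc = unicodedata.normalize("NFC", source)

print(blocked_0300)
print(nfc == reordered)

# Contrast case: with only 05AE (ccc=228) in between, 0300 is not blocked.
print(is_blocked("\u0061\u05AE\u0300", 0, 2))
```

In the contrast case the intervening mark has a lower ccc than 0300, so nothing prevents 0061 and 0300 from composing.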



For the relevant definitions, see:

http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G50628

and scroll down a couple pages to D115 on p. 139.

Test cases like this are included in NormalizationTest.txt precisely
to ensure that implementations are correctly detecting these
sequences where composition is blocked.
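As a rough illustration of how such a test line can be checked against an implementation, here is a sketch that parses the line quoted above and verifies the NFC and NFD columns using Python's unicodedata module. The names parse_columns and check_line are invented for this example, and only the canonical (NFC/NFD) conditions on columns 1 through 3 are checked.

```python
import unicodedata

# The test line quoted in the question (comment stripped).
LINE = ("0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;"
        "0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;"
        "0061 05AE 0305 0300 0315 0062;")

def parse_columns(line):
    """Split a test line into its five columns, each decoded to a str."""
    data = line.split("#", 1)[0]          # drop any trailing comment
    fields = data.split(";")[:5]          # source, NFC, NFD, NFKC, NFKD
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]

def check_line(line):
    """Check that column 2 is the NFC and column 3 the NFD of columns 1-3."""
    c1, c2, c3, c4, c5 = parse_columns(line)
    nfc = lambda s: unicodedata.normalize("NFC", s)
    nfd = lambda s: unicodedata.normalize("NFD", s)
    return (c2 == nfc(c1) == nfc(c2) == nfc(c3)
            and c3 == nfd(c1) == nfd(c2) == nfd(c3))

print(check_line(LINE))
```

An implementation that ignores blocking would produce 00E0 05AE 0305 0315 0062 for column 1 and fail this check.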



--Ken

