UCA question / Produce Collation Element Arrays

Sat Dec 2 05:52:43 CST 2017

Markus, this is probably another dumb question, but I’m making progress. In section 7.2 of TR10, the algorithm for producing a CE array says:

S2.1 Find the longest initial substring S at each point that has a match in the collation element table.

S2.1.1 If there are any non-starters following S, process each non-starter C.

S2.1.2 If C is an unblocked non-starter with respect to S, find if S + C has a match in the collation element table.

Note: This condition is specific to non-starters, and is not precisely the same as the concept of blocking in normalization, since it is dealing with look ahead for a discontiguous match, rather than with normalization forms. Hangul jamos and other starters are only supported with contiguous matches.

S2.1.3 If there is a match, replace S by S + C, and remove C. 

For S2.1.1, I’m trying to confirm what “process each non-starter C” means. As best I understand so far, it means “ignore” or “skip” every C that is a non-starter. Is that the correct interpretation?
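
To make the question concrete, here is a minimal Python sketch of one literal reading of S2.1 through S2.1.3. It is only my interpretation, not a reference implementation: `find_match` and the `table` set of keys are hypothetical names, and the "unblocked" test assumes that C is blocked when some skipped character between S and C has a combining class of 0 or a class greater than or equal to that of C.

```python
import unicodedata


def find_match(text, table):
    """Return (S, remaining) for the start of `text`.

    `table` is a hypothetical set of strings that have entries in the
    collation element table. `remaining` is the text after S, with any
    non-starters that were absorbed into S removed.
    """
    # S2.1: longest contiguous initial substring S with a table match.
    end = 0
    for i in range(len(text), 0, -1):
        if text[:i] in table:
            end = i
            break
    if end == 0:
        # No match at all; implicit weights are out of scope here.
        return None, text

    s = text[:end]
    rest = list(text[end:])

    # S2.1.1: visit each non-starter C that follows S.
    i = 0
    max_skipped_ccc = 0  # highest ccc among characters left between S and C
    while i < len(rest):
        ccc = unicodedata.combining(rest[i])
        if ccc == 0:
            break  # a starter ends the run and blocks everything after it
        # S2.1.2: C is unblocked (assumed definition) and S + C matches.
        if max_skipped_ccc < ccc and (s + rest[i]) in table:
            # S2.1.3: replace S by S + C and remove C.
            s += rest[i]
            del rest[i]
        else:
            # C stays in the text and may block later non-starters.
            max_skipped_ccc = max(max_skipped_ccc, ccc)
            i += 1
    return s, "".join(rest)


table = {"a", "a\u0301"}
print(find_match("a\u0323\u0301", table))  # dot below (220), then acute (230)
print(find_match("a\u0300\u0301", table))  # grave (230), then acute (230)
```

In this reading, a non-starter is only skipped when S + C has no match or C is blocked; each one is still tested in turn. Whether that is what “process” intends is exactly what I’m asking.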
