UCA unnecessary collation weight 0000

Richard Wordingham via Unicode unicode at unicode.org
Thu Nov 1 16:47:40 CDT 2018


On Thu, 1 Nov 2018 18:39:16 +0100
Philippe Verdy via Unicode <unicode at unicode.org> wrote:

> What this means is that we can safely implement UCA using basic
> substitutions (e.g. with a function like "string:gsub(map)" in Lua,
> which uses a "map" to map source (binary) strings or regexps into
> target (binary) strings):
> 
> For a level-3 collation, you then need only 3 calls to
> "string:gsub()" to compute any collation:
> 
> - the first ":gsub(mapNormalize)" can decompose a source text into
> collation elements and can perform reordering to enforce a normalized
> order (possibly tuned for the tailored locale) using basic regexps.

Are you sure of this?  Will you publish the algorithm?  Have you
passed the official conformance tests?  (Mind you, DUCET is a
relatively easy UCA collation to implement successfully.)

> - the second ":gsub(mapSecondary)" will replace each collation
> element with its "intermediary" collation element + tertiary weight.
> 
> - the third ":gsub(mapSecondary)" will replace each "intermediary"
> collation element with its primary weight + secondary weight.
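
For concreteness, the three passes described in the quoted text might
be sketched roughly as below. This is only a toy illustration, written
in Python rather than Lua, and it is not the quoted author's code. The
names gsub, MAP_NORMALIZE, MAP_TERTIARY, MAP_PRIMARY_SECONDARY, weights
and sort_key are all hypothetical, the weights are invented (they are
not DUCET values), and the sketch does no reordering, contractions or
tailoring. The final step that gathers the weights level by level is
likewise an assumption about how such output would be turned into a
sort key.

```python
import re

def gsub(text, table):
    """Replace every key of `table` found in `text`, longest key first.

    A rough stand-in for Lua's string:gsub with a table argument.
    """
    pattern = "|".join(re.escape(k) for k in sorted(table, key=len, reverse=True))
    return re.sub(pattern, lambda m: table[m.group(0)], text)

# Pass 1 (hypothetical data): source text -> collation element tokens.
# Precomposed and decomposed e-acute map to the same two elements.
MAP_NORMALIZE = {
    "a": "{a}", "A": "{A}", "b": "{b}", "e": "{e}",
    "\u00e9": "{e}{acute}",
    "e\u0301": "{e}{acute}",
}

# Pass 2 (hypothetical data): element -> intermediary element + tertiary weight.
MAP_TERTIARY = {
    "{a}": "[a].0002 ", "{A}": "[a].0008 ", "{b}": "[b].0002 ",
    "{e}": "[e].0002 ", "{acute}": "[acute].0002 ",
}

# Pass 3 (hypothetical data): intermediary element -> primary + secondary weight.
MAP_PRIMARY_SECONDARY = {
    "[a]": "0010.0020", "[b]": "0012.0020", "[e]": "0014.0020",
    "[acute]": "0000.0024",
}

def weights(text):
    """Run the three substitutions; yields 'pppp.ssss.tttt ' per element."""
    text = gsub(text, MAP_NORMALIZE)
    text = gsub(text, MAP_TERTIARY)
    return gsub(text, MAP_PRIMARY_SECONDARY)

def sort_key(text):
    """Gather non-zero weights level by level (an assumed final step).

    Raises ValueError for characters missing from the toy tables.
    """
    levels = ([], [], [])
    for element in weights(text).split():
        primary, secondary, tertiary = element.split(".")
        for level, weight in zip(levels, (primary, secondary, tertiary)):
            value = int(weight, 16)
            if value:  # zero weights are ignorable at that level
                level.append(value)
    return tuple(tuple(level) for level in levels)
```

With these toy tables, sorted(["b", "\u00e9", "A", "e", "a"],
key=sort_key) puts the strings in the order a, A, b, e, e-acute, and
the precomposed and decomposed spellings of e-acute get the same key.
None of this says anything about whether a real table-driven
implementation would pass the conformance tests.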

Richard.

