UCA unnecessary collation weight 0000

Philippe Verdy via Unicode unicode at unicode.org
Thu Nov 1 15:31:15 CDT 2018


On Thu, Nov 1, 2018 at 21:08, Markus Scherer <markus.icu at gmail.com>
wrote:

> When you want fast string comparison, the zero weights are useful for
> processing -- and you don't actually assemble a sort key.
>
And no, I see absolutely no case where any 0000 weight is useful during
processing: it does not distinguish any case, even for "fast" string
comparison.

Even if you don't build any sort key, maybe you'll want to return 0000 if
you query the weight of a specific collatable element, but this would be
the same as querying whether the collatable element is ignorable at a
given level; this query just returns a boolean, like this method of a
Collator object:

  bool isIgnorable(int level, string element);

and you can also make this reliable for any collator:

  int getLevel(int weight);
  int getMinWeight(int level);
  int getWeightAt(string element, int level, int position);

so you can use the last two functions to write the first one:

  bool isIgnorable(int level, string element) {
    // Ignorable at this level if the first weight is below the level's minimum.
    return getWeightAt(element, level, 0) < getMinWeight(level);
  }

That's enough to write the fast comparison...
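
To make this concrete, here is a minimal Python sketch of such a comparison. It is only an illustration under stated assumptions: the weight ranges, the four-entry table, the snake_case function names and the rule "one character is one collation element" are all invented for the example and do not come from UCA, DUCET or any real collator. A query for a missing weight returns 0, but no 0000 weight is ever stored or compared as data.

```python
from itertools import zip_longest

# Hypothetical, disjoint weight ranges: every primary weight is greater than
# every secondary weight, which is greater than every tertiary weight.
LEVEL_MIN = {1: 0x0200, 2: 0x0100, 3: 0x0001}

# Toy table: element -> {level: [weights]}. No 0000 weight is stored; an
# element that is ignorable at a level simply has no entry for that level.
TABLE = {
    "a": {1: [0x0300], 2: [0x0120], 3: [0x0002]},
    "A": {1: [0x0300], 2: [0x0120], 3: [0x0008]},
    "b": {1: [0x0301], 2: [0x0120], 3: [0x0002]},
    "\u0301": {2: [0x0124], 3: [0x0002]},  # combining acute, primary ignorable
}


def get_min_weight(level):
    """Smallest valid weight at `level` (assumed ranges, see LEVEL_MIN)."""
    return LEVEL_MIN[level]


def get_weight_at(element, level, position):
    """Weight of `element` at `level` and `position`, or 0 if there is none.

    Elements missing from the toy table are treated as fully ignorable.
    """
    weights = TABLE.get(element, {}).get(level, [])
    return weights[position] if position < len(weights) else 0


def is_ignorable(level, element):
    """True if `element` contributes no weight at `level`."""
    return get_weight_at(element, level, 0) < get_min_weight(level)


def level_weights(text, level):
    """Yield the weights of `text` at one level, skipping ignorable elements."""
    for element in text:  # toy assumption: one character per collation element
        if is_ignorable(level, element):
            continue
        position = 0
        while (w := get_weight_at(element, level, position)) >= get_min_weight(level):
            yield w
            position += 1


def compare(a, b):
    """Return -1, 0 or 1, level by level, without building a sort key."""
    for level in (1, 2, 3):
        pairs = zip_longest(level_weights(a, level), level_weights(b, level),
                            fillvalue=0)  # 0 pads the shorter stream
        for wa, wb in pairs:
            if wa != wb:
                return -1 if wa < wb else 1
    return 0
```

With this table, compare("a", "a\u0301") returns -1: both strings yield the same primary stream because the accent is skipped at level 1, and the difference only shows up at level 2.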

What I described is not a complicated "compression": it is done on the fly,
without any complex transform. All that counts is that any primary weight
value is higher than any secondary weight, and any secondary weight is
higher than any tertiary weight.
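
As a rough illustration of why no complex transform is needed, the Python sketch below drops the 0000 weights from a three-level collation element and recovers them again from the weight values alone. The range boundaries (PRIMARY_MIN, SECONDARY_MIN, TERTIARY_MIN) and the function names are assumptions chosen for the example, not values taken from any real collation table, and the sketch assumes at most one weight per level in each element.

```python
# Assumed disjoint ranges: tertiary < secondary < primary.
PRIMARY_MIN = 0x0200
SECONDARY_MIN = 0x0020
TERTIARY_MIN = 0x0002


def level_of(weight):
    """Recover the level of a weight from its value alone."""
    if weight >= PRIMARY_MIN:
        return 1
    if weight >= SECONDARY_MIN:
        return 2
    if weight >= TERTIARY_MIN:
        return 3
    raise ValueError(f"{weight:#06x} is not a valid weight at any level")


def compact(element):
    """Drop the 0000 weights of a (primary, secondary, tertiary) element."""
    return tuple(w for w in element if w != 0)


def expand(weights, levels=3):
    """Rebuild the fixed-width element, putting 0 where a level has no weight."""
    element = [0] * levels
    for w in weights:
        element[level_of(w) - 1] = w
    return tuple(element)
```

For example, compact((0x0000, 0x0021, 0x0002)) gives (0x0021, 0x0002), and expand((0x0021, 0x0002)) gives back (0x0000, 0x0021, 0x0002), because 0x0021 can only be a secondary weight and 0x0002 can only be a tertiary weight under the assumed ranges.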