Annoyances from Implementation of Canonical Equivalence

Eli Zaretskii via Unicode unicode at unicode.org
Thu Oct 17 02:42:19 CDT 2019


> Date: Thu, 17 Oct 2019 02:26:35 +0100
> From: Richard Wordingham <richard.wordingham at ntlworld.com>
> Cc: Eli Zaretskii <eliz at gnu.org>
> 
> (c) A search for 'n' finding 'ñ'.
> 
> When it comes to canonical equivalence, one answer to (c) is that as
> soon as one adds the next letter, e.g. 'na', the search will no
> longer match 'ñ'.

Sounds arbitrary to me.  How do we know that all users will want
that?

> (This doesn't apply to diacritic-ignoring folding.)

But the issue _was_ diacritic-ignoring folding.
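
To make the distinction concrete, here is a minimal Python sketch of
the two behaviours being contrasted.  It is an illustration only, not
the code of any particular editor or library: the helper names
(naive_nfd_find, strip_marks, folded_find) are made up for this
example, and "drop every character of general category Mn after NFD"
is just one simple way to approximate diacritic-ignoring folding.

```python
import unicodedata

def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)

def naive_nfd_find(needle, haystack):
    """Plain code-point substring search on NFD-normalized text.

    This is the behaviour point (c) describes: 'n' is found inside a
    decomposed 'ñ' because its first code point is 'n'.
    """
    return nfd(needle) in nfd(haystack)

def strip_marks(s):
    """Assumed folding rule: decompose, then drop nonspacing marks (Mn)."""
    return "".join(c for c in nfd(s) if unicodedata.category(c) != "Mn")

def folded_find(needle, haystack):
    """Diacritic-ignoring search: compare the stripped forms."""
    return strip_marks(needle) in strip_marks(haystack)

cana = "ca\u00f1a"   # Spanish "caña"
kon = "ko\u0144"     # Polish "koń", with word-final ń

print(naive_nfd_find("n", cana))    # True: matches the base letter of ñ
print(naive_nfd_find("na", cana))   # False: U+0303 sits between 'n' and 'a'
print(folded_find("na", cana))      # True: folding removes the tilde first
print(naive_nfd_find("n", kon))     # True, and no following letter can rule it out
```

With folding, adding the next letter does not stop 'n' from matching
'ñ', which is why the "type one more letter" answer does not address
the folding case.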

> That argument doesn't work with the Polish letter 'ń' though, as it can
> be word-final.

It actually doesn't work in general, and one factor is indeed
different languages.  The problem with ñ was raised by
Spanish-speaking users, and only they were very much against folding
in this case.  Users of other languages didn't consider that a
problem, and many considered it a welcome feature.
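
One way to picture language-dependent folding is a per-language list
of letters that are never folded.  The sketch below is hypothetical:
the PROTECTED table, its "es" entry, and the fold/find helpers are
assumptions made up for illustration, not an existing API or the
behaviour of any specific program.

```python
import unicodedata

# Hypothetical table: letters a language treats as distinct, so they
# are exempt from diacritic folding.  Only Spanish ñ/Ñ is listed here.
PROTECTED = {"es": {"\u00f1", "\u00d1"}}

def fold(s, lang=None):
    """Strip nonspacing marks, except from letters protected for lang."""
    keep = PROTECTED.get(lang, set())
    out = []
    # Compose first so that a protected letter is a single code point.
    for ch in unicodedata.normalize("NFC", s):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if unicodedata.category(c) != "Mn"))
    return "".join(out)

def find(needle, haystack, lang=None):
    """Substring search on text folded according to lang."""
    return fold(needle, lang) in fold(haystack, lang)

word = "a\u00f1o"                    # "año"
print(find("n", word))               # True: default folding matches ñ
print(find("n", word, "es"))         # False: ñ is protected for Spanish
print(find("\u00f1", word, "es"))    # True: searching for ñ itself still works
```

This only moves the question to who chooses the language setting, and
it says nothing about text that mixes languages.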

> In many cases, the answer might be a search by collation graphemes, but
> that has other issues besides language sensitivity.

It is also unworkable, because search has to work in contexts where
the text is not displayed at all, and graphemes only exist at display
time.

