Janusz S. Bień jsbien at mimuw.edu.pl
Thu Sep 15 14:56:32 CDT 2016

On Thu, Sep 15 2016 at 21:27 CEST, eliz at gnu.org writes:


> Isn't "grapheme cluster" the definition you are looking for?

I don't think so.

On Thu, Sep 15 2016 at 21:27 CEST, leoboiko at namakajiri.net writes:
> Isn't the Swift "character" and the "textel" merely the same thing as
> what Unicode already named "grapheme clusters"? (Well, technically UAX
> #29[1] defines them as "user-perceived characters", but then says
> grapheme clusters approximate user-perceived characters
> algorithmically).
> And, indeed, Swift "Characters" are explicitly defined as "extended
> grapheme clusters" (also from UAX #29):
> https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html
> Such a notion is indeed needed, but it has been always there.
> [1] http://unicode.org/reports/tr29/

Perhaps I don't properly understand the rather obscure definitions, such as

        An extended grapheme cluster is the same as a legacy grapheme
        cluster, with the addition of some other characters.


1. Graphemes, if I understand correctly, are language-dependent; textels
are not.

2. Textel "ń" means both U+0144 and <U+006E,U+0301>, so it is a notion
on a higher abstraction level than a grapheme cluster.
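
Point 2 can be illustrated with a small sketch. It assumes, only for the sake of the example, that two code point sequences denote the same textel when they are canonically equivalent, i.e. when they have the same NFC form; the helper name same_textel is hypothetical and not taken from any standard or library.

```python
import unicodedata

# Two encodings of the same "ń".
precomposed = "\u0144"        # LATIN SMALL LETTER N WITH ACUTE
decomposed = "\u006E\u0301"   # LATIN SMALL LETTER N + COMBINING ACUTE ACCENT


def same_textel(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings up to canonical equivalence.

    Assumption: a textel is modelled here as a canonical-equivalence
    class, which may be narrower or broader than the intended notion.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# As raw code point sequences the two strings differ ...
print(precomposed == decomposed)              # False
print(len(precomposed), len(decomposed))      # 1 2

# ... but after normalization they compare equal.
print(same_textel(precomposed, decomposed))   # True
```

Under this assumption the comparison is made on the normalized form, not on any particular code point sequence, which is the sense in which the notion sits one level above the encoded sequences.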

Moreover, I don't want to call <U+006E,U+0301> (LATIN SMALL LETTER N,
COMBINING ACUTE ACCENT) an extended grapheme cluster for at least 2
reasons:

1. there is nothing extended in it
2. U+0301 is not a grapheme according to Polish linguistics terminology



Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
