User-perceived character (was: "textels")

Janusz S. Bień jsbien at
Mon Sep 19 01:23:53 CDT 2016

On Sun, Sep 18 2016 at 22:02 CEST, asmusf at writes:
> On 9/18/2016 3:26 AM, Janusz S. Bien wrote:


>> From the Unicode glossary:
>> Grapheme. (1) A minimally distinctive unit of writing in the context
>> of a particular writing system.[...] (2) What a user thinks of as a
>> character.
> "writing system" is vague enough to cover variations that might be
> regional or language dependent.

That is obvious to me.

>> As for (2), cf.
>> User-Perceived Character. What everyone thinks of as a character in
>> their script.
>> So we have "a user" versus "their script" - is the
>> difference intentional? Probably not. Anyway the definitions are
>> language/locale dependent.
> The "everyone" here aims at a shared understanding.

That's also quite obvious to me.

"A user" in grapheme (2) is strange, to say the least.

> This becomes tricky in the case of Abugidas. There's certainly a
> shared understanding that the "unit of writing" is the syllable,
> rather than an individual mark, but the latter do have well-understood
> identities, not least for teaching. That's perhaps the reason why
> there's the handwaving about "minimally distinctive".
> In some scripts like that, users can enter multiple sequences of
> characters that resolve (for all practical purposes) into the same
> syllable. (A big part of that in some scripts is that Unicode does not
> always provide a means to normalize the order of subsidiary signs and
> marks, typically combining marks.)
> For some tasks it would be great to have only well-formed syllables;
> but to do that, you would need to add additional interpretation on top
> of the Unicode definitions of a grapheme cluster.
> If you just wrap the raw combining sequences into textels, then some
> tasks might not actually get simpler. Instead of a simple rule that
> determines which alternate orderings of marks are equivalent (to
> account for users not typing them in the preferred order) you would
> have to exhaustively list all combinations and set up equivalence
> tables.
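
The point about mark ordering can be illustrated with a minimal sketch,
assuming Python's standard unicodedata module. The helper names
combining_classes and canonically_equal are hypothetical, introduced only
for this illustration. Normalization reorders marks only when they have
different non-zero canonical combining classes; marks of class 0, such as
Devanagari vowel signs and candrabindu, keep the order in which they were
typed, so any equivalence between such orderings would need rules beyond
the standard normalization forms.

```python
import unicodedata


def combining_classes(text: str) -> list[int]:
    """Return the canonical combining class of each code point in NFD order."""
    return [unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", text)]


def canonically_equal(left: str, right: str) -> bool:
    """Compare two strings after canonical decomposition and reordering (NFD)."""
    return unicodedata.normalize("NFD", left) == unicodedata.normalize("NFD", right)


# Latin example: COMBINING DOT BELOW (class 220) and COMBINING ACUTE ACCENT
# (class 230) have different non-zero classes, so normalization sorts them.
dot_then_acute = "a\u0323\u0301"
acute_then_dot = "a\u0301\u0323"
print(canonically_equal(dot_then_acute, acute_then_dot))  # True

# Devanagari example: VOWEL SIGN AA (U+093E) and CANDRABINDU (U+0901) both
# have class 0, so normalization leaves the typed order untouched.
aa_then_candrabindu = "\u0915\u093e\u0901"
candrabindu_then_aa = "\u0915\u0901\u093e"
print(canonically_equal(aa_then_candrabindu, candrabindu_then_aa))  # False
```

Whether two such class-0 orderings should count as "the same syllable" is a
script-specific decision; the sketch only shows that normalization alone does
not make it.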

I would like to know how Swift is handling this. I still have a feeling
that the Swift characters are almost exactly my textels.

Best regards


Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)