graphemes (was: "textels")

Janusz S. Bień jsbien at
Mon Sep 19 01:40:05 CDT 2016

On Sun, Sep 18 2016 at 21:40 CEST, christoph.paeper at writes:
> Janusz S. Bien <jsbien at>:
>> From the Unicode glossary:
>>> Grapheme. (1) A minimally distinctive unit of writing in the context of a particular writing system.[...] (2) What a user thinks of as a character.
>>> User-Perceived Character. What everyone thinks of as a character in their script.
>> […] the definitions are language/locale dependent.
> A writing system is (usually) language-dependent, a script is not,
> although some scripts have been used exclusively (or prominently) in a
> single writing system with a single language.

It depends, of course, on what exactly you mean by script, and on which
meaning of the term is intended in the definition of User-Perceived
Character. But "a user" is definitely language/locale dependent :-)

> So definition (1) of ‘grapheme’ would be appropriate for linguistics,
> (2) maybe for typography and computer science, but it’s extremely
> vague.

I think that 'grapheme' (2) in the present wording is simply
incorrect. I suspect it is not used in the standard at all.

Searching the Unicode site I found only one use of 'grapheme' alone:

        Graphemes are sequences of one or more encoded characters that
        correspond to what users think of as characters.
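
As a small illustration of the quoted sentence (a sketch only, not taken
from the Unicode site): the letter "ń" can be encoded as one character or
as a sequence of two, yet a reader of Polish sees a single letter either
way. The helper name naive_clusters is hypothetical, and its grouping rule
is a deliberate simplification, not the full segmentation algorithm of
UAX #29.

```python
import unicodedata

# "ń" written as a base letter plus a combining mark: two encoded characters.
decomposed = "n\u0301"  # LATIN SMALL LETTER N + COMBINING ACUTE ACCENT

# The same letter in precomposed form: one encoded character (U+0144).
composed = unicodedata.normalize("NFC", decomposed)


def naive_clusters(text):
    """Attach each combining mark to the character that precedes it.

    Simplifying assumption: only combining marks are handled, which is
    far less than what UAX #29 specifies (no Hangul, no emoji, etc.).
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # mark joins the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


name = "Bie" + decomposed  # "Bień" with a decomposed last letter
print(len(name), len(naive_clusters(name)))
```

Here the string holds five encoded characters but only four units that a
user would call characters. Whether a given sequence counts as one such
unit can still depend on the language and the user, which this sketch does
not model.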

I guess the intention of 'grapheme' (2) was to describe it without any
reference to computer encoding, which is definitely an extremely
difficult task.

Best regards


Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
