Janusz S. Bień jsbien at
Wed Sep 21 00:09:41 CDT 2016

On Tue, Sep 20 2016 at 10:57 CEST, christoph.paeper at writes:
> Julian Bradfield <jcb+unicode at>:
>> On 2016-09-19, Christoph Päper <christoph.paeper at> wrote:
>>> If _encyclopedia, encyclopædia, encyclopaedia_ are all legal
>>> spellings of the same word in a writing system, a useful linguistic
>>> definition of grapheme should ensure that all three variants have
>>> the same number of graphemes.
>> Such a bizarre definition, which would also entail "color/colour",
>> "fulfill/fulfil", "sulfur/sulphur" having the same number of
>> graphemes,
> It’s not a bizarre definition at all, but one could also assume two or three different writing systems.
>> would break the first three of your rules of thumb:
> It would, at least partially.
>> and the fourth is pretty dodgy, as it usually contradicts the others
>>> - … whatever can never be split up by hyphenation.
> It’s not phrased well and it does contradict the other rules of thumb
> sometimes indeed, but together they often work reasonably well to
> separate clear cases from questionable ones which are likely to be
> treated differently by different scholars.

Let me recall the issues that started the thread:

On Sun, Sep 18 2016 at 12:26 CEST, jsbien at writes:
> Quote - Christoph Päper <christoph.paeper at> (Fri, 16
> Sep 2016, 23:51:38):
>> Janusz S. Bień <jsbien at>:
>>> 1. Graphemes, if I understand correctly, are language dependent, …
>> That’s true in linguistic terminology – well, at least within the
>> more popular schools of thought –, but not in technical (i.e.
>> Unicode) jargon.

And what is "grapheme" in "technical (i.e. Unicode) jargon"?

> From the Unicode glossary:
> Grapheme. (1) A minimally distinctive unit of writing in the context
> of a particular writing system.[...] (2) What a user thinks of as a
> character.
> As for (2), cf.
> User-Perceived Character. What everyone thinks of as a character in
> their script.
> So we have "a user" versus "their script" - is the
> difference intentional? Probably not. Anyway the definitions are
> language/locale dependent.

Does 'Grapheme' (2) make sense with "a (single?) user"? 

BTW, it is rather well known that the term "phoneme" was first proposed
by the Polish linguist Jan Niecisław Ignacy Baudouin de Courtenay (13
March 1845 – 3 November 1929). It is much less well known that he also
proposed the term "grapheme". Let me quote
Alexander Bergs's "English Historical Linguistics vol. I", page 230, from
Google Books:

       Since the introduction of the term grapheme by Baudouin de
       Courtenay in 1901 (Ruszkiewicz 1976:24-37, 1981 [1978], 20-34),
       it has been defined in various ways:


       As can be seen from these quotations, the available definitions
       can be divided into two groups, corresponding to two main senses,
       and reflecting "conflicting linguistics views of the status of
       writing" (Henderson 1985:142):

       1. a letter or cluster of letters referring to or corresponding with a
       single phoneme;

       2. the minimal distinctive unit of a writing system.

For me the first meaning (not mentioned at all in English Wikipedia) is
the primary, i.e. more useful, meaning, as it has some practical
applications, e.g. for describing Polish hyphenation rules.
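
To illustrate sense 1, here is a rough Python sketch (not an actual
hyphenation algorithm) that splits a Polish word into letters and letter
clusters, treating the common digraphs as single units. The helper name
split_graphemes and the fixed digraph list are assumptions made for this
example only; greedy matching gets words such as "marznąć" wrong, where
"rz" spells two separate phonemes.

```python
import unicodedata

# Assumed digraph inventory for this sketch; trigraphs and
# context-dependent cases are deliberately left out.
POLISH_DIGRAPHS = ("ch", "cz", "dz", "dź", "dż", "rz", "sz")


def split_graphemes(word):
    """Split a word into sense-1 graphemes by greedy digraph matching."""
    # Normalize to NFC so that letters like "ź" are single code points.
    word = unicodedata.normalize("NFC", word)
    lower = word.lower()
    units = []
    i = 0
    while i < len(word):
        if lower[i:i + 2] in POLISH_DIGRAPHS:
            # Two letters that (usually) stand for one phoneme.
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


print(split_graphemes("szczyt"))      # ['sz', 'cz', 'y', 't']
print(split_graphemes("chrząszcz"))   # ['ch', 'rz', 'ą', 'sz', 'cz']
```

A hyphenation rule could then be stated over these units instead of
over single letters, so that a break never falls inside "sz" or "cz".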

Best regards


Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at, jsbien at,
