graphemes (was: "textels")

Christoph Päper christoph.paeper at crissov.de
Mon Sep 19 14:16:50 CDT 2016


Dalley Mark (South West Commissioning Support) <Mark.Dalley at swcsu.nhs.uk>:
> 
> I think the key phrase is "user-perceived". And you don't need to involve complex scripts either.
> 
> For instance as an English-speaking person, I would perceive the "æ" in "encyclopædia" as being two characters (albeit shoved together somewhat). The argument for this is that the word can equally well be rendered as "encyclopaedia".

If

- encyclopedia
- encyclopædia
- encyclopaedia

are all legal spellings of the same word in a writing system, a useful linguistic definition of grapheme should ensure that all three variants have the same number of graphemes.
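
As a rough sketch of that requirement in Python: the inventory below, which treats "ae" as a single grapheme alongside "æ" and "e", is a hypothetical assumption made only for this illustration, not something Unicode defines. Code point counts differ between the three spellings, while a greedy longest-match segmentation against that inventory gives the same count for each.

```python
# Hypothetical multi-letter graphemes, assumed for this illustration only.
MULTIGRAPHS = ("ae",)

def segment(word, multigraphs=MULTIGRAPHS):
    """Split a word into graphemes by greedy longest match."""
    ordered = sorted(multigraphs, key=len, reverse=True)
    units = []
    i = 0
    while i < len(word):
        for m in ordered:
            if word.startswith(m, i):
                units.append(m)
                i += len(m)
                break
        else:
            # No multigraph starts here: the single character is one unit.
            units.append(word[i])
            i += 1
    return units

for w in ("encyclopedia", "encyclopædia", "encyclopaedia"):
    # Prints the word, its code point count, and its grapheme count.
    print(w, len(w), len(segment(w)))
```

The inventory is per writing system: a different orthography would need a different MULTIGRAPHS tuple, which is why this cannot be a single global table.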

Although linguists often prefer minimal pair analysis, there are some rules of thumb for what counts as a grapheme:

- … whatever goes into a single box in a crossword puzzle.
- … whatever gets transposed if you reverse a word or generate an anagram.
- … whatever gets capitalized together at the beginning of a word.
   (Some argue that capitalization operates on characters, not graphemes, though.)
- … whatever can never be split up by hyphenation.
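
The reversal and capitalization rules can be sketched in Python as operations on a list of graphemes rather than on a string. The Dutch digraph "ij" is an assumed example that does not come from this thread, and the word is segmented by hand; the function names are hypothetical.

```python
def reverse_word(graphemes):
    """Reverse a word grapheme by grapheme."""
    return "".join(reversed(graphemes))

def capitalize_word(graphemes):
    """Uppercase the whole first grapheme, leave the rest unchanged."""
    if not graphemes:
        return ""
    return graphemes[0].upper() + "".join(graphemes[1:])

# Hand-segmented under the assumption that "ij" is one grapheme in Dutch.
word = ["ij", "s", "l", "a", "n", "d"]

print(capitalize_word(word))     # IJsland
print("ijsland".capitalize())    # Ijsland (character-based)
print(reverse_word(word))        # dnalsij
print("ijsland"[::-1])           # dnalsji (character-based)
```

Where the grapheme-based and character-based results differ, the rule of thumb suggests the multi-letter unit behaves as one grapheme in that writing system.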

