Asmus Freytag (c) asmusf at ix.netcom.com
Sun Sep 18 15:02:01 CDT 2016

On 9/18/2016 3:26 AM, Janusz S. Bien wrote:
> Quote - Christoph Päper <christoph.paeper at crissov.de> (Fri, 16 
> Sep 2016, 23:51:38):
>> Janusz S. Bień <jsbien at mimuw.edu.pl>:
>>> 1. Graphemes, if I understand correctly, are language dependent, …
>> That’s true in linguistic terminology – well, at least within the 
>> more popular schools of thought –, but not in technical (i.e. 
>> Unicode) jargon.
> From the Unicode glossary:
> Grapheme. (1) A minimally distinctive unit of writing in the context 
> of a particular writing system.[...] (2) What a user thinks of as a 
> character.

"writing system" is vague enough to cover variations that might be 
regional or language dependent.
> As for (2), cf.
> User-Perceived Character. What everyone thinks of as a character in 
> their script.
> So we have "a user" versus "everyone...in their script" - is the 
> difference intentional? Probably not. Anyway the definitions are 
> language/locale dependent.

The "everyone" here aims at a shared understanding.

This becomes tricky in the case of abugidas. There's certainly a shared 
understanding that the "unit of writing" is the syllable, rather than 
the individual mark, but the individual marks do have well-understood 
identities, not least for teaching. That's perhaps the reason for the 
hand-waving about "minimally distinctive".

In some such scripts, users can enter multiple sequences of characters 
that resolve (for all practical purposes) into the same syllable. (A 
big part of that in some scripts is that Unicode does not always 
provide a means to normalize the order of subsidiary signs and marks, 
typically combining marks.)
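
To make the ordering point concrete, here is a minimal Python sketch 
using only the standard unicodedata module. The specific characters are 
chosen for illustration and are not taken from the discussion above: a 
Latin pair of marks with distinct non-zero canonical combining classes, 
and a Devanagari pair (vowel sign U and anusvara) that both have class 
0. Normalization unifies the typing order in the first case but not in 
the second. Whether the two Devanagari orders should count as "the same 
syllable" is an assumption made for the sake of the example.

```python
import unicodedata

def same_after_normalization(a: str, b: str, form: str = "NFD") -> bool:
    """Return True if the two strings are identical after normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Latin: dot below (ccc 220) and acute (ccc 230) have distinct non-zero
# combining classes, so canonical reordering maps both typing orders
# to one sequence.
latin_1 = "a\u0323\u0301"
latin_2 = "a\u0301\u0323"

# Devanagari: vowel sign U (U+0941) and anusvara (U+0902) both have
# combining class 0, so normalization leaves the typed order untouched.
deva_1 = "\u0915\u0941\u0902"
deva_2 = "\u0915\u0902\u0941"

print(same_after_normalization(latin_1, latin_2))   # True
print(same_after_normalization(deva_1, deva_2))     # False
print([unicodedata.combining(c) for c in "\u0323\u0301\u0941\u0902"])
# [220, 230, 0, 0]
```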

For some tasks it would be great to have only well-formed syllables; but 
to do that, you would need to add additional interpretation on top of 
the Unicode definitions of a grapheme cluster.

If you just wrap the raw combining sequences into textels, then some 
tasks might not actually get simpler. Instead of a simple rule that 
determines which alternate orderings of marks are equivalent (to account 
for users not typing them in the preferred order), you would have to 
exhaustively list all combinations and set up equivalence tables.
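
The contrast between the two approaches can be sketched roughly as 
follows. Everything named here is hypothetical: PREFERRED_RANK is an 
invented preferred order for two Devanagari marks, reorder_marks stands 
in for the "simple rule", and build_equivalence_table stands in for the 
exhaustive listing that opaque textels would require. It assumes each 
sequence is one base character followed only by marks.

```python
from itertools import permutations
from math import factorial

# Hypothetical preferred order of marks after the base; an assumption
# for illustration, not something defined by the Unicode Standard.
PREFERRED_RANK = {"\u0941": 0, "\u0902": 1}  # vowel sign U, then anusvara

def reorder_marks(seq: str) -> str:
    """Rule-based: keep the base, sort the marks by preferred rank."""
    base, marks = seq[0], seq[1:]
    unknown = len(PREFERRED_RANK)  # unranked marks sort last, in typed order
    ordered = sorted(marks, key=lambda m: PREFERRED_RANK.get(m, unknown))
    return base + "".join(ordered)

def build_equivalence_table(canonical: str) -> dict[str, str]:
    """Table-based: list every ordering of the marks explicitly."""
    base, marks = canonical[0], canonical[1:]
    return {base + "".join(p): canonical for p in permutations(marks)}

ka_u_anusvara = "\u0915\u0941\u0902"   # preferred order (assumed)
typed = "\u0915\u0902\u0941"           # marks typed the other way round

# One rule covers every syllable that uses these marks.
print(reorder_marks(typed) == ka_u_anusvara)        # True

# The table needs one entry per ordering, per syllable.
table = build_equivalence_table(ka_u_anusvara)
print(table[typed] == ka_u_anusvara, len(table))    # True 2

# Entries per syllable for 1 to 4 distinct marks.
print([factorial(n) for n in range(1, 5)])          # [1, 2, 6, 24]
```

The rule is written once and applies to any base, whereas the table 
has to be rebuilt for each base and grows factorially with the number 
of distinct marks.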

