Combining characters

Mark E. Shoulson mark at kli.org
Sun Dec 14 17:22:06 CST 2025


On 12/14/25 5:44 PM, Asmus Freytag via Unicode wrote:

> On 12/14/2025 10:47 AM, Phil Smith III via Unicode wrote:
>>
>> Well, I’m sorta “asking for a friend” – a coworker who is deep in the 
>> weeds of working with something Unicode-related. I’m blaming him for 
>> having told me that :)
>>
>>
> This actually deserves a deeper answer, or a more "bird's-eye" one, if 
> you want. Read to the end.
>
> The way you asked the question seems to hint that in your minds you 
> and your friend conflate the concept of "combining" mark and 
> "diacritic". That would not be surprising if you are mainly familiar 
> with European scripts and languages, because in that case, this 
> equivalence kind of applies.
>
Yes.  This is crucial.  You (Phil) are writing like "sheez, so there's e 
and there's e-with-an-acute, we might as well just treat them like 
separate letters."  And that might make sense for languages where
"combining characters" are just two or three diacritics that can live
on five or six letters.  Maybe it does make sense to consider those
combinations as distinct letters (indeed, some of the languages in
question do just that).  But some combining characters are more rightly
perceived as things separate from the letters which are written in the 
same space (and have historically always been considered so).  The most 
obvious examples would be Hebrew and Arabic vowel-points.  Does it 
really make sense to consider בְ and בֶ and בְּ and all the other 
combinations as distinct things, when they clearly contain
separate units, each of which has its own consistent character?  Throw
in the Hebrew "accents" (cantillation marks) and you're talking about an
enormous combinatorial explosion at the *cost* of simplicity and 
consistency, not improving it.  Ditto Indic vowel-marks and a jillion 
other abjads and abugidas.  If anything, there's a better case to be 
made that the precomposed letters were maybe a wrong move.
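To make the contrast concrete, here is a small sketch using Python's standard `unicodedata` module (an illustration added here, not code from the thread; `code_points` is just a hypothetical helper for display).  Latin "e" plus a combining acute has a precomposed equivalent, so NFC folds the pair into one code point.  Bet plus sheva plus dagesh has no single-character form, so it stays a letter followed by two independent marks, and typing the marks in either order normalizes to the same string.

```python
import unicodedata


def code_points(s):
    """Hypothetical display helper: each code point of s as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in s]


# Latin: "e" + COMBINING ACUTE ACCENT has a precomposed equivalent,
# so NFC collapses the pair into the single code point U+00E9.
e_acute = "e\u0301"
print(code_points(e_acute))                                # ['U+0065', 'U+0301']
print(code_points(unicodedata.normalize("NFC", e_acute)))  # ['U+00E9']

# Hebrew: bet + sheva + dagesh.  NFC leaves it as three code points,
# the letter plus two separate combining marks.
bet_sheva_dagesh = "\u05D1\u05B0\u05BC"
nfc = unicodedata.normalize("NFC", bet_sheva_dagesh)
print(code_points(nfc))  # ['U+05D1', 'U+05B0', 'U+05BC']

# Typing dagesh before sheva gives the same normalized string, because
# canonical reordering sorts marks by their combining class.
assert unicodedata.normalize("NFC", "\u05D1\u05BC\u05B0") == nfc

# Each mark keeps its own identity: a name and a nonzero combining class.
for ch in nfc:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
```

Each extra mark (another vowel point, a cantillation mark) just appends one more code point rather than requiring a new precomposed character for every combination.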

(TL;DR: what Asmus said.)

~mark