Compatibility decomposition for Hebrew and Greek final letters

"Martin J. Dürst" duerst at
Thu Feb 19 21:01:00 CST 2015

On 2015/02/19 20:47, Julian Bradfield wrote:
> On 2015-02-19, Eli Zaretskii <eliz at> wrote:
>> Does anyone know why the UCD defines compatibility decompositions
>> for Arabic initial, medial, and final forms, but doesn't do the same
>> for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM?  Or for
>> that matter, for U+03C2 GREEK SMALL LETTER FINAL SIGMA?
> As far as I understand it:
> In Arabic, the variant of a letter is determined entirely by its
> position, so there is no compelling need to represent the forms separately
> (as characters rather than glyphs) save for the existence of legacy
> standards (and if there is, you can use the ZWJ/ZWNJ hacks). Thus the
> forms would not have been encoded but for the legacy standards.
> Whereas in Hebrew, non-final forms appear finally in certain contexts
> in normal text; and in Greek, while Greek text may have a determinate
> choice between σ and ς, there are many contexts where the two symbols
> are distinguished (not least maths).

Digging a bit deeper, the phenomenon of a letter changing shape
depending on position is pervasive in Arabic, and in good-quality
typography it involves complicated interdependencies across multiple
characters. In Hebrew the phenomenon is minor (only five letters have
final forms), and in Greek it is marginal (only sigma); typographic
interactions are also very limited in both scripts.

That led (after some initial attempts at alternatives) to different
encoding models. In Arabic, shaping is the job of the rendering engine,
whereas in Hebrew and Greek the final forms are separate characters, so
the choice of form is part of the encoded text.
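
The difference between the two models can be seen directly in the
character database. The following is only a minimal sketch, assuming
Python's standard unicodedata module (whose data comes from whatever UCD
version ships with the interpreter); the helper name compat_info is
made up for illustration. It looks up the four Arabic presentation
forms of MEEM, HEBREW LETTER FINAL MEM, and GREEK SMALL LETTER FINAL
SIGMA by name and prints their decomposition fields.

```python
import unicodedata

# Character names as they appear in the UCD; looked up by name so that
# no code point has to be typed by hand.
NAMES = [
    "ARABIC LETTER MEEM ISOLATED FORM",
    "ARABIC LETTER MEEM FINAL FORM",
    "ARABIC LETTER MEEM INITIAL FORM",
    "ARABIC LETTER MEEM MEDIAL FORM",
    "HEBREW LETTER FINAL MEM",
    "GREEK SMALL LETTER FINAL SIGMA",
]


def compat_info(name):
    """Hypothetical helper: return (char, decomposition field, NFKD form)."""
    ch = unicodedata.lookup(name)
    decomp = unicodedata.decomposition(ch)    # "" if there is no mapping
    nfkd = unicodedata.normalize("NFKD", ch)  # applies compatibility mappings
    return ch, decomp, nfkd


for name in NAMES:
    ch, decomp, nfkd = compat_info(name)
    nfkd_points = " ".join("U+%04X" % ord(c) for c in nfkd)
    print("U+%04X %s" % (ord(ch), name))
    print("    decomposition: %r" % decomp)
    print("    NFKD:          %s" % nfkd_points)
```

The Arabic presentation forms carry tagged compatibility mappings
(<isolated>, <final>, <initial>, <medial>) back to U+0645, while the
Hebrew and Greek final letters have an empty decomposition field and
are left unchanged by NFKD.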

As for the supposedly determinate choice between σ and ς, John Cowan
once gave an example of a Greek word (composed of two original words)
with a final sigma in the middle.
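
To illustrate why a purely positional rule cannot recover such a word,
here is a small sketch, assuming Python's built-in str.upper() and
str.lower(), which pick σ or ς from context when lowercasing. The
string "αςβ" is an artificial stand-in with ς in the middle, not a real
Greek word and not John Cowan's example, and the helper name
case_round_trip is made up for illustration.

```python
def case_round_trip(text):
    """Hypothetical helper: uppercase, then lowercase again."""
    return text.upper().lower()


# Greek capitals written as escapes to avoid Latin look-alikes.
word = "\u039f\u0394\u039f\u03a3"   # ΟΔΟΣ ("road"), all capitals
artificial = "\u03b1\u03c2\u03b2"   # α ς β: final sigma in the middle

# Lowercasing chooses the final form for a word-final capital sigma.
print(word.lower())

# Both lowercase sigmas share one capital, so uppercasing drops the
# distinction, and the positional rule puts σ in the middle afterwards.
print(artificial.upper())
print(case_round_trip(artificial))
print(case_round_trip(artificial) == artificial)
```

Because both σ and ς uppercase to Σ, the round trip yields σ in the
middle of the artificial string, so the original spelling survives only
if the final form is stored as its own character.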

Regards,   Martin.
