Normalizing Syriac

Lorna Evans lorna_evans at sil.org
Mon Apr 26 16:50:40 CDT 2021


I've got a situation that I'm not sure how to handle...or even whether 
Unicode or the rendering engines need updating.

In a language using Syriac there is a /rish seyame/ which can be 
followed by U+0739 or U+0738

/rish/ = U+072A

/seyame/ = U+0308

In TUS, chapter 9, it says:

> In Modern Syriac usage, when a word contains a /rish/ and a /seyame/, 
> the dot of the /rish/ and the /seyame/ are replaced by a /rish/ with 
> two dots above it.
Then, there's a table which indicates this ligature is obligatory:

> Table 9-17. Syriac Ligatures
>
> Ligature Classes. As in other scripts, ligatures in Syriac vary 
> depending on the font style.
> Table 9-17 identifies the principal valid ligatures for each font 
> style. When applicable, these
> ligatures are obligatory, unless denoted with an asterisk (*).
>
> rish seyame Right-joining Right-joining Right-joining BFBS (no 
> asterisk, so it is obligatory)
>

Finally, in "Developing OpenType Fonts for Syriac Script" 
https://docs.microsoft.com/en-us/typography/script-development/syriac

In the "Glossary" section it says:

> *Ligature* - A combination of glyphs that join to form a single glyph. 
> For example, the 'rish seyame' (U072a + U0308) combinations of glyphs 
> are mandatory ligatures for Syriac. Other ligatures are optional.
>
So, it seems clear that U+072A U+0308 is a mandatory ligature. The problem 
I'm seeing is that when this ligature is followed by U+0739 or U+0738 
AND an application does normalization, the sequence U+072A U+0308 U+0739 
becomes U+072A U+0739 U+0308, and that breaks the ligature.

You can see why they are reordering it: U+0308 has canonical combining 
class 230, while U+0738 and U+0739 have 220, so canonical ordering puts 
the lower class first.

0308;COMBINING DIAERESIS;Mn;*230*;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
0738;SYRIAC DOTTED ZLAMA HORIZONTAL;Mn;*220*;NSM;;;;;N;;;;;
0739;SYRIAC DOTTED ZLAMA ANGULAR;Mn;*220*;NSM;;;;;N;;;;;
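
For anyone who wants to reproduce the reordering, here is a minimal sketch 
using Python's standard unicodedata module. The variable names and the 
codepoints() helper are only for illustration, and the combining classes 
it prints come from whatever Unicode data ships with the Python build in 
use.

```python
import unicodedata

RISH = "\u072A"    # SYRIAC LETTER RISH
SEYAME = "\u0308"  # COMBINING DIAERESIS, used as seyame
ZLAMA = "\u0739"   # SYRIAC DOTTED ZLAMA ANGULAR


def codepoints(text):
    """Illustrative helper: format a string as 'U+XXXX U+XXXX ...'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The order the fonts expect: rish, seyame, then the below mark.
typed = RISH + SEYAME + ZLAMA

# Canonical ordering sorts adjacent combining marks by combining class,
# so the class-220 mark moves ahead of the class-230 mark.
normalized = unicodedata.normalize("NFC", typed)

for ch in typed:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")
print(codepoints(typed), "->", codepoints(normalized))
```

Substituting "NFD" for "NFC" should give the same order, since there is no 
precomposed character involved and only the canonical reordering step 
applies.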

All of the Syriac fonts that I see only handle this sequence *U+072A 
U+0308 U+0739* and not the reordered *U+072A U+0739 U+0308*

Are the fonts wrong? Should they be able to handle U+072A U+0739 U+0308?

Or, is there a special normalization rule for this?

How should /rish seyame/ followed by a below mark like U+0738 or U+0739 
be handled?

Lorna

