Normalizing Syriac
Lorna Evans
lorna_evans at sil.org
Mon Apr 26 16:50:40 CDT 2021
I've got a situation that I'm not sure how to handle, and I can't even
tell whether Unicode or the rendering engines need updating.
In a language using Syriac there is a /rish seyame/ which can be
followed by U+0739 or U+0738
/rish/ = U+072A
/seyame/ = U+0308
In TUS, chapter 9, it says:
> In Modern Syriac usage, when a word contains a /rish/ and a /seyame/,
> the dot of the /rish/ and the /seyame/ are replaced by a /rish/ with
> two dots above it.
Then, there's a table which indicates this ligature is obligatory:
> Table 9-17. Syriac Ligatures
>
> Ligature Classes. As in other scripts, ligatures in Syriac vary
> depending on the font style.
> Table 9-17 identifies the principal valid ligatures for each font
> style. When applicable, these
> ligatures are obligatory, unless denoted with an asterisk (*).
>
> rish seyame Right-joining Right-joining Right-joining BFBS (no
> asterisk, so it is obligatory)
>
Finally, in "Developing OpenType Fonts for Syriac Script"
https://docs.microsoft.com/en-us/typography/script-development/syriac
In the "Glossary" section it says:
> *Ligature* - A combination of glyphs that join to form a single glyph.
> For example, the 'rish seyame' (U072a + U0308) combinations of glyphs
> are mandatory ligatures for Syriac. Other ligatures are optional.
>
So it seems clear that U+072A U+0308 is a mandatory ligature. The problem
I'm seeing is that when this ligature is followed by U+0739 or U+0738
AND an application does normalization, the sequence is changed to U+072A
U+0739 U+0308 (or U+072A U+0738 U+0308), and that breaks the ligature.
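You can see why normalization reorders the marks when you look at their
canonical combining classes: U+0308 is 230, while U+0738 and U+0739 are
220, and canonical ordering puts the lower class first.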
0308;COMBINING DIAERESIS;Mn;*230*;NSM;;;;;N;NON-SPACING DIAERESIS;;;;
0738;SYRIAC DOTTED ZLAMA HORIZONTAL;Mn;*220*;NSM;;;;;N;;;;;
0739;SYRIAC DOTTED ZLAMA ANGULAR;Mn;*220*;NSM;;;;;N;;;;;
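For anyone who wants to reproduce the reordering, here is a minimal sketch using Python's standard unicodedata module. The variable names are mine and purely illustrative; the only thing it assumes is that the Python build ships Unicode data that includes the Syriac block, which any current release does.

```python
import unicodedata

# Illustrative names only; the code points are the ones discussed above.
RISH = "\u072A"            # SYRIAC LETTER RISH
SEYAME = "\u0308"          # COMBINING DIAERESIS
ZLAMA_ANGULAR = "\u0739"   # SYRIAC DOTTED ZLAMA ANGULAR


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The order that the fonts I tested expect: rish, seyame, then the below mark.
typed = RISH + SEYAME + ZLAMA_ANGULAR

# Canonical combining class of each character in the typed sequence.
classes = [unicodedata.combining(ch) for ch in typed]
print("ccc:", classes)

# Both NFC and NFD apply canonical ordering, so both move the
# class-220 mark in front of the class-230 mark.
normalized_nfc = unicodedata.normalize("NFC", typed)
normalized_nfd = unicodedata.normalize("NFD", typed)

print("typed:", code_points(typed))
print("NFC:  ", code_points(normalized_nfc))
print("NFD:  ", code_points(normalized_nfd))
```

Running this should show the seyame and the zlama swapping places after normalization, in both NFC and NFD, which is the sequence the fonts then fail to ligate.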
All of the Syriac fonts that I've looked at handle only the sequence *U+072A
U+0308 U+0739* and not the reordered *U+072A U+0739 U+0308*.
Are the fonts wrong? Should they be able to handle U+072A U+0739 U+0308?
Or is there a special normalization rule for this?
How should /rish seyame/ followed by a below mark like U+0738 or U+0739
be handled?
Lorna