Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues
Richard Wordingham via Unicode
unicode at unicode.org
Mon Dec 11 04:16:31 CST 2017
On Sun, 10 Dec 2017 21:14:18 -0800
Manish Goregaokar via Unicode <unicode at unicode.org> wrote:
> > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
> You can also explicitly request ligatureification with a ZWJ, so
> perhaps this rule should be something like
>
> (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant
>
> -Manish
>
> On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ☕️ via Unicode <
> unicode at unicode.org> wrote:
>
> > 1. You make a good point about the GB9c. It should probably instead
> > be something like:
> >
> > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
This change is unnecessary. Suppose we start from Draft 1, which has:
GB9: × (Extend | ZWJ | Virama)
GB9c: (Virama | ZWJ ) × LinkingConsonant
If the classes used in the rules are to be disjoint, we then have to
split Extend into something like ViramaExtend and OtherExtend to allow
normalised (NFC/NFD) text, at which point we may as well continue to
have rules that work without any normalisation. Informally,
ViramaExtend = Extend and ccc ≠ 0.
OtherExtend = Extend and ccc = 0.
(We might need to put additional characters in ViramaExtend.)
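As a rough illustration of that informal split, here is a minimal Python sketch. It assumes the caller already knows the character is Extend (Python's standard library does not expose Grapheme_Extend), and the function name split_extend is made up for this example. Only the ccc test comes from the definitions above.

```python
import unicodedata

def split_extend(ch: str) -> str:
    """Classify a character that is already known to be Extend.

    Assumption: the caller has checked Grapheme_Extend elsewhere;
    this function only applies the ccc test from the informal split.
    """
    # unicodedata.combining() returns the canonical combining class (ccc).
    return "ViramaExtend" if unicodedata.combining(ch) != 0 else "OtherExtend"

NUKTA = "\u093C"         # DEVANAGARI SIGN NUKTA, ccc = 7
VOWEL_SIGN_U = "\u0941"  # DEVANAGARI VOWEL SIGN U, ccc = 0

print(split_extend(NUKTA))         # ViramaExtend
print(split_extend(VOWEL_SIGN_U))  # OtherExtend
```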
This gives us rules:
GB9': × (OtherExtend | ViramaExtend | ZWJ | Virama)
GB9c': (Virama | ZWJ ) ViramaExtend* × LinkingConsonant
So, for a sequence <virama, ZWJ, nukta, LinkingConsonant>, GB9' gives us
virama × ZWJ × nukta LinkingConsonant
and GB9c' gives us
virama × ZWJ × nukta × LinkingConsonant
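The worked example above can be checked mechanically. The following is only a sketch over lists of class names, not over real characters. The helper names gb9_prime and gb9c_prime are hypothetical, and each returns the indices i at which the rule forbids a break before element i.

```python
GB9_CLASSES = {"OtherExtend", "ViramaExtend", "ZWJ", "Virama"}

def gb9_prime(classes):
    """Indices i with no break before classes[i] under GB9'."""
    return {i for i in range(1, len(classes)) if classes[i] in GB9_CLASSES}

def gb9c_prime(classes):
    """Indices i with no break before classes[i] under GB9c'."""
    joins = set()
    for i in range(1, len(classes)):
        if classes[i] != "LinkingConsonant":
            continue
        # Skip backwards over ViramaExtend*.
        j = i - 1
        while j >= 0 and classes[j] == "ViramaExtend":
            j -= 1
        # The run must be preceded by Virama or ZWJ.
        if j >= 0 and classes[j] in ("Virama", "ZWJ"):
            joins.add(i)
    return joins

# <virama, ZWJ, nukta, LinkingConsonant>, with nukta as ViramaExtend
seq = ["Virama", "ZWJ", "ViramaExtend", "LinkingConsonant"]
print(sorted(gb9_prime(seq)))   # [1, 2]
print(sorted(gb9c_prime(seq)))  # [3]
```

GB9' alone leaves the boundary before the LinkingConsonant open; GB9c' closes it.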
---
In Rule GB9c, what examples justify including ZWJ? Are they just the C1
half-forms? My knowledge suggests that
GB9c'': Virama (ZWJ | ViramaExtend)* × LinkingConsonant
might be more appropriate.
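To make the difference between GB9c' and GB9c'' concrete, here is a small sketch that encodes each class as one letter (V = Virama, J = ZWJ, E = ViramaExtend, L = LinkingConsonant) and tests the left context with a regular expression. The letter encoding and the function name are assumptions made for illustration only.

```python
import re

# Left-context patterns, anchored at the candidate boundary.
GB9C_PRIME = re.compile(r"[VJ]E*$")         # (Virama | ZWJ) ViramaExtend*
GB9C_DOUBLE_PRIME = re.compile(r"V[JE]*$")  # Virama (ZWJ | ViramaExtend)*

def no_break_before(classes: str, i: int, left_context) -> bool:
    """True if the rule forbids a break before classes[i]."""
    if classes[i] != "L":
        return False
    return left_context.search(classes[:i]) is not None

# <virama, ZWJ, nukta, LinkingConsonant>: both variants join.
print(no_break_before("VJEL", 3, GB9C_PRIME))         # True
print(no_break_before("VJEL", 3, GB9C_DOUBLE_PRIME))  # True

# <LinkingConsonant, ZWJ, LinkingConsonant>: only GB9c' joins.
print(no_break_before("LJL", 2, GB9C_PRIME))          # True
print(no_break_before("LJL", 2, GB9C_DOUBLE_PRIME))   # False
```

In this encoding the two variants differ only when a ZWJ reaches the consonant without a virama somewhere before it.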
Richard.