From cldr-users at unicode.org  Sat Dec  2 05:52:43 2017
From: cldr-users at unicode.org (Kip Cole via CLDR-Users)
Date: Sat, 2 Dec 2017 22:52:43 +1100
Subject: UCA question / Produce Collation Element Arrays
Message-ID: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>

Markus, probably another dumb question but I'm making progress.  In section 7.2 of TR10, the algorithm for producing a CE array says:

S2.1 Find the longest initial substring S at each point that has a match in the collation element table.

S2.1.1 If there are any non-starters following S, process each non-starter C.

S2.1.2 If C is an unblocked non-starter with respect to S, find if S + C has a match in the collation element table.

Note: This condition is specific to non-starters, and is not precisely the same as the concept of blocking in normalization, since it is dealing with look ahead for a discontiguous match, rather than with normalization forms. Hangul jamos and other starters are only supported with contiguous matches.

S2.1.3 If there is a match, replace S by S + C, and remove C. 


For S2.1.1 I'm trying to confirm what "process each non-starter C" means.  As best I understand it so far, it means "ignore" or "skip" all C that are non-starters.  Is that the correct interpretation?


From cldr-users at unicode.org  Sat Dec  2 06:32:55 2017
From: cldr-users at unicode.org (Kip Cole via CLDR-Users)
Date: Sat, 2 Dec 2017 23:32:55 +1100
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
Message-ID: <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>

Markus and co, probably another dumb question but I'm making progress.  In section 7.2 of TR10, the algorithm for producing a CE array says:

> S2.1 Find the longest initial substring S at each point that has a match in the collation element table.
> 
> S2.1.1 If there are any non-starters following S, process each non-starter C.
> 
> S2.1.2 If C is an unblocked non-starter with respect to S, find if S + C has a match in the collation element table.
> 
> Note: This condition is specific to non-starters, and is not precisely the same as the concept of blocking in normalization, since it is dealing with look ahead for a discontiguous match, rather than with normalization forms. Hangul jamos and other starters are only supported with contiguous matches.
> 
> S2.1.3 If there is a match, replace S by S + C, and remove C. 
> 

For S2.1.1 I'm trying to confirm what "process each non-starter C" means.  As best I understand it so far, it means "ignore" or "skip" all C that are non-starters.  Is that the correct interpretation?  It would seem to be consistent with the annotation:

Steps 2.1.1 "process each non-starter C" and 2.1.2 "find if S + C has a match in the table", where one or more intermediate non-starters may be skipped (making it discontiguous), extends a contraction match by one code point at a time to find the next match. In particular, if C is a non-starter and if the table had a mapping for ABC but not one for AB, then a discontiguous-contraction match on text ABMC (with M being a skippable non-starter) would never be found. Well-formedness condition 5 requires the presence of the prefix contraction AB.

From cldr-users at unicode.org  Sat Dec  2 09:25:30 2017
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Sat, 2 Dec 2017 16:25:30 +0100
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
 <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
Message-ID: <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>

Suppose that you have the following, where S are starters and n are
non-starters. | represents the current position.

| S1 S2 S3 n1 n2 n3 n4 S4

S1 S2 isn't in the CET, so you emit w(S1) and logically change the input.
I'll represent that as:

w(S1) | S2 S3 n1 n2 n3 n4 S4

S2 S3 are in the CET, so set S to them. I'll show S by [...]

w(S1) [ S2 S3 ] | n1 n2 n3 n4 S4

You then successively look through each of the n's.

Suppose S2 S3 n1 isn't in the CET, so you continue.
Suppose S2 S3 n2 is in the CET, but n2 is blocked, so you also continue.
Suppose S2 S3 n3 is in the CET, and n3 is not blocked, so you set S to
S2 S3 n3.

Logically the input list now looks like the following

w(S1) [ S2 S3 n3 ] n1 n2 | n4 S4

Suppose S2 S3 n3 n4 is in the CET, and n4 is not blocked, so you set S to
them. You now have:

w(S1) [ S2 S3 n3 n4 ] n1 n2 | S4

You have run out of non-starters so you stop and emit weight(S2 S3 n3 n4),
and reset the current position to after them.

w(S1) w(S2 S3 n3 n4)  | n1 n2 S4

So the next item you consider is n1.

There is just one subtlety. Notice that when considering whether n4 is
blocked, you don't consider the items you have already put into S. So n3
and n4 can have the same ccc. Normally people don't actually modify the
input stream, so thinking n4 is blocked is an easy error to make.
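
The walk-through above can be sketched in Python. This is only a minimal
illustration under stated assumptions, not a reference implementation: the
function name produce_ce_array, the symbolic names (S1, n1, ...), the toy
table and the ccc values are all made up for this example, single symbols
are assumed to always have a mapping, and the output is the list of matched
sequences rather than real collation weights.

```python
def produce_ce_array(text, table, ccc):
    """Return the sequences matched while walking `text` (S2.1 to S2.1.3).

    text  : list of symbols, assumed already in NFD order
    table : set of tuples, the multi-symbol entries of a toy CET
    ccc   : dict symbol -> canonical combining class (0 = starter)
    Single symbols are assumed to always have a mapping.
    """
    text = list(text)          # work on a copy; matched non-starters are removed
    matched = []
    i = 0
    while i < len(text):
        # S2.1: longest contiguous match starting at position i
        end = i + 1
        for j in range(len(text), i + 1, -1):
            if tuple(text[i:j]) in table:
                end = j
                break
        s = text[i:end]

        # S2.1.1: look at each non-starter following S, stopping at a starter
        j = end
        max_skipped = 0        # highest ccc among non-starters left behind
        while j < len(text) and ccc.get(text[j], 0) != 0:
            c = text[j]
            # S2.1.2: C is unblocked if every skipped non-starter has a lower
            # ccc. Symbols already absorbed into S are gone from `text`, so
            # they never block (the subtlety about n3 and n4).
            unblocked = ccc[c] > max_skipped
            if unblocked and tuple(s + [c]) in table:
                s.append(text.pop(j))      # S2.1.3: S = S + C, remove C
            else:
                max_skipped = max(max_skipped, ccc[c])
                j += 1
        matched.append(tuple(s))
        i = end                # skipped non-starters are considered next
    return matched


# Hypothetical data mirroring the example: n2 is blocked by n1 (same ccc),
# while n3 and n4 share a ccc but n4 is still unblocked once n3 is in S.
CCC = {"n1": 220, "n2": 220, "n3": 230, "n4": 230}   # starters default to 0
TABLE = {("S2", "S3"), ("S2", "S3", "n2"), ("S2", "S3", "n3"),
         ("S2", "S3", "n3", "n4")}
TEXT = ["S1", "S2", "S3", "n1", "n2", "n3", "n4", "S4"]

for seq in produce_ce_array(TEXT, TABLE, CCC):
    print("w(" + " ".join(seq) + ")")
```

With this toy data the sketch prints w(S1), w(S2 S3 n3 n4), w(n1), w(n2)
and w(S4), in that order. Removing matched non-starters from the working
copy is what keeps n3 from blocking n4.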

Mark <https://twitter.com/mark_e_davis>


From cldr-users at unicode.org  Sat Dec  2 13:52:15 2017
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Sat, 2 Dec 2017 19:52:15 +0000
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
 <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
 <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>
Message-ID: <20171202195215.08ae11a9@JRWUBU2>

On Sat, 2 Dec 2017 16:25:30 +0100
Mark Davis ☕️ via CLDR-Users <cldr-users at unicode.org> wrote:

> Suppose that you have the following, where S are starters and n are
> non-starters. | represents the current position.
> 
> | S1 S2 S3 n1 n2 n3 n4 S4
> 
> S1 S2 isn't in the CET, so you emit and logically change the input.
> I'll represent that as:
> 
> w(S1) | S2 S3 n1 n2 n3 n4 S4

One subtle nitpick here.  One also has to eliminate <S1 S2 S3>, <S1 S2
S3 n1>, ... and <S1 S2 S3 n1 n2 n3 n4 S4> before one can conclude that
the relevant collating element is <S1>.  I do this by recording whether
each collating element and prefix of a collating element is the prefix
of a collating element.  This sort of tagging is not logically
necessary, but is practically very useful.
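
One way such tagging might look, as a rough Python sketch: the names
build_prefix_index and longest_contiguous_match and the flag names are
invented for illustration, the element list is a toy subset and not the
real DUCET, and the sketch assumes a single code point always has a
mapping.

```python
def build_prefix_index(elements):
    """Map every prefix of every collating element to two flags:
    is_element (the prefix itself is in the table) and
    extends (it is a proper prefix of some longer element)."""
    index = {}
    for elem in elements:
        elem = tuple(elem)
        for k in range(1, len(elem) + 1):
            flags = index.setdefault(elem[:k],
                                     {"is_element": False, "extends": False})
            if k == len(elem):
                flags["is_element"] = True
            else:
                flags["extends"] = True
    return index


def longest_contiguous_match(text, start, index):
    """Return the end offset of the longest contiguous match at `start`.
    The scan stops as soon as the current sequence cannot be extended."""
    best = start + 1           # assumption: a single code point always maps
    seq = ()
    j = start
    while j < len(text):
        seq += (text[j],)
        flags = index.get(seq)
        if flags is None:      # not a prefix of anything in the table
            break
        if flags["is_element"]:
            best = j + 1
        if not flags["extends"]:
            break
        j += 1
    return best


# Toy subset: <0FB2 0F71 0F80> is present but its prefix <0FB2 0F71> is not.
ELEMENTS = [(0x0FB2,), (0x0F71,), (0x0F80,),
            (0x0FB2, 0x0F80), (0x0FB2, 0x0F71, 0x0F80)]
INDEX = build_prefix_index(ELEMENTS)
print(INDEX[(0x0FB2, 0x0F71)])
print(longest_contiguous_match([0x0FB2, 0x0F71, 0x0F80], 0, INDEX))
```

Here <0FB2 0F71> is tagged as a prefix without being an element, so the
scan keeps going past it and still finds the three-code-point element.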

The simplest example of this issue in the DUCET is <U+0FB2 U+0F71
U+0F80>.  Or is a conformant implementation of the UCA allowed to reject
DUCET even if one can find a way to specify that it be used?  There's
no explicit concession that a CET has to be well-formed.

Richard.


From cldr-users at unicode.org  Sun Dec  3 06:36:57 2017
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Sun, 3 Dec 2017 13:36:57 +0100
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <20171202195215.08ae11a9@JRWUBU2>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
 <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
 <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>
 <20171202195215.08ae11a9@JRWUBU2>
Message-ID: <CAJ2xs_FCGMq=F5FhVhHB57crWTuAmma=Xp=pbdqXn1FNPs6RxA@mail.gmail.com>

The algorithm is predicated on any input table being well formed. (
http://unicode.org/reports/tr10/#Well-Formed)

Tibetan is a documented exception in the DUCET, but the specification also
documents how to fix it.

Mark <https://twitter.com/mark_e_davis>


From cldr-users at unicode.org  Sun Dec  3 13:23:51 2017
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Sun, 3 Dec 2017 19:23:51 +0000
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <CAJ2xs_FCGMq=F5FhVhHB57crWTuAmma=Xp=pbdqXn1FNPs6RxA@mail.gmail.com>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
 <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
 <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>
 <20171202195215.08ae11a9@JRWUBU2>
 <CAJ2xs_FCGMq=F5FhVhHB57crWTuAmma=Xp=pbdqXn1FNPs6RxA@mail.gmail.com>
Message-ID: <20171203192351.70e2f2ed@JRWUBU2>

On Sun, 3 Dec 2017 13:36:57 +0100
Mark Davis ☕️ via CLDR-Users <cldr-users at unicode.org> wrote:

> The algorithm is predicated on any input table being well formed. (
> http://unicode.org/reports/tr10/#Well-Formed)
> 
> Tibetan is a documented exception in the DUCET, but it also documents
> how to fix it.

But adding the fix does not preserve the order of all strings in
the Tibetan script, only the order of linguistically plausible strings.
The example is the order of the non-defective NFD strings

 ཀྲ྄ཱ 0F40 0FB2 0F84 0F71
 ཀྲ྄ 0F40 0FB2 0F84
 ཀྲཱ 0F40 0FB2 0F71

(I've only added U+0F40 to make the strings non-defective.)

Relevant facts are:

ccc(0F84) = 9
ccc(0F71) = 129
CE(0F71) < CE(0F84)
All relevant collation elements have different primary weights.
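
The two combining classes, and the claim that the strings are in NFD, can
be checked with Python's standard unicodedata module. This is just a quick
sanity check; the helper names ccc and is_nfd are mine.

```python
import unicodedata


def ccc(cp):
    """Canonical combining class of a code point given as an integer."""
    return unicodedata.combining(chr(cp))


def is_nfd(cps):
    """True if the code point sequence is unchanged by NFD normalization."""
    s = "".join(chr(cp) for cp in cps)
    return unicodedata.normalize("NFD", s) == s


for cp in (0x0F40, 0x0FB2, 0x0F84, 0x0F71):
    print(f"ccc({cp:04X}) = {ccc(cp)}")

# The order 0F84 0F71 is the normalized one; 0F71 0F84 would be reordered.
print(is_nfd([0x0F40, 0x0FB2, 0x0F84, 0x0F71]))
print(is_nfd([0x0F40, 0x0FB2, 0x0F71, 0x0F84]))
```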

Under DUCET, we get:
Key of 0F40 0FB2 0F71      = CE(0F40) CE(0FB2) CE(0F71)
Key of 0F40 0FB2 0F84      = CE(0F40) CE(0FB2) CE(0F84)
Key of 0F40 0FB2 0F84 0F71 = CE(0F40) CE(0FB2) CE(0F84) CE(0F71)

Tailoring DUCET by adding 'all ten' contractions, making a well-formed
collation while not perturbing the sorting of Sanskrit, yields a
different order:

Key of 0F40 0FB2 0F71      = CE(0F40) CE(0FB2) CE(0F71)
Key of 0F40 0FB2 0F84 0F71 = CE(0F40) CE(0FB2) CE(0F71) CE(0F84)
Key of 0F40 0FB2 0F84      = CE(0F40) CE(0FB2) CE(0F84)
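
The change of order can be reproduced with a toy model in Python. Everything
here is an assumption made for illustration: the function names, the
symbolic CE labels, the primary weights 10 to 40 (chosen only so that
CE(0F71) < CE(0F84)), and the fact that only the single contraction
<0FB2 0F71> stands in for the ten added ones. It is not the real DUCET or
the real tailoring.

```python
import unicodedata


def ce_array(cps, table):
    """Toy CE array: contiguous match, then discontiguous extension.
    `table` maps tuples of code points to lists of symbolic CE labels;
    unlisted single code points map to the label CE(XXXX)."""
    text = list(cps)
    out = []
    i = 0
    while i < len(text):
        end = i + 1
        for j in range(len(text), i + 1, -1):      # longest contiguous match
            if tuple(text[i:j]) in table:
                end = j
                break
        s = text[i:end]
        j, max_skipped = end, 0
        while j < len(text) and unicodedata.combining(chr(text[j])) != 0:
            c_ccc = unicodedata.combining(chr(text[j]))
            if c_ccc > max_skipped and tuple(s + [text[j]]) in table:
                s.append(text.pop(j))              # unblocked match: absorb C
            else:
                max_skipped = max(max_skipped, c_ccc)
                j += 1
        out.extend(table.get(tuple(s), [f"CE({s[0]:04X})"]))
        i = end
    return out


# Hypothetical primary weights; only their relative order matters here.
PRIMARY = {"CE(0F40)": 10, "CE(0FB2)": 20, "CE(0F71)": 30, "CE(0F84)": 40}


def sort_key(cps, table):
    return [PRIMARY[ce] for ce in ce_array(cps, table)]


DUCET_LIKE = {}                                    # no <0FB2 0F71> entry
TAILORED = {(0x0FB2, 0x0F71): ["CE(0FB2)", "CE(0F71)"]}

A = (0x0F40, 0x0FB2, 0x0F71)
B = (0x0F40, 0x0FB2, 0x0F84)
C = (0x0F40, 0x0FB2, 0x0F84, 0x0F71)

for name, table in (("DUCET-like", DUCET_LIKE), ("tailored", TAILORED)):
    print(name)
    for s in sorted([A, B, C], key=lambda cps: sort_key(cps, table)):
        print("  ", " ".join(f"{cp:04X}" for cp in s), "=",
              " ".join(ce_array(s, table)))
```

In the tailored table, 0F71 is matched discontiguously across 0F84
(ccc 9 < 129), which moves CE(0F71) ahead of CE(0F84) in the key.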

To create a well-formed collation equivalent to DUCET, one has to add
many more contractions - about 650 by my reckoning.

So, are you saying that a UCA-conformant implementation can simply
reject DUCET for not being well-formed?  Alternatively, are you
claiming that there is a known, straightforward algorithm to repair
any case of non-compliance with WF5 without changing the ordering of
strings?

Richard.


From cldr-users at unicode.org  Sun Dec  3 13:49:03 2017
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Sun, 3 Dec 2017 20:49:03 +0100
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <20171203192351.70e2f2ed@JRWUBU2>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
 <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
 <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>
 <20171202195215.08ae11a9@JRWUBU2>
 <CAJ2xs_FCGMq=F5FhVhHB57crWTuAmma=Xp=pbdqXn1FNPs6RxA@mail.gmail.com>
 <20171203192351.70e2f2ed@JRWUBU2>
Message-ID: <CAJ2xs_H3NeVdc+2RiLbL4hviX5Pe4ki-n8ho=eh374edFWzzPA@mail.gmail.com>

Mark <https://twitter.com/mark_e_davis>

On Sun, Dec 3, 2017 at 8:23 PM, Richard Wordingham via CLDR-Users <
cldr-users at unicode.org> wrote:



> So, are you saying that a UCA-conformant implementation can simply
> reject DUCET for not being well-formed?


Well, yes, if they don't use
http://unicode.org/reports/tr10/#Well_Formed_DUCET to fix it in one way or
another. CLDR does do adjustments, for example.

> Alternatively, are you
> claiming that there is a known, straightforward algorithm to repair
> any case of non-compliance with WF5 without changing the ordering of
> strings?
>

The algorithm is not defined for non-well-formed strings, so it is odd to
talk about "without changing the ordering of strings". I think your main
point (above) is that you think that a batch of other changes are necessary
for it to work for Tibetan. That may be the case; I am not that familiar
with Tibetan requirements.

From cldr-users at unicode.org  Sun Dec  3 16:48:57 2017
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Sun, 3 Dec 2017 22:48:57 +0000
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <CAJ2xs_H3NeVdc+2RiLbL4hviX5Pe4ki-n8ho=eh374edFWzzPA@mail.gmail.com>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
 <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
 <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>
 <20171202195215.08ae11a9@JRWUBU2>
 <CAJ2xs_FCGMq=F5FhVhHB57crWTuAmma=Xp=pbdqXn1FNPs6RxA@mail.gmail.com>
 <20171203192351.70e2f2ed@JRWUBU2>
 <CAJ2xs_H3NeVdc+2RiLbL4hviX5Pe4ki-n8ho=eh374edFWzzPA@mail.gmail.com>
Message-ID: <20171203224857.6a805539@JRWUBU2>

On Sun, 3 Dec 2017 20:49:03 +0100
Mark Davis ☕️ via CLDR-Users <cldr-users at unicode.org> wrote:

> On Sun, Dec 3, 2017 at 8:23 PM, Richard Wordingham via CLDR-Users <
> cldr-users at unicode.org> wrote:
> > So, are you saying that a UCA-conformant implementation can simply
> > reject DUCET for not being well-formed?

> Well, yes, if they don't use
> http://unicode.org/reports/tr10/#Well_Formed_DUCET to fix it in one
> way or another. CLDR does do adjustments, for example.

Interesting.  So an implementation can reject the conformance test as
invalid.  It would seem that an implementation that simply prints "DUCET
is not well-formed!" passes the conformance test provided.

What do you mean by 'CLDR does...'?  I have seen ICU wrongly reject
apparently redundant collating elements of a collation - but perhaps I
was doing something wrong.  Do you just mean that the CLDR root
collation includes the ten additions?

> > Alternatively, are you
> > claiming that there is a known, straightforward algorithm to repair
> > any case of non-compliance with WF5 without changing the ordering of
> > strings?

> The algorithm is not defined for non-well-formed strings, so it is
> odd to talk about "without changing the ordering of strings".

I think you've misunderstood my assertion.  By the "ordering of
strings" I mean the order in which they are sorted, not the ordering of
the bytes within the strings.  I was not talking about strings that
are not well-formed.

> I think
> your main point (above) is that you think that a batch of other
> changes are necessary for it to work for Tibetan. That may be the
> case; I am not that familiar with Tibetan requirements.

No, my new point was that to make DUCET comply with WF5 without
altering the ordering, it requires about 650 additional contractions.
However, only the 10 (really 6) contractions are needed for natural
language strings.  The 650, for example, include four contractions for
each virama, though in natural language there is only one virama that
occurs with Tibetan consonants.

The UCA conformance test includes many strings that do not occur in
natural language, as in the example given in
https://www.unicode.org/Public/UCA/10.0.0/CollationTest.html , namely
0FB2 0F80 0F71 0334, which does not sort equal to 0F77 0334 under DUCET,
but does when just the ten contractions are added.  This pair no longer
appears in the conformance test.

Richard.


From cldr-users at unicode.org  Mon Dec  4 05:59:21 2017
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Mon, 4 Dec 2017 11:59:21 +0000
Subject: UCA question / Produce Collation Element Arrays
In-Reply-To: <20171203192351.70e2f2ed@JRWUBU2>
References: <56E27983-5488-41D7-818F-AA6E8AD35A47@gmail.com>
 <88B29D14-0D41-470A-9BE7-4E80C8191B02@gmail.com>
 <CAJ2xs_HFtWQLXHuhqsFpU2_9k_awOUNCe8pZ_-G3PQ=4QJnxZA@mail.gmail.com>
 <20171202195215.08ae11a9@JRWUBU2>
 <CAJ2xs_FCGMq=F5FhVhHB57crWTuAmma=Xp=pbdqXn1FNPs6RxA@mail.gmail.com>
 <20171203192351.70e2f2ed@JRWUBU2>
Message-ID: <20171204115921.2455d761@JRWUBU2>

On Sun, 3 Dec 2017 19:23:51 +0000
Richard Wordingham via CLDR-Users <cldr-users at unicode.org> wrote:

> But adding the fix does not preserve the order of all strings in
> the Tibetan script, only the order of linguistically plausible
> strings.
> To create a well-formed collation equivalent to DUCET, one has to add
> many more contractions - about 650 by my reckoning.

I've checked my calculations, and it's actually about 970 NFD entries.
They are:

CE(0FB2 x)           = CE(0FB2) CE(x)
CE(0FB2 x 0F80)      = CE(0FB2 0F80) CE(x)
CE(0FB2 x 0F71 0F80) = CE(0FB2 0F71 0F80) CE(x)

CE(0FB3 x)           = CE(0FB3) CE(x)
CE(0FB3 x 0F80)      = CE(0FB3 0F80) CE(x)
CE(0FB3 x 0F71 0F80) = CE(0FB3 0F71 0F80) CE(x)

wherever ccc(x) < ccc(0F71), i.e. ccc(x) < 129.

The first set undoes the changes wrought by adding the contraction
CE(0FB2 0F71) for the sake of WF5.  The second and third sets undo the
changes wrought by the first set.
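
The shape of these entries can be enumerated with a short Python sketch.
It rests on assumptions of mine: that x ranges over non-starters only
(0 < ccc(x) < 129), and that Python's unicodedata is an acceptable source
of ccc values. The number of entries it produces depends on the Unicode
version bundled with the interpreter, so it need not match the figure
above.

```python
import sys
import unicodedata


def repair_entries(limit_ccc=129):
    """List (contraction, expansion) pairs following the six patterns.
    Each contraction is a tuple of code points; each expansion is a list
    of existing collating elements whose CEs are concatenated."""
    # Assumption: x is restricted to non-starters with ccc below the limit.
    xs = [cp for cp in range(sys.maxunicode + 1)
          if 0 < unicodedata.combining(chr(cp)) < limit_ccc]
    entries = []
    for base in (0x0FB2, 0x0FB3):
        for x in xs:
            entries.append(((base, x), [(base,), (x,)]))
            entries.append(((base, x, 0x0F80), [(base, 0x0F80), (x,)]))
            entries.append(((base, x, 0x0F71, 0x0F80),
                            [(base, 0x0F71, 0x0F80), (x,)]))
    return entries


ENTRIES = repair_entries()
print(len(ENTRIES), "entries for this Unicode version",
      unicodedata.unidata_version)
for contraction, expansion in ENTRIES[:3]:
    print(" ".join(f"{cp:04X}" for cp in contraction), "->",
          " + ".join("<" + " ".join(f"{cp:04X}" for cp in e) + ">"
                     for e in expansion))
```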

Richard.

From cldr-users at unicode.org  Thu Dec 21 17:25:56 2017
From: cldr-users at unicode.org (Loic Dachary via CLDR-Users)
Date: Fri, 22 Dec 2017 00:25:56 +0100
Subject: Kurdish Kurmanji progress
Message-ID: <fc2a0137-a053-08b4-88c1-784952a571dc@dachary.org>

Hi,

I'm interested in following the progress of the work done on Kurdish Kurmanji[1] to be notified when it transitions from "seed" to "common". How can I do that?

Thanks in advance for any pointers you can provide :-)

[1] http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html

-- 
Loïc Dachary, Artisan Logiciel Libre

From cldr-users at unicode.org  Thu Dec 28 19:54:03 2017
From: cldr-users at unicode.org (Shervin Afshar via CLDR-Users)
Date: Thu, 28 Dec 2017 17:54:03 -0800
Subject: Kurdish Kurmanji progress
In-Reply-To: <fc2a0137-a053-08b4-88c1-784952a571dc@dachary.org>
References: <fc2a0137-a053-08b4-88c1-784952a571dc@dachary.org>
Message-ID: <CA+ONODkJ0Ht=60Odp2-f_c+XHj+hFQejyVSWoJGjeNq0ECQ+aQ@mail.gmail.com>

You could monitor the data files, which at the moment live in the seed directory
<https://unicode.org/cldr/trac/browser/trunk/seed#main> in the codebase.
When the locale data files are mature enough, they will be moved to the common
directory <https://unicode.org/cldr/trac/browser/trunk/common/main>. Also
see this comment <https://unicode.org/cldr/trac/ticket/9964#comment:2>
regarding "seed" vs. "common" and why a locale being under either of these
shouldn't make a difference for contributors.


Shervin


From cldr-users at unicode.org  Fri Dec 29 02:59:46 2017
From: cldr-users at unicode.org (Loic Dachary via CLDR-Users)
Date: Fri, 29 Dec 2017 09:59:46 +0100
Subject: Kurdish Kurmanji progress
In-Reply-To: <CA+ONODkJ0Ht=60Odp2-f_c+XHj+hFQejyVSWoJGjeNq0ECQ+aQ@mail.gmail.com>
References: <fc2a0137-a053-08b4-88c1-784952a571dc@dachary.org>
 <CA+ONODkJ0Ht=60Odp2-f_c+XHj+hFQejyVSWoJGjeNq0ECQ+aQ@mail.gmail.com>
Message-ID: <84b915e8-fd7c-941f-bb09-826c1a86b4bc@dachary.org>

Hi,

On 12/29/2017 02:54 AM, Shervin Afshar wrote:
> You could monitor the data files which at the moment live in seed directory <https://unicode.org/cldr/trac/browser/trunk/seed#main> in the codebase. When the locale data file is mature enough, they would be moved to common directory <https://unicode.org/cldr/trac/browser/trunk/common/main>. Also see this comment <https://unicode.org/cldr/trac/ticket/9964#comment:2> regarding "seed" vs. "common" and why a locale being under either of these shouldn't make a difference for contributors.

Thanks a lot for the pointer :-)

I'm not fluent in Kurdish Kurmanji and therefore unable to participate, unfortunately. Should I find someone motivated to help, is http://cldr.unicode.org/development/new-cldr-developers the best place to suggest to get them started? Or is there another guide that I may have missed?

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre

From cldr-users at unicode.org  Fri Dec 29 12:00:44 2017
From: cldr-users at unicode.org (Shervin Afshar via CLDR-Users)
Date: Fri, 29 Dec 2017 10:00:44 -0800
Subject: Kurdish Kurmanji progress
In-Reply-To: <84b915e8-fd7c-941f-bb09-826c1a86b4bc@dachary.org>
References: <fc2a0137-a053-08b4-88c1-784952a571dc@dachary.org>
 <CA+ONODkJ0Ht=60Odp2-f_c+XHj+hFQejyVSWoJGjeNq0ECQ+aQ@mail.gmail.com>
 <84b915e8-fd7c-941f-bb09-826c1a86b4bc@dachary.org>
Message-ID: <CA+ONOD=FrRH7XzscZi05sCU=o3axPK-j8-GHkgYT=feVmVst+Q@mail.gmail.com>

That page you pointed to is for developers. Data is collected for most of
the entries through Survey Tool. You can find more information here:
http://cldr.unicode.org/index/survey-tool/accounts

Shervin


From cldr-users at unicode.org  Fri Dec 29 14:20:07 2017
From: cldr-users at unicode.org (Loic Dachary via CLDR-Users)
Date: Fri, 29 Dec 2017 21:20:07 +0100
Subject: Kurdish Kurmanji progress
In-Reply-To: <CA+ONOD=FrRH7XzscZi05sCU=o3axPK-j8-GHkgYT=feVmVst+Q@mail.gmail.com>
References: <fc2a0137-a053-08b4-88c1-784952a571dc@dachary.org>
 <CA+ONODkJ0Ht=60Odp2-f_c+XHj+hFQejyVSWoJGjeNq0ECQ+aQ@mail.gmail.com>
 <84b915e8-fd7c-941f-bb09-826c1a86b4bc@dachary.org>
 <CA+ONOD=FrRH7XzscZi05sCU=o3axPK-j8-GHkgYT=feVmVst+Q@mail.gmail.com>
Message-ID: <6fdc9a90-a45d-4996-4fbd-5236c4649c94@dachary.org>

Thanks for clearing up the confusion, this is most helpful :-)

On 12/29/2017 07:00 PM, Shervin Afshar wrote:
> That page you pointed to is for developers. Data is collected for most of the entries through Survey Tool. You can find more information here: http://cldr.unicode.org/index/survey-tool/accounts
> 
> Shervin

-- 
Loïc Dachary, Artisan Logiciel Libre