question about identifying CLDR coverage % for Amharic

Mark Davis ☕️ mark at macchiato.com
Fri Feb 24 02:17:01 CST 2017


A few items.

<characterLabel type="modifier">Modifier</characterLabel>
<characterLabel type="musical_symbols">የሙዚቃ ምልክቶች</characterLabel>
<characterLabel type="nature">ተፈጥሮ</characterLabel>

We do flag error cases to vetters (contributors) where we can, and give
warnings. But if they feel that a term is better in a different language or
script, that is up to them to decide.



Mark

On Fri, Feb 24, 2017 at 7:24 AM, Richard Wordingham <
richard.wordingham at ntlworld.com> wrote:

> > > On Thu, Feb 23, 2017 at 12:18 PM, Isabelle Zaugg
> > > <iz6445a at student.american.edu
> > > <mailto:iz6445a at student.american.edu>> wrote:
>
> >>> I am working on my dissertation research and would like to identify
> >>> the percentage of CLDR coverage for Amharic and the other languages
> >>> utilizing the Ethiopic script.  I would like to get a percentage
> >>> coverage for today, as well as look at the increase over time.
>
> What about the decrease over time?  If the 'number of items' in CLDR
> increases, the percentage will drop unless new entries are added for
> Amharic.
>

Yes, that is what happens with
http://cldr.unicode.org/index/downloads/cldr-30#TOC-Growth. We are moving
the bar up all the time. What we do for that graph is measure the number of
items in the past vs the current set.
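
As a rough illustration of that kind of measurement (a sketch only, not
the actual CLDR tooling; the item names and release labels below are made
up), one can hold the current required set fixed and count how much of it
each past release already contained:

```python
def coverage_percent(present_items, required_items):
    """Percentage of required_items that also appear in present_items."""
    required = set(required_items)
    if not required:
        return 100.0
    return 100.0 * len(required & set(present_items)) / len(required)


# Hypothetical data: the *current* required set is the fixed yardstick.
current_required = {"item_a", "item_b", "item_c", "item_d"}

# Hypothetical snapshots of what one locale supplied in two releases.
items_by_release = {
    "older": {"item_a", "item_b"},
    "newer": {"item_a", "item_b", "item_c"},
}

# Every release is scored against the same (current) bar.
growth = {
    release: coverage_percent(items, current_required)
    for release, items in items_by_release.items()
}
```

Because the yardstick is the current set, adding new required items lowers
the score of every release that lacks them, past or present.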

>
> On Thu, 23 Feb 2017 16:58:01 -0500
> "Tom Bishop, Wenlin Institute" <tangmu at wenlin.com> wrote:
>
> > Is there an established system to derive a meaningful "percentage of
> > CLDR coverage for Amharic" from the data? Just from these (not really
> > random) examples one might estimate 70%.
>

The way we measure modern coverage is against a set of data described in
http://unicode.org/reports/tr35/tr35-info.html#Coverage_Levels.

>
> <examples snipped>
>
> It gets worse than this.  Sometimes default data is appropriate,
> sometimes it isn't.  For example, there is no explicit coverage for
> collation (or at least, there wasn't back in Version 27.0.1).  However,
> if the CLDR default gives the correct results for Amharic, then that
> part of the coverage is complete.
>

v27 is (relatively) ancient, so I'd suggest you look at more recent
versions, rather than start off with "It gets worse...".

Where we have confirmation that the root collation is sufficient for the
language, then an empty file can be added to
http://unicode.org/repos/cldr/tags/latest/common/collation/. (We used to
use a "validSublocales" attribute, but found that simply having empty
files worked better, procedurally.)

In that directory, you'll see an item for am.xml. It isn't completely
empty: it just rearranges the Ethi script ahead of Latin.
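
To show the effect of such a reordering (a toy sketch only: it compares
raw code points rather than using the real collation algorithm, and it
treats only the basic Ethiopic block U+1200..U+137F as "Ethi"; the
function names are made up for illustration):

```python
def script_group(ch):
    """Return 0 for the basic Ethiopic block, 1 for everything else."""
    # Assumption: only U+1200..U+137F counts as Ethiopic in this sketch;
    # the supplement and extended blocks are ignored.
    return 0 if 0x1200 <= ord(ch) <= 0x137F else 1


def sort_key(text):
    """Order Ethiopic characters ahead of all others, then by code point."""
    return [(script_group(ch), ord(ch)) for ch in text]


words = ["zebra", "ሰላም", "apple", "አማርኛ"]
ordered = sorted(words, key=sort_key)
# Ethiopic words now come first; a plain code-point sort puts Latin first.
```

A real implementation would go through a collation library such as ICU
rather than a hand-written key function.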


> There may even be cases that CLDR refuses to cover.  An example in
> English is that CLDR refuses to handle the difference in indefinite
> article between "a 3-page letter" and "an 8-page letter".  What
> percentage of non-coverage would one calculate for this?
>

"Refuses"?

That is a loaded term, usually part of an accusation. Joe could accuse, for
example, you, Richard Wordingham, of "refusing" to run a 4-minute mile,
even though: nobody ever asked you; it probably wouldn't be possible; and
if it were, you probably wouldn't be able to spend the time and effort to
do so, or want to, given your busy life.

The scope of CLDR is to provide a core set of locale data for
internationalization services. It does not have as a goal the ability to
grammatically compose messages in all of the languages it covers. That is a
huge task that many, many people are developing sophisticated ML models for
doing.

We extend the scope of CLDR periodically when we get proposals for doing so
that are feasible, and have a high enough priority given the many, many
items on our "todo" list
<http://unicode.org/cldr/trac/query?status=accepted&status=design&status=new&col=id&col=summary&col=owner&col=type&col=priority&col=milestone&col=component&col=weeks&report=20&desc=1&order=id>
(1821, currently). We were able to do so with plural rules, for example.
And it isn't out of scope in the future for us to support data for doing a
limited set of local-scope adjustments across languages, if we have a
practical proposal for doing so. We haven't "refused" to do a/an.

If you or others are interested in contributing to CLDR, please let us
know. (One caveat: sometimes there are practical limitations on our
accepting contributions because the size of the contribution imposes too
high a cost on just the assessment of it.)

>
> Richard.
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users
>