Add Likely Subtags first step

Philippe Verdy verdy_p at wanadoo.fr
Sat Jan 24 12:47:24 CST 2015


I said "encodable", but **with a standard subtag** (i.e. effectively with
letters). It would make no sense to say "encode oed", given that it is
**already** encoded (but as a grandfathered tag, which remains part of the
standard but cannot be decomposed into subtags).
The main reason it was not done is that there is no real benefit in doing
so if the standard is followed exactly: grandfathered **tags** (not
subtags) are also part of the standard. They remain as is, even if there is
no replacement. A standard variant would still be useful to avoid
complications in fallback resolution, because "oed" exists only as part of
a whole tag and is normally not decomposable (though fallback resolvers
will likely decompose it anyway, falling back from "en-GB-oed" to "en-GB"
and then to "en").
The standard still says nothing about such fallback mechanisms for
grandfathered tags, even if what to do here seems evident.
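
As a rough illustration of the fallback described above, here is a minimal
Python sketch. The table GRANDFATHERED_FALLBACKS and the function
fallback_chain are hypothetical names, and the "en-GB-oed" to "en-GB"
mapping is an assumption made for this example; the standard does not
define it.

```python
# Hypothetical table: a grandfathered tag is matched as a whole and is
# never split into subtags. The mapping below is an assumption.
GRANDFATHERED_FALLBACKS = {
    "en-gb-oed": "en-GB",
}


def fallback_chain(tag):
    """Return the tag followed by successively shorter fallbacks."""
    chain = [tag]
    mapped = GRANDFATHERED_FALLBACKS.get(tag.lower())
    if mapped is not None:
        # Jump from the grandfathered tag to a decomposable tag first.
        tag = mapped
        chain.append(tag)
    subtags = tag.split("-")
    while len(subtags) > 1:
        subtags.pop()
        # Do not leave a dangling singleton (such as "u" or "x") at the end.
        if len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
        chain.append("-".join(subtags))
    return chain


print(fallback_chain("en-GB-oed"))
```

Tags that are not in the table are simply truncated from the right, one
subtag at a time.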

Note also that standard variant subtags are directly linked to the
specific parent subtags with which they are valid. It would be enough to
accept "oed" as a variant subtag, but with grandfathered status, valid only
for the "en-GB" combination or maybe also "en-Latn-GB" (which is most
probably what it refers to: the Latin script only, the "most likely"
script assignment being inferred from the Oxford Dictionary, which uses
only that script)...
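
The prefix restriction suggested above could be checked along these lines.
This is only a sketch: VARIANT_PREFIXES and variant_allowed are
hypothetical names, and "oed" is not actually registered as a variant
subtag, so the entry below is purely illustrative.

```python
# Hypothetical registry excerpt: variant subtag -> prefixes it may follow.
# "oed" is NOT a registered variant; this entry is an assumption.
VARIANT_PREFIXES = {
    "oed": {"en-gb", "en-latn-gb"},
}


def variant_allowed(tag):
    """True if the last subtag is a listed variant after an allowed prefix."""
    prefix, _, variant = tag.lower().rpartition("-")
    return prefix in VARIANT_PREFIXES.get(variant, set())


print(variant_allowed("en-GB-oed"))
print(variant_allowed("fr-FR-oed"))
```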

But for now it is still impossible to define the correct replacement tag,
unless en-Latn-GB-oed is also accepted and the IANA database contains not
only suggested "replacements", but also a few necessary minimum fallbacks
to standard tags (decomposable into subtags) for grandfathered tags. This
is not a heresy: after all, the "likely" properties have also been added to
the IANA registry, just as replacement properties were added (initially
only for deprecated language subtags like "jw" or "iw").

For ambiguous tags that currently have no clear replacement, the possible
candidate fallbacks could also be listed (e.g. for i-mingo) in no
preferred order (applications are free to choose one or another according
to their own criteria or needs).
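
A sketch of how an application might consume such an unordered candidate
list follows. Everything here is hypothetical: the tag "i-example" and the
private-use codes "qaa" and "qab" are placeholders, since no actual
candidates are named here, and pick_fallback is just one possible selection
policy.

```python
# Placeholder data only: an unordered set of candidate fallbacks per
# ambiguous tag. The tag and the codes are invented for illustration.
CANDIDATE_FALLBACKS = {
    "i-example": frozenset({"qaa", "qab"}),
}


def pick_fallback(tag, supported):
    """Pick the first supported language that is a listed candidate.

    `supported` carries the application's own preference order; the
    candidate set itself imposes no order.
    """
    candidates = CANDIDATE_FALLBACKS.get(tag.lower(), frozenset())
    for preferred in supported:
        if preferred in candidates:
            return preferred
    return None


print(pick_fallback("i-example", ["qab", "qaa"]))
```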

This would also apply to a few old language tags (which were initially
encoded as isolated tags from the ISO 639-1 and -2 language codes, but
were later considered to be language families). The problem is that the
lists of encoded languages (including macrolanguages) that are mapped to a
family are still not defined (unlike the standardized lists of isolated
languages that are mapped to a macrolanguage).

For i-mingo, it is very difficult to see a correct mapping that defines a
"min" language code as a macrolanguage; it would be just a family. But the
list of isolated languages or macrolanguages that are encoded with standard
language subtags is known, and their mapping to the "Min" family seems
clear. Fallbacks could still work: after all, if we can fall back from
Mandarin Chinese to English, we can just as well fall back from one Min
language to another before trying Mandarin (cmn, or just zh, because
Mandarin/cmn is the most likely for Chinese/zh) or Cantonese (yue).

But is there really a lot of data using these grandfathered codes? Users
of these databases are simply told that their data is ambiguous and
that they should be more precise. (The same could be said about Quechua,
which is hardly a true macrolanguage but more likely a family: it maps to a
likely language only when Quechua is qualified with a country subtag such
as Peru, Colombia or Bolivia. It would be more difficult for the minorities
remaining in Mexico, where their Quechua has been heavily creolized, among
its own varieties or with the Spanish lingua franca.)


2015-01-23 20:30 GMT+01:00 Doug Ewell <doug at ewellic.org>:

> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
> > The grandfathered "oed" variant for "en-GB" is encodable as a standard
> > variant.
>
> Not unless you squint (or drink) hard enough that "oed" looks like at
> least five letters, the minimum for a well-formed variant that starts
> with a letter.
>
> "oxford" or similar would be syntactically allowable, but "oed" was
> chosen to show clearly that the variant applies to the spelling used in
> the dictionary, not usage in the city of Oxford.
>
> > I wonder why it was not done;
>
> Probably because little would be gained from doing so. The variant would
> make no sense with other languages, and parsers would still have to
> recognize the older form.
>
> --
> Doug Ewell | Thornton, CO, USA | http://ewellic.org
>
>


More information about the CLDR-Users mailing list