Representing Additional Types of Flags
doug at ewellic.org
Thu Jul 2 12:33:30 CDT 2015
Mark Davis <mark at macchiato dot com> wrote:
>> Is there any precedent for CLDR to define the validity of Unicode
>> character sequences?
> We already have, in tr51, the unicode_region_codes being used for
> validity testing of flags:
the second of which (Annex B) says:
"The valid region sequences are specified by Unicode region subtags as
defined in [CLDR], excluding those that are designated private-use or
deprecated in [CLDR]."
In that case, the wording in TUS needs to be corrected, because TUS 7.0 says:
"The regional indicator symbols in the range U+1F1E6..U+1F1FF can be
used in pairs to represent an ISO 3166 region code."
It doesn't say anything about valid pairs being defined by CLDR instead
of ISO. I wonder how many users actually know this.
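For reference, the pairing described in the TUS sentence above is a plain offset mapping from A..Z onto U+1F1E6..U+1F1FF. A minimal sketch in Python follows; the function names are hypothetical, and the code checks only that the input is two letters, not that the pair is a valid region under either ISO 3166 or CLDR.

```python
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RIS_LAST = 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER Z


def region_to_ris(code: str) -> str:
    """Map a two-letter code such as 'FR' to a pair of regional indicator symbols."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in code.upper())


def ris_to_region(pair: str) -> str:
    """Map a pair of regional indicator symbols back to two ASCII letters."""
    if len(pair) != 2 or any(not (RIS_BASE <= ord(ch) <= RIS_LAST) for ch in pair):
        raise ValueError("expected exactly two regional indicator symbols")
    return "".join(chr(ord(ch) - RIS_BASE + ord("A")) for ch in pair)
```

Note that this mapping is purely mechanical: it will happily encode "UK" or "ZZ", which is why a separate validity rule is needed.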
> Those are typically the same as the ISO codes, but do add XK
So QO, QU, and ZZ would be excluded, since those are private-use in BCP
47 and hence also in CLDR. But XK is included, even though it is also
private-use. Is this correct? Can an application tell that XK is in and
the others are out, just by looking at CLDR data?
Also, I assume all of the same include/exclude rules apply both to RIS
combinations and to PRI #299-style flag tags. Please let me know if
that's not true.
> CLDR treats UK as deprecated.
> But you're right; we need to be able to distinguish this case (and
> ones like it.) I filed
OK, so UK is not valid in RIS combinations or flag tags either. Glad to
see that clarified.
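The include/exclude rules as discussed so far in this thread can be sketched as below. This is an illustration only: the sets are hand-written from the codes named in this message and are not loaded from CLDR, and all names (KNOWN_REGIONS, PRIVATE_USE_EXCEPTIONS, is_valid_flag_region) are hypothetical.

```python
# Illustrative data only, taken from the codes mentioned in this thread.
# A real application would derive these sets from CLDR data.
KNOWN_REGIONS = {"FR", "US", "CA", "BS", "UK", "XK", "QO", "QU", "ZZ"}
PRIVATE_USE = {"QO", "QU", "ZZ", "XK"}  # private-use in BCP 47
DEPRECATED = {"UK"}                     # treated as deprecated by CLDR
PRIVATE_USE_EXCEPTIONS = {"XK"}         # included despite being private-use


def is_valid_flag_region(code: str, known=KNOWN_REGIONS) -> bool:
    """Apply the Annex B wording: known, not deprecated, not private-use."""
    code = code.upper()
    if code not in known:
        return False
    if code in DEPRECATED:
        return False
    if code in PRIVATE_USE and code not in PRIVATE_USE_EXCEPTIONS:
        return False
    return True
```

The hard-coded PRIVATE_USE_EXCEPTIONS set is the part in question above: unless CLDR data itself distinguishes XK from QO, QU, and ZZ, an application has to carry that exception on its own.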
>> Is there any significance to the "subtype" hierarchy as far as flag
>> tags are concerned, or are "[flag]FRJ" and "[flag]FR75" equally
> No, there isn't. But see also E.5 in
Right, clearly flags don't exist for many of the subdivisions. But I'm
not sure this is the same question as whether the three-level hierarchy
is relevant. In my example, Île-de-France and Paris both have flags,
and they aren't the same. (Wikipedia says the Île-de-France flag is
"non-official and unused," but they do have a page for it, and in any
case there are probably better examples.)
> The only purpose for the 4-character subdivision codes is stability.
> So let's suppose that Colorado decides to join Canada (thereby
> deprecating CO in ISO 3166-2), and British Columbia decides to join
> the US (getting the code CO in ISO 3166-2). In that case, CLDR would
> keep the old code CO (but deprecated) and create a new 4-letter code
> for BC, such as XXCO. This is just for illustration, of course, I've
> heard no rumors about either political shift...
Thanks for the 'XXCO' example; this is different from tending toward
'COXX' and was what I was looking for.
The exact scenario would not apply, of course, due to the agreement to
keep subdivision codes unique across the US/Canada border. I'd suppose
this would be preserved, and 3166-2 would assign US-BC to "British
Columbia as US state," and there would be no coding conflict to resolve.
But again, additional examples could easily be dreamed up: replace BC
with the Central Abaco region of the Bahamas (currently BS-CO), which
isn't that far away.
>> (private-use flag tags)
> We'll have to address that. My view is that they should not be valid:
> if someone wants a PU flag, of any source, they have over 130,000
> Unicode PU characters to play with.
I concur, and this is consistent with Annex B.
Doug Ewell | http://ewellic.org | Thornton, CO