Representing Additional Types of Flags

Doug Ewell doug at ewellic.org
Thu Jul 2 12:33:30 CDT 2015


Mark Davis <mark at macchiato dot com> wrote:

>> Is there any precedent for CLDR to define the validity of Unicode
>> character sequences?
>
> We already have, in tr51, the unicode_region_codes being used for
> validity testing of flags:
> http://unicode.org/reports/tr51/#Encoding
> http://unicode.org/reports/tr51/#Flags

the second of which (Annex B) says:

"The valid region sequences are specified by Unicode region subtags as
defined in [CLDR], excluding those that are designated private-use or
deprecated in [CLDR]."

In that case, the wording in TUS needs to be corrected, because TUS 7.0
§22.10 says:

"The regional indicator symbols in the range U+1F1E6..U+1F1FF can be
used in pairs to represent an ISO 3166 region code."

It doesn't say anything about valid pairs being defined by CLDR instead
of ISO. I wonder how many users actually know this.
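
For readers unfamiliar with the mechanism, the pairing that TUS sentence describes is a plain offset from the letters A..Z. A minimal Python sketch follows; the function names are illustrative and do not come from TUS or tr51, and no validity check is performed, which is exactly the gap under discussion here.

```python
RIS_BASE = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def region_to_ris(code: str) -> str:
    """Map a two-letter code such as 'FR' to a pair of regional indicator symbols."""
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two uppercase ASCII letters")
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in code)


def ris_to_region(pair: str) -> str:
    """Map a pair of regional indicator symbols back to its two-letter code."""
    if len(pair) != 2 or not all(RIS_BASE <= ord(c) <= RIS_BASE + 25 for c in pair):
        raise ValueError("expected two regional indicator symbols")
    return "".join(chr(ord(c) - RIS_BASE + ord("A")) for c in pair)
```

Any two-letter combination converts, including ones that are neither ISO 3166 codes nor CLDR region subtags, so validity has to be decided by a separate lookup.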

> Those are typically the same as the ISO codes, but do add XK
> http://unicode.org/reports/tr35/#unicode_region_subtag

So QO, QU, and ZZ would be excluded, since those are private-use in BCP
47 and hence also in CLDR. But XK is included, even though it is also
private-use. Is this correct? Can an application tell that XK is in and
the others are out, just by looking at CLDR data?

Also, I assume all of the same include/exclude rules apply both to RIS
combinations and to PRI #299-style flag tags. Please let me know if
that's not true.

> CLDR treats UK as deprecated.
> [...]
> But you're right; we need to be able to distinguish this case (and
> ones like it.) I filed
> http://unicode.org/cldr/trac/ticket/8736

OK, so UK is not valid in RIS combinations or flag tags either. Glad to
see that clarified.
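
The include/exclude rules discussed so far can be sketched as a single test. This is only an illustration under stated assumptions: the sets below are hand-written placeholders covering the codes mentioned in this thread, not data extracted from CLDR, and the explicit exception set for XK is a guess at how an application might have to express "XK is in, the others are out." It also assumes the same rules apply to RIS pairs and to flag tags.

```python
# Placeholder data for illustration only; NOT taken from CLDR files.
KNOWN_REGIONS = {"FR", "US", "CA", "BS", "XK", "QO", "QU", "ZZ", "UK"}
PRIVATE_USE = {"QO", "QU", "ZZ", "XK"}   # all four are private-use in BCP 47
DEPRECATED = {"UK"}                      # per the quoted message
PRIVATE_USE_EXCEPTIONS = {"XK"}          # hypothetical: private-use but still valid


def is_valid_flag_region(code: str) -> bool:
    """Return True if the code would be accepted for a flag under the rules above."""
    if code not in KNOWN_REGIONS:
        return False
    if code in DEPRECATED:
        return False
    if code in PRIVATE_USE and code not in PRIVATE_USE_EXCEPTIONS:
        return False
    return True


for code in ("FR", "XK", "QO", "ZZ", "UK"):
    print(code, is_valid_flag_region(code))
```

Whether something like PRIVATE_USE_EXCEPTIONS can actually be derived from CLDR data alone is the open question raised above.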

>> Is there any significance to the "subtype" hierarchy as far as flag
>> tags are concerned, or are "[flag]FRJ" and "[flag]FR75" equally
>> valid?
>
> No, there isn't. But see also E.5 in
> http://www.unicode.org/review/pri299/pri299-additional-flags-background.html

Right, clearly flags don't exist for many of the subdivisions. But I'm
not sure this is the same question as whether the three-level hierarchy
is relevant. In my example, Île-de-France and Paris both have flags,
and they aren't the same. (Wikipedia says the Île-de-France flag is
"non-official and unused," but they do have a page for it, and in any
case there are probably better examples.)

> The only purpose for the 4-character subdivision codes is stability.
> So let's suppose that Colorado decides to join Canada (thereby
> deprecating CO in ISO 3166-2), and British Columbia decides to join
> the US (getting the code CO in ISO 3166-2). In that case, CLDR would
> keep the old code CO (but deprecated) and create a new 4-letter code
> for BC, such as XXCO. This is just for illustration, of course, I've
> heard no rumors about either political shift...

Thanks for the 'XXCO' example; this is different from tending toward
'COXX' and was what I was looking for.

The exact scenario would not apply, of course, due to the agreement to
keep subdivision codes unique across the US/Canada border. I'd suppose
this would be preserved, and 3166-2 would assign US-BC to "British
Columbia as US state," and there would be no coding conflict to resolve.
But again, additional examples could easily be dreamed up: replace BC
with the Central Abaco region of the Bahamas (currently BS-CO), which
isn't that far away.

>> (private-use flag tags)
>
> We'll have to address that. My view is that they should not be valid:
> if someone wants a PU flag, of any source, they have over 130,000
> Unicode PU characters to play with.

I concur, and this is consistent with Annex B.

Thanks,

--
Doug Ewell | http://ewellic.org | Thornton, CO



