Unicode locale ID vs. POSIX variant

Mark Davis ☕️ mark at macchiato.com
Tue Sep 30 23:50:17 CDT 2014


en_US_POSIX is the "old style", which would be represented as
en-US-u-va-posix in BCP47.
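
A minimal sketch of that mapping, for illustration only: it is not CLDR or ICU code, and the function name legacy_to_bcp47 is hypothetical. It assumes an old-style ID without '@' keywords, and it deliberately rejects an ID that carries both the POSIX variant and an extension, since that case is left open in this thread.

```python
def legacy_to_bcp47(locale_id: str) -> str:
    """Rewrite the old-style POSIX variant as the -u-va-posix extension.

    Hypothetical helper: handles only the POSIX variant, nothing else.
    """
    if "@" in locale_id:
        raise ValueError("old-style '@' keywords are out of scope for this sketch")

    # Both '_' and '-' are accepted as separators; normalize to '-'.
    subtags = locale_id.replace("_", "-").split("-")

    # Everything from the first singleton (one-character subtag) onward
    # belongs to an extension and is left untouched.
    end = next(
        (i for i, s in enumerate(subtags) if i > 0 and len(s) == 1),
        len(subtags),
    )
    head, tail = subtags[:end], subtags[end:]

    # Drop the POSIX variant from the language/script/region/variant part.
    kept = [head[0]] + [s for s in head[1:] if s.upper() != "POSIX"]
    if len(kept) == len(head):
        return "-".join(subtags)  # no POSIX variant present

    if tail:
        # e.g. en-US-POSIX-u-va-posix: not settled here, so refuse.
        raise ValueError("POSIX variant combined with an extension")

    return "-".join(kept + ["u", "va", "posix"])


print(legacy_to_bcp47("en_US_POSIX"))
```

IDs that have no POSIX variant, or that already use the extension form, pass through with only the separator normalized.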

We still use pre-BCP47 IDs in the file and identifier tree for main
(although we've switched over for keyboards/). So the term POSIX does occur in
http://www.unicode.org/repos/cldr/tags/release-1-7/common/supplemental/supplementalMetadata.xml,
in <variable id="$variant" type="choice">. These variables, however, are not
meant to indicate BCP47 compliance; they are used for verifying the
'old style' IDs that appear in the tree. They are also limited: for example,
<variable id="$language" type="choice"> doesn't include all the valid
language subtags.

I agree that this is not nearly as clean as we'd like. You should probably
file a ticket that mentions the text that led you astray, or that points to
a place in the text where we could clarify this.


Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*

On Tue, Sep 30, 2014 at 10:01 PM, Markus Scherer <markus.icu at gmail.com>
wrote:

> Please help me understand the POSIX locale variant.
>
> I assume that en_US_POSIX is still valid in the old syntax. For example,
> see common/collation/en_US_POSIX.xml
> <http://unicode.org/cldr/trac/browser/trunk/common/collation/en_US_POSIX.xml>
>
> I assume that en-US-POSIX is a valid Unicode Language Identifier
> <http://www.unicode.org/reports/tr35/tr35.html#Unicode_language_identifier> (new
> syntax) because CLDR supplemental metadata includes POSIX as a valid
> variant.
>
> CLDR also defines -u-va-posix.
>
> It looks like in non-CLDR BCP 47, en-US-POSIX is not valid because POSIX
> is not a registered language subtag.
>
> Legacy Variants
> <http://www.unicode.org/reports/tr35/tr35.html#Legacy_Variants> says to
> convert the old-syntax variant POSIX to -u-va-posix.
>
> Should a Unicode Language Identifier use variant POSIX but when converting
> to non-CLDR BCP 47 convert that variant to -u-va-posix? Or should one
> always convert from old syntax to new -u-va-posix (just in case the
> recipient only understands BCP 47)?
>
> When and where else should one convert between the POSIX variant and the
> -u-va-posix extension?
>
> Part of the problem is that old and new syntax are indistinguishable when
> there is no '@' and no singleton subtag, and - and _ are both accepted as
> separators, as usual.
>
> Is en-US-POSIX-u-va-posix valid?
> Is it the same as en-US-u-va-posix?
>
> *References:*
>
> supplementalMetadata.xml
> <http://unicode.org/cldr/trac/browser/trunk/common/supplemental/supplementalMetadata.xml> includes
> "POSIX" in <variable id="$variant" type="choice">
>
> I see http://www.unicode.org/reports/tr35/tr35.html#Key_Type_Definitions
>
> Locale variant
>
> *bcp47/variant.xml*
> "va" | Common variant type | "posix" | POSIX style locale variant
> and http://unicode.org/repos/cldr/trunk/common/bcp47/variant.xml
> <key name="va" description="Common locale variant type key">
> <type name="posix" description="POSIX style locale variant"/>
>
>
> https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
> does not mention POSIX.
>
> markus
>