CLDR/ICU proposal: collation rules for import only

Markus Scherer markus.icu at gmail.com
Fri Apr 18 16:41:28 CDT 2014


Dear CLDR & ICU teams & users,

Summary: I propose that we distinguish for-import-only rules from
create-a-sort-order rules via a naming convention rather than flags in the
data.

Details:

In collation rules, we can "import" the rules of another tailoring. For
example, common/collation/bs.xml
<http://unicode.org/cldr/trac/browser/trunk/common/collation/bs.xml> has
<import source="hr"/>.

We want to extend this by writing partial rules that are not intended as
their own sort orders but only for import into other rules. See
http://cldr.unicode.org/development/development-process/design-proposals/collation-additions#TOC-Collation-Import
and http://unicode.org/cldr/trac/ticket/3949

The idea was to use <settings private="true"> in CLDR, and I see that that
attribute exists in common/dtd/ldml.dtd
<http://unicode.org/cldr/trac/browser/trunk/common/dtd/ldml.dtd>, but it is
marked as deprecated, and it is not documented in the LDML collation spec.
In ICU we would turn it into something like NoBinary{""}
(http://bugs.icu-project.org/trac/ticket/8082).

However, we also want to suppress such for-import-only rules from the lists
of "available" keyword values and collators
(http://bugs.icu-project.org/trac/ticket/8983). If we did this via a data
flag, then we would have to load the data before we could find out that we
want to exclude it from the list.
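To make the cost concrete, here is a toy sketch of the flag-based approach.
The Tailoring class, its private field (standing in for
<settings private="true"> or NoBinary{""}), and the store are all
hypothetical, not CLDR or ICU APIs. The point is only that every tailoring
has to be loaded just to read its flag before it can be left out of the
"available" list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tailoring:
    rules: str
    private: bool = False  # hypothetical for-import-only flag in the data


class FlagBasedStore:
    """Toy store that counts how many tailorings it had to load."""

    def __init__(self, data: dict[str, Tailoring]):
        self._data = data
        self.loads = 0

    def load(self, coll_type: str) -> Tailoring:
        self.loads += 1  # stands in for reading a resource bundle
        return self._data[coll_type]

    def available_types(self) -> list[str]:
        # Every type must be loaded just to read its flag.
        return [t for t in self._data if not self.load(t).private]


if __name__ == "__main__":
    store = FlagBasedStore({
        "standard": Tailoring("<standard rules>"),
        "unihan": Tailoring("<unihan rules>"),
        "kana": Tailoring("<kana rules>", private=True),
    })
    print(store.available_types())  # ['standard', 'unihan']
    print(store.loads)              # 3
```

Listing three types costs three loads, including one for data that is then
thrown away.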

In addition, collation types are normally added to the
common/bcp47/collation.xml file. This is undesirable for what are really
internal identifiers. We don't want to advertise them as available, *we
don't want to collect display names for them*, and we don't want to have to
keep them stable.

I have a simpler proposal:

- I propose that we use a naming convention to distinguish for-import-only
rules.
- I propose that the first character of the collation type be digit '0' if
and only if the rules are only to be used for import, not for establishing
complete sort orders nor creating collators.
- We would not need an XML attribute, nor an ICU resource bundle entry, nor
would we add such types into bcp47/collation.xml.
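A minimal sketch of the proposed convention, assuming only that the list of
collation type names is known up front. The function names are
hypothetical. Because the decision depends on the name alone, filtering the
"available" list needs no tailoring data at all.

```python
def is_import_only(coll_type: str) -> bool:
    """True iff the type name starts with '0' (for-import-only rules)."""
    return coll_type.startswith("0")


def available_types(type_names: list[str]) -> list[str]:
    """Filter by name alone; no tailoring data needs to be loaded."""
    return [t for t in type_names if not is_import_only(t)]


if __name__ == "__main__":
    names = ["standard", "unihan", "0kana"]
    print(available_types(names))  # ['standard', 'unihan']
```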

For example, we might create a type="0kana" tailoring that would be
imported into the Japanese standard and unihan tailorings; and we might
create a type="0pinyin" tailoring that would be imported into the Chinese
pinyin and unihan tailorings.
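The Japanese example can be modeled roughly as follows. This is a toy data
layout, not the CLDR XML or ICU resource format: the TAILORINGS table, the
placeholder rule strings, and the choice to put imported rules before a
tailoring's own rules are all assumptions for illustration. It shows that
"0kana" can still be resolved through import while never becoming a
collator in its own right.

```python
# Hypothetical tailorings for one locale; rule strings are placeholders.
TAILORINGS = {
    "0kana":    {"imports": [], "rules": "<kana rules>"},
    "standard": {"imports": ["0kana"], "rules": "<ja standard rules>"},
    "unihan":   {"imports": ["0kana"], "rules": "<ja unihan rules>"},
}


def resolve(coll_type: str, seen: tuple = ()) -> str:
    """Expand imports recursively, then append the type's own rules."""
    if coll_type in seen:
        raise ValueError(f"import cycle at {coll_type!r}")
    entry = TAILORINGS[coll_type]
    parts = [resolve(i, seen + (coll_type,)) for i in entry["imports"]]
    parts.append(entry["rules"])
    return " ".join(parts)


def buildable_types() -> list[str]:
    """Only types not starting with '0' become sort orders / collators."""
    return [t for t in TAILORINGS if not t.startswith("0")]


if __name__ == "__main__":
    print(buildable_types())   # ['standard', 'unihan']
    print(resolve("unihan"))   # <kana rules> <ja unihan rules>
```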

Please let me know if you disagree.

Sincerely,
markus
-- 
Google Internationalization Engineering


More information about the CLDR-Users mailing list