CLDR/ICU proposal: collation rules for import only

John Emmons emmo at us.ibm.com
Mon Apr 21 10:51:34 CDT 2014


I would prefer that we have an attribute for it, so that it is crystal
clear to everyone exactly what is going on.  I really don't like the idea
of "0" + ruleset naming convention.

We have a similar situation in the RBNF rules.  There we use:

<ruleset type="and-feminine" access="private">

I would think that the most logical thing would be to extend the use of the
access attribute, such that we have:

<rules access="private">
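
As a rough sketch of how a tool might honor such an attribute (assuming the access attribute were extended to <rules>, which is only the suggestion above and not existing LDML; the sample fragment and the public_types name are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: access="private" on <rules> is only the suggestion
# made in this message, not an attribute that LDML defines today.
SAMPLE = """
<collations>
  <collation type="standard"><rules/></collation>
  <collation type="kana"><rules access="private"/></collation>
</collations>
"""

def public_types(xml_text):
    """Return collation types whose <rules> are not marked access="private"."""
    root = ET.fromstring(xml_text)
    result = []
    for collation in root.iter("collation"):
        rules = collation.find("rules")
        # Skip tailorings flagged as private (import-only).
        if rules is not None and rules.get("access") == "private":
            continue
        result.append(collation.get("type"))
    return result

print(public_types(SAMPLE))  # ['standard']
```

Note that the data has to be parsed before the flag can be seen.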


Regards,

John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo at us.ibm.com




From:	Markus Scherer <markus.icu at gmail.com>
To:	"cldr-users at unicode.org" <cldr-users at unicode.org>, icu-design
            <icu-design at lists.sourceforge.net>,
Date:	04/18/2014 04:44 PM
Subject:	CLDR/ICU proposal: collation rules for import only
Sent by:	"CLDR-Users" <cldr-users-bounces at unicode.org>



Dear CLDR & ICU teams & users,

Summary: I propose that we distinguish for-import-only rules from
create-a-sort-order rules via a naming convention rather than flags in the
data.

Details:

In collation rules, we can "import" the rules of another tailoring. For
example, common/collation/bs.xml has <import source="hr"/>.

We want to extend this by writing partial rules that are not intended as
their own sort orders but only for import into other rules. See
http://cldr.unicode.org/development/development-process/design-proposals/collation-additions#TOC-Collation-Import
 and http://unicode.org/cldr/trac/ticket/3949

The idea was to use <settings private="true"> in CLDR. That attribute exists
in common/dtd/ldml.dtd, but it is marked as deprecated and it is not
documented in the LDML collation spec. In ICU we would turn it into
something like NoBinary{""} (
http://bugs.icu-project.org/trac/ticket/8082).

However, we also want to suppress such for-import-only rules from the lists
of "available" keyword values and collators (
http://bugs.icu-project.org/trac/ticket/8983). If we did this via a data
flag, then we would have to load the data before we can find out that we
want to exclude it from the list.

In addition, collation types are normally added to the
common/bcp47/collation.xml file. This is undesirable for what are really
internal identifiers. We don't want to advertise them as available, we
don't want to collect display names for them, and we don't want to have to
keep them stable.

I have a simpler proposal:

- I propose that we use a naming convention to distinguish for-import-only
rules.
- I propose that the first character of the collation type be digit '0' if
and only if the rules are only to be used for import, not for establishing
complete sort orders nor creating collators.
- We would not need an XML attribute, nor an ICU resource bundle entry, nor
would we add such types into bcp47/collation.xml.

For example, we might create a type="0kana" tailoring that would be
imported into the Japanese standard and unihan tailorings; and we might
create a type="0pinyin" tailoring that would be imported into the Chinese
pinyin and unihan tailorings.
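
A minimal sketch of what the convention would buy us when listing available types: the check needs only the type name, so no collation data has to be loaded. The function names and the sample list are hypothetical, not ICU or CLDR API.

```python
def is_import_only(collation_type):
    """True if the type name follows the proposed leading-'0' convention."""
    return collation_type.startswith("0")

def available_types(all_types):
    """Drop import-only types from a list of collation type names."""
    return [t for t in all_types if not is_import_only(t)]

# Sample names only; "0kana" and "0pinyin" are the examples from this proposal.
types = ["standard", "0kana", "unihan", "0pinyin", "pinyin"]
print(available_types(types))  # ['standard', 'unihan', 'pinyin']
```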

Please let me know if you disagree.

Sincerely,
markus
--
Google Internationalization Engineering
_______________________________________________
CLDR-Users mailing list
CLDR-Users at unicode.org
http://unicode.org/mailman/listinfo/cldr-users