From cldr-users at unicode.org  Fri Sep 20 00:48:28 2019
From: cldr-users at unicode.org (Cameron Dutro via CLDR-Users)
Date: Thu, 19 Sep 2019 22:48:28 -0700
Subject: Katakana-Latin Transformation
Message-ID: <CAECedD-awieWPOOhPGbAmGqbKHQCH2aupMRpJ_-x95b-t6d4Cg@mail.gmail.com>

Hey CLDR users,

I'm currently trying to update TwitterCLDR
<https://github.com/twitter/twitter-cldr-rb> to CLDR v35 (from v29, yikes),
and running into an issue with the Katakana-Latin BGN transliteration
rules. It looks like the current rules don't take combining marks into
account. I'm specifically seeing issues with Dakuten
<https://en.wikipedia.org/wiki/Dakuten_and_handakuten>, special combining
Hiragana and Katakana sound marks. The transliteration rules specify
applying an NFD normalization to the input text first, meaning all the
Dakuten are converted into individual codepoints. However, they are not
considered by the subsequent set of transliteration rules and remain in the
output text, combining with the Roman characters in unexpected ways.
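To make the decomposition step concrete, here is a minimal sketch in Python using only the standard unicodedata module. It is not part of TwitterCLDR or ICU. The helper names decompose and stray_marks are hypothetical, and the string "ka\u3099" is an assumed illustration of what a rule set that maps the base kana but ignores the mark might leave behind, not actual transliterator output.

```python
import unicodedata
from typing import List


def decompose(text: str) -> List[str]:
    """Return the code points of text after NFD normalization, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


def stray_marks(text: str) -> List[str]:
    """List characters in text that have a nonzero canonical combining class."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.combining(ch) != 0]


# KATAKANA LETTER GA splits into KA plus the combining voiced sound mark (dakuten).
print(decompose("ガ"))
# KATAKANA LETTER PA splits into HA plus the combining semi-voiced sound mark (handakuten).
print(decompose("パ"))

# Assumed illustration only: a result in which the base kana was mapped to "ka"
# but the combining dakuten was passed through untouched.
print(stray_marks("ka\u3099"))
```

A check like stray_marks can be run over transliterated output to flag any combining marks that survived the rules.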

Looking through the CLDR XML source, I came across this line
<https://github.com/unicode-org/cldr/blob/release-35/common/transforms/Katakana-Latin-BGN.xml#L31>
that specifies a special $wordBoundary variable and a corresponding comment
directing the reader to the old CLDR issue tracker. I was able to
cross-reference this issue
<https://unicode-org.atlassian.net/browse/CLDR-4238> in the new JIRA system
that I think is the successor to the old Trac issue. It describes exactly
the problem I'm dealing with.

Interestingly, the $wordBoundary variable in the transliteration rules XML
file is not actually used anywhere in the transliteration rules. This
suggests that at one point it may have been intended to handle Dakuten in
some way.

It appears ICU 64.2 emits the correct transliteration results while 57.1
does not, without any alterations to the transliteration rules in the CLDR.

This leaves me with the following questions:

   1. Does ICU use the $wordBoundary variable even though it doesn't appear
   in any rules? If so, how?
   2. It appears somehow ICU was fixed without a concurrent update to the
   CLDR rules. What changes were made to ICU?

Thanks in advance,

-Cameron

From cldr-users at unicode.org  Fri Sep 27 18:28:52 2019
From: cldr-users at unicode.org (Peter Edberg via CLDR-Users)
Date: Fri, 27 Sep 2019 16:28:52 -0700
Subject: CLDR v36 beta available for testing
Message-ID: <6321D4EA-53F4-4111-BF70-8BEEE56E1E89@unicode.org>

Dear CLDR users,

The beta version of Unicode CLDR 36 <http://cldr.unicode.org/index/downloads/cldr-36> is available for testing. The final release is expected on October 4.

Unicode CLDR 36 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems <http://cldr.unicode.org/index#TOC-Who-uses-CLDR-> for software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 36 included a full Survey Tool data collection phase <https://www.unicode.org/cldr/charts/36/supplemental/locale_coverage.html>, adding approximately 32 thousand new translated fields, with significant increases to Basic-level coverage for several languages including az (Azerbaijani, Latin script), qu (Quechua), so (Somali), and tg (Tajik, Cyrillic script). Seed data was added for several new languages including cic (Chickasaw), mus (Muscogee), and osa (Osage, Osage script).

Enhancements in v36 include:
   - New Emoji 13 draft candidates' names and search keywords are included in this release to support smooth adoption of the upcoming Emoji release (scheduled for release in 2020Q1 as part of Unicode 13).
   - New measurement units and patterns: dot-per-centimeter, dot-per-inch, em, megapixel, pixel, pixel-per-centimeter, pixel-per-inch; decade; therm-us; bar, pascal; and a pattern for combining units in a multiplicative relationship, such as "newton-meter".
   - Locale IDs:
      - Extended Language Matching to have fallbacks for many encompassed languages.
      - Added more languageAliases from the BCP47 language subtag registry, for deprecated languages.

There are some infrastructure changes to be aware of, including:
   - The cldr repository has moved from subversion to git, and queries using Trac no longer work. See CLDR Change Requests <http://cldr.unicode.org/index/bug-reports> for new information.
   - The data in the cldr repository now preserves votes for inherited data, indicated with "↑↑↑". In order to generate CLDR in the previous form without "↑↑↑" and with proper minimization, a new tool GenerateProductionData is available.
     Note: Release data that has been processed with GenerateProductionData is available in a parallel cldr-staging repository, with the same release tags.

Best regards,
- Peter Edberg for CLDR
