Use of Unicode 6.3 bidi format chars in CLDR number formats?

Peter Edberg pedberg at apple.com
Thu Apr 28 17:44:01 CDT 2016


Dear CLDR users,

One of the longstanding challenges in CLDR has been designing number formats (especially currency formats) and short date formats for bidi-language locales (e.g. ar, fa, he) so that the formatted text is displayed correctly in various contexts (where there may be no surrounding text, or initial text with strong right-to-left or strong left-to-right characters). With currency formats, the currency symbols themselves may involve characters that are neutral, or strong right-to-left or strong left-to-right.

This is exactly one of the types of problems that was intended to be addressed by the addition of new bidi direction format characters in Unicode 6.3 (Sept. 2013), such as U+2067 RIGHT‑TO‑LEFT ISOLATE (RLI) and U+2069 POP DIRECTIONAL ISOLATE (PDI). See UAX #9 Unicode Bidirectional Algorithm <http://www.unicode.org/reports/tr9/>.

For CLDR 30, we are considering whether to start using some of these characters in some number formats; typically those formats would begin with an RLI and end with a PDI. Two important considerations are:
1. Will the systems on which CLDR 30 data is used implement support for these bidi direction format characters?
2. Do the systems that will be used for CLDR 30 Survey Tool data collection implement support for those characters (e.g. for generating correct examples)?

For cases in which the answers to the above questions are “no”, we can address some of the issues as follows:
• For #1, in the tools that generate JSON data and ICU-format data, options can be added to replace any RLI…PDI pair that wraps a number format with an initial RLM (U+200F RIGHT-TO-LEFT MARK) instead. The format will then have the same display layout when used in isolation, though it may not have the same layout when used in the middle of other text.
• For #2, the Survey Tool example generators can also replace RLI…PDI with an initial RLM, and ensure that the resulting format is displayed by itself in a text cell, in order to produce the same format display layout that will be produced by the RLI…PDI on systems that support them.
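
As a rough illustration of the replacement described in the two points above, here is a minimal Python sketch. The function names (wraps_whole_pattern, rli_pdi_to_rlm) are hypothetical and are not part of any existing CLDR or ICU tool; the sketch also assumes that each pattern is handled as a single string (no separate handling of positive and negative subpatterns).

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
# LRI, RLI, FSI: any of these opens an isolate that a later PDI closes.
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}


def wraps_whole_pattern(pattern: str) -> bool:
    """Return True if the leading RLI and trailing PDI form one matched pair."""
    if len(pattern) < 2 or pattern[0] != RLI or pattern[-1] != PDI:
        return False
    depth = 0
    for ch in pattern[1:-1]:
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth == 0:
                # This PDI closes the leading RLI, so the wrapper ends early.
                return False
            depth -= 1
    return depth == 0


def rli_pdi_to_rlm(pattern: str) -> str:
    """Hypothetical fallback: swap a wrapping RLI…PDI pair for an initial RLM."""
    if wraps_whole_pattern(pattern):
        return RLM + pattern[1:-1]
    return pattern  # Not wrapped as a whole: leave the pattern untouched.


wrapped = RLI + "¤ #,##0.00" + PDI
fallback = rli_pdi_to_rlm(wrapped)
```

Patterns that are not wrapped as a whole are returned unchanged, so a step like this could be applied to all number formats without first checking which ones use the new characters.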

One remaining concern is the extent to which copy-paste of formats generated by CLDR will correctly include the RLI…PDI characters.

We would appreciate any input from CLDR users on this, thanks!

Peter Edberg, for the CLDR project
