Use of Unicode 6.3 bidi format chars in CLDR number formats?
Mark Davis ☕️
mark at macchiato.com
Fri Apr 29 02:24:51 CDT 2016
The number and currency formats can be used in a variety of contexts and
adjacent to a variety of text. The bidi isolate characters were designed
*precisely* to address this kind of need, without forcing people to jump
The *only* question we have is whether the major platforms/systems that use
CLDR are all up to speed in terms of supporting the "new" (2013) characters
in their BIDI algorithms:
U+2066 LEFT-TO-RIGHT ISOLATE
U+2067 RIGHT-TO-LEFT ISOLATE
U+2068 FIRST STRONG ISOLATE
U+2069 POP DIRECTIONAL ISOLATE
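
As a rough illustration of how a formatter might apply these characters, here is a minimal Python sketch that wraps an already-formatted number in an isolate pair. The helper name `isolate` and its `direction` parameter are hypothetical, made up for this example; they are not CLDR or ICU API.

```python
# Code points from the list above.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction=None):
    """Wrap text in a bidi isolate so surrounding text cannot reorder it.

    direction is "ltr", "rtl", or None. None uses FSI, which takes the
    direction from the first strong character inside the isolate.
    (Hypothetical helper for illustration only.)
    """
    openers = {None: FSI, "ltr": LRI, "rtl": RLI}
    return openers[direction] + text + PDI


# A formatted value that is about to be placed next to arbitrary text.
wrapped = isolate("-1,234.50", direction="ltr")
```

The isolate keeps the number's internal ordering independent of the adjacent text, which is the situation described above; on a platform whose bidi algorithm predates Unicode 6.3 the controls would have no such effect.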
Of course, anyone who is using the number formats in a richer format (like
HTML) is free to remap characters to markup when processing. That's their
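
One way such a remapping to markup could look, sketched in Python under the assumption that the isolates map to the HTML `<bdi>` element with a `dir` attribute (`ltr`, `rtl`, `auto`) and that PDI maps to the closing tag. The function name `isolates_to_html` is hypothetical, and a real pipeline would have to decide how to treat unmatched controls; here stray PDIs are dropped and unclosed isolates are closed at the end of the string.

```python
import html

# Assumed mapping from isolate initiators to HTML markup.
_OPENERS = {
    "\u2066": '<bdi dir="ltr">',   # LRI
    "\u2067": '<bdi dir="rtl">',   # RLI
    "\u2068": '<bdi dir="auto">',  # FSI
}
_PDI = "\u2069"


def isolates_to_html(text):
    """Replace bidi isolate controls with <bdi> markup (illustrative only)."""
    out = []
    depth = 0  # number of currently open <bdi> elements
    for ch in text:
        if ch in _OPENERS:
            out.append(_OPENERS[ch])
            depth += 1
        elif ch == _PDI:
            if depth > 0:
                out.append("</bdi>")
                depth -= 1
            # A PDI with no matching initiator is dropped.
        else:
            out.append(html.escape(ch))
    # Close any isolates left open at the end of the string.
    out.extend("</bdi>" for _ in range(depth))
    return "".join(out)
```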
On Fri, Apr 29, 2016 at 6:59 AM, Asmus Freytag (c) <asmusf at ix.netcom.com> wrote:
> On 4/28/2016 9:37 PM, Steven Loomis wrote:
>> Given the correct choice of internal format for the database,
>> The internal format is a Unicode String, specifically, UTF-8.
> That covers a lot of ground.
>> Given that CLDR data should be specifying the desired appearance
>> But CLDR is text, specifically, XML, and not glyphs…
> Sorry, I meant that CLDR should be specified in a way that the user
> expected "visual ordering" can be determined, not "appearance" as in
> Just to sidestep a potential misunderstanding: I'm not suggesting that the
> format be in visual order. Just that there are some assumptions made about
> the context in which the Unicode string (when bidi processed) will result
> in the correct visual appearance.
> For example, if you assume that a string as stored displays correctly when
> it is part of a RTL paragraph, then you should be able to compute what you
> need to do to get the correct visual order when the text is part of an LTR
> paragraph, part of an isolated embedding, etc.
> I haven't looked into the actualities, but I know that while you can
> convert uniquely between some formats in a given direction, there are some
> conversions (or directions) that are not unique. So the challenge would be
> for the database to find some format that allows conversions to all the
> bidi contexts (and capabilities) that are typically encountered.
> Storing things in visual order is a bad idea, because in the general case,
> conversion to logical order is not unique.
> But, instead of picking some "random" logical order (based on an
> assumption of what "might" be most needed) my suggestion is to carefully
> pick a "universal" format for the string, one that allows mechanical
> conversion to all the actual formats that people need, based on what
> environment they want to embed their strings into, and what sorts of
> embedding / isolation controls are actually supported.
>> On 4/28/16 7:30 PM, "CLDR-Users on behalf of Asmus Freytag (c)" <
>> cldr-users-bounces at unicode.org on behalf of asmusf at ix.netcom.com> wrote:
>> On 4/28/2016 3:44 PM, Peter Edberg wrote:
>>>> Dear CLDR users,
>>> I think this is where a "one size fits all" solution isn't the answer.
>>> Ideally, I'll be able to use CLDR (and formatting tools depending on it)
>>> to format date/time/number strings for a variety of consumers.
>>> Plain text (pre 6.3), plain text with isolates support, and plain text
>>> for embedding into markup (where I'll supply external markup to isolate
>>> and otherwise prep the field).
>>> Given that CLDR data should be specifying the desired appearance (not
>>> the bidi controls necessary to get to that) it should be possible to
>>> provide mechanical conversion between these formats, rather than having
>>> to make a single choice for the database.
>>> Not only will "pre 6.3" support be an issue for a long time to come, I
>>> am confidently predicting that the need for multiple bidi flavors will
>>> continue beyond the adoption of the isolates. Whether a string is part
>>> of an (arbitrary) plain text stream or a separate data field (with its
>>> scope determined by markup and with its own bidi styling) will continue
>>> to call for somewhat different data.
>>> Given the correct choice of internal format for the database, it should
>>> be possible to provide all of these flavors mechanically, thus avoiding
>>> the full cost of duplication, while freeing users from having to make
>>> those format translations themselves.
>>> CLDR-Users mailing list
>>> CLDR-Users at unicode.org