Use of Unicode 6.3 bidi format chars in CLDR number formats?

Mark Davis ☕️ mark at macchiato.com
Fri Apr 29 02:24:51 CDT 2016


The number and currency formats can be used in a variety of contexts and
adjacent to a variety of text. The bidi isolate characters were designed
*precisely* to address this kind of need, without forcing people to jump
through hoops.

The *only* question we have is whether the major platforms/systems that use
CLDR all support the "new" (2013, Unicode 6.3) characters in their bidi
algorithm implementations:

  U+2066 LEFT-TO-RIGHT ISOLATE
  U+2067 RIGHT-TO-LEFT ISOLATE
  U+2068 FIRST STRONG ISOLATE
  U+2069 POP DIRECTIONAL ISOLATE
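As an illustration only (this is not a CLDR or ICU API), here is a minimal Python sketch of bracketing a formatted number with an isolate pair, plus a quick check of whether the runtime's Unicode Character Database assigns the Bidi_Class values introduced in 6.3. The helper names `isolate` and `has_isolate_bidi_classes` are hypothetical. Note that the check only shows the character data is current; it does not show that a given renderer implements the isolate rules of the bidi algorithm.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: str = "auto") -> str:
    """Hypothetical helper: wrap text in an isolate pair.

    direction is "ltr" (LRI), "rtl" (RLI), or "auto" (FSI, which takes
    its direction from the first strong character inside the isolate).
    """
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return opener + text + PDI


def has_isolate_bidi_classes() -> bool:
    """True if this runtime's Unicode data knows the 6.3 isolate classes."""
    expected = {LRI: "LRI", RLI: "RLI", FSI: "FSI", PDI: "PDI"}
    return all(unicodedata.bidirectional(ch) == cls
               for ch, cls in expected.items())
```

For example, `isolate("-1,234.56", "ltr")` yields the number bracketed by U+2066 and U+2069, so neighboring RTL text cannot reorder its sign or separators.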

Of course, anyone who is using the number formats in a richer format (like
HTML) is free to remap characters to markup when processing. That's their
choice.
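A minimal sketch of such a remapping, assuming HTML as the target and the `<bdi>` element as the isolating markup (a `<bdi>` without a `dir` attribute behaves like FSI, with its direction determined automatically). The function name `isolates_to_html` is hypothetical. Following UAX #9 loosely, an unmatched PDI pops nothing and is dropped here, and isolates still open at the end are closed.

```python
import html

# Assumed mapping: each isolate initiator opens a <bdi> element; PDI closes it.
OPEN_TAGS = {
    "\u2066": '<bdi dir="ltr">',  # LRI
    "\u2067": '<bdi dir="rtl">',  # RLI
    "\u2068": "<bdi>",            # FSI (bdi defaults to automatic direction)
}
PDI = "\u2069"


def isolates_to_html(text: str) -> str:
    """Hypothetical converter: replace isolate controls with <bdi> markup.

    All other characters are HTML-escaped.
    """
    out = []
    depth = 0
    for ch in text:
        if ch in OPEN_TAGS:
            out.append(OPEN_TAGS[ch])
            depth += 1
        elif ch == PDI:
            if depth > 0:  # an unmatched PDI is simply dropped
                out.append("</bdi>")
                depth -= 1
        else:
            out.append(html.escape(ch))
    out.append("</bdi>" * depth)  # close any isolates left open
    return "".join(out)
```

For instance, `isolates_to_html("a\u2066b\u2069c")` produces `a<bdi dir="ltr">b</bdi>c`.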

Mark

On Fri, Apr 29, 2016 at 6:59 AM, Asmus Freytag (c) <asmusf at ix.netcom.com>
wrote:

> On 4/28/2016 9:37 PM, Steven Loomis wrote:
>
>> Asmus:
>>
>> Given the correct choice of internal format for the database,
>>>
>>
>> The internal format is a Unicode String, specifically, UTF-8.
>>
> That covers a lot of ground.
>
>>
>> Given that CLDR data should be specifying the desired appearance
>>>
>> But CLDR is text, specifically, XML, and not glyphs…
>>
>
> Sorry, I meant that CLDR should be specified in a way that lets the
> user-expected "visual ordering" be determined, not "appearance" as in
> "glyphs".
>
> Just to sidestep a potential misunderstanding: I'm not suggesting that the
> format be in visual order. Just that there are some assumptions made about
> the context in which the Unicode string (when bidi processed) will result
> in the correct visual appearance.
>
> For example, if you assume that a string as stored displays correctly when
> it is part of a RTL paragraph, then you should be able to compute what you
> need to do to get the correct visual order when the text is part of an LTR
> paragraph, part of an isolated embedding, etc.
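One mechanical building block for this kind of computation is the first-strong-character test of UAX #9 (rule P2), which is also how FSI picks its direction. A minimal Python sketch using only `unicodedata`; the name `first_strong_direction` is hypothetical. As P2 requires, it skips characters between an isolate initiator and its matching PDI, and it returns None when there is no strong character, in which case P3 falls back to a default (or higher-level) paragraph direction.

```python
import unicodedata
from typing import Optional

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strong character (UAX #9 P2),
    skipping text inside isolates; return None if there is none."""
    depth = 0
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ISOLATE_INITIATORS:
            depth += 1
        elif cls == "PDI":
            if depth > 0:  # only a matching PDI ends an isolate
                depth -= 1
        elif depth == 0:
            if cls == "L":
                return "ltr"
            if cls in ("R", "AL"):
                return "rtl"
    return None
```

Under the assumption above, if a stored string authored for an RTL paragraph reports "ltr" (or None), then FSI or `dir=auto` would not reproduce its intended order, and an explicit RLI...PDI wrapper would be needed instead.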
>
> I haven't looked into the actualities, but I know that while you can
> convert uniquely between some formats in a given direction, there are some
> conversions (or directions) that are not unique. So the challenge would be
> for the database to find some format that allows conversions to all the
> bidi contexts (and capabilities) that are typically encountered.
>
> Storing things in visual order is a bad idea, because in the general case,
> conversion to logical order is not unique.
>
> But, instead of picking some "random" logical order (based on an
> assumption of what "might" be most needed) my suggestion is to carefully
> pick a "universal" format for the string, one that allows mechanical
> conversion to all the actual formats that people need, based on what
> environment they want to embed their strings into, and what sorts of
> embedding / isolation controls are actually supported.
>
> A./
>
>
>
>> Steven
>>
>> On 4/28/16 7:30 PM, "CLDR-Users on behalf of Asmus Freytag (c)" <
>> cldr-users-bounces at unicode.org on behalf of asmusf at ix.netcom.com>
>> wrote:
>>
>> On 4/28/2016 3:44 PM, Peter Edberg wrote:
>>>
>>>> Dear CLDR users,
>>>>
>>> Peter,
>>>
>>> I think this is where a "one size fits all" solution isn't the answer.
>>>
>>> Ideally, I'll be able to use CLDR (and formatting tools depending on it)
>>> to format date/time/number strings for a variety of consumers.
>>>
>>> Plain text (pre-6.3), plain text with isolate support, and plain text
>>> for embedding into markup (where I'll supply external markup to isolate
>>> and otherwise prep the field).
>>>
>>> Given that CLDR data should be specifying the desired appearance (not
>>> the bidi controls necessary to get to that) it should be possible to
>>> provide mechanical conversion between these formats, rather than having
>>> to make a single choice for the database.
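A minimal sketch of such a mechanical conversion, under a loudly hypothetical assumption about the stored form: logical-order text with no bidi controls, plus the paragraph direction ("ltr" or "rtl") it was authored for. The `Flavor` enum and `render` function are illustrative names, not part of CLDR. The pre-6.3 flavor substitutes directional embeddings (LRE/RLE ... PDF), which set direction but, unlike isolates, do not shield the field from neighboring text.

```python
from enum import Enum

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # pre-6.3 embedding controls
LRI, RLI, PDI = "\u2066", "\u2067", "\u2069"  # 6.3 isolate controls


class Flavor(Enum):
    PLAIN_PRE_6_3 = "plain text (pre-6.3)"
    PLAIN_ISOLATES = "plain text with isolate support"
    PLAIN_FOR_MARKUP = "plain text for embedding into markup"


def render(text: str, base_dir: str, flavor: Flavor) -> str:
    """Hypothetical converter from one stored form to a consumer flavor."""
    if base_dir not in ("ltr", "rtl"):
        raise ValueError("base_dir must be 'ltr' or 'rtl'")
    rtl = base_dir == "rtl"
    if flavor is Flavor.PLAIN_ISOLATES:
        return (RLI if rtl else LRI) + text + PDI
    if flavor is Flavor.PLAIN_PRE_6_3:
        # Approximation only: embeddings do not isolate the field.
        return (RLE if rtl else LRE) + text + PDF
    # PLAIN_FOR_MARKUP: no controls; the consumer supplies isolating
    # markup (e.g. <bdi dir="...">) using base_dir.
    return text
```

The data is stored once and each consumer requests its flavor. A real design would also have to cover strings whose intended order cannot be expressed by a single outer wrapper, which this sketch does not attempt.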
>>>
>>> Not only will "pre-6.3" support be an issue for a long time to come, but
>>> I confidently predict that the need for multiple bidi flavors will
>>> continue beyond the adoption of the isolates. Whether a string is part
>>> of an (arbitrary) plain text stream or a separate data field (with its
>>> scope determined by markup and with its own bidi styling) will continue
>>> to call for somewhat different data.
>>>
>>> Given the correct choice of internal format for the database, it should
>>> be possible to provide all of these flavors mechanically, thus avoiding
>>> the full cost of duplication, while freeing users from having to make
>>> those format translations themselves.
>>>
>>> A./
>>> _______________________________________________
>>> CLDR-Users mailing list
>>> CLDR-Users at unicode.org
>>> http://unicode.org/mailman/listinfo/cldr-users
>>>
>>
>