Use of Unicode 6.3 bidi format chars in CLDR number formats?

Roozbeh Pournader roozbeh at unicode.org
Tue May 3 15:38:19 CDT 2016


The major barrier seems to be Java's Character#getDirectionality(). Apps on
Android and other platforms tend to use Java APIs for processing strings,
and it seems that Java still doesn't support the bidi types associated with
these characters, so code that relies on it will misclassify them. There are
probably other issues with java.text.Bidi too, but I haven't checked.
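A quick way to see the gap described above is to ask the running JVM how it classifies U+2066..U+2069. This is a minimal sketch that assumes JDK 9 or later, where java.lang.Character provides the DIRECTIONALITY_LEFT_TO_RIGHT_ISOLATE, DIRECTIONALITY_RIGHT_TO_LEFT_ISOLATE, DIRECTIONALITY_FIRST_STRONG_ISOLATE and DIRECTIONALITY_POP_DIRECTIONAL_ISOLATE constants. The class name IsolateSupport is hypothetical, and the check only probes Character data, not java.text.Bidi or any Android-specific library.

```java
// Hypothetical helper: reports whether this JRE's Character data assigns
// the Unicode 6.3 isolate bidi classes to U+2066..U+2069.
// Assumes JDK 9+: the DIRECTIONALITY_*_ISOLATE constants do not exist earlier.
final class IsolateSupport {
    private IsolateSupport() {}

    /** True if LRI, RLI, FSI and PDI all report their isolate bidi classes. */
    static boolean supportsIsolates() {
        return Character.getDirectionality(0x2066) == Character.DIRECTIONALITY_LEFT_TO_RIGHT_ISOLATE
            && Character.getDirectionality(0x2067) == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ISOLATE
            && Character.getDirectionality(0x2068) == Character.DIRECTIONALITY_FIRST_STRONG_ISOLATE
            && Character.getDirectionality(0x2069) == Character.DIRECTIONALITY_POP_DIRECTIONAL_ISOLATE;
    }
}
```

On a Java 8 runtime, which implements Unicode 6.2, this class would not even compile, which is one concrete face of the problem: the platform has no names for these bidi classes at all.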

On Fri, Apr 29, 2016 at 12:24 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:

> The number and currency formats can be used in a variety of contexts and
> adjacent to a variety of text. The bidi isolate characters were designed
> *precisely* to address this kind of need, without forcing people to jump
> through hoops.
>
> The *only* question we have is whether the major platforms/systems that
> use CLDR are all up to speed in terms of supporting the "new" (2013)
> characters in their BIDI algorithms:
>
>   U+2066 LEFT-TO-RIGHT ISOLATE
>   U+2067 RIGHT-TO-LEFT ISOLATE
>   U+2068 FIRST STRONG ISOLATE
>   U+2069 POP DIRECTIONAL ISOLATE
>
> Of course, anyone who is using the number formats in a richer format (like
> HTML) is free to remap characters to markup when processing. That's their
> choice.
>
> Mark
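A minimal sketch of the two options Mark mentions, assuming the number has already been formatted: wrap it in FSI ... PDI for plain text, or, for HTML output, remap the isolate controls onto <bdi> elements (a <bdi> without a dir attribute defaults to dir=auto, which roughly corresponds to FSI). The class and method names are hypothetical, and toHtml assumes the isolates in its input are balanced.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helpers: isolate a formatted number in plain text, or
// remap the isolate controls to HTML <bdi> markup for richer output.
final class NumberIsolation {
    static final char LRI = '\u2066', RLI = '\u2067', FSI = '\u2068', PDI = '\u2069';

    private NumberIsolation() {}

    /** Wraps already-formatted text in FIRST STRONG ISOLATE ... POP DIRECTIONAL ISOLATE. */
    static String isolate(String formatted) {
        return FSI + formatted + PDI;
    }

    /** Formats with the JDK's NumberFormat for the locale, then isolates the result. */
    static String formatIsolated(double value, Locale locale) {
        return isolate(NumberFormat.getInstance(locale).format(value));
    }

    /** Escapes HTML-special characters and maps isolate controls to <bdi> elements. */
    static String toHtml(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case LRI: out.append("<bdi dir=\"ltr\">"); break;
                case RLI: out.append("<bdi dir=\"rtl\">"); break;
                case FSI: out.append("<bdi>"); break;   // dir defaults to auto on <bdi>
                case PDI: out.append("</bdi>"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, formatIsolated(1234.5, Locale.US) yields "1,234.5" between FSI and PDI, and passing that string to toHtml yields "<bdi>1,234.5</bdi>".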
>
> On Fri, Apr 29, 2016 at 6:59 AM, Asmus Freytag (c) <asmusf at ix.netcom.com>
> wrote:
>
>> On 4/28/2016 9:37 PM, Steven Loomis wrote:
>>
>>> Asmus:
>>>
>>> Given the correct choice of internal format for the database,
>>>>
>>>
>>> The internal format is a Unicode String, specifically, UTF-8.
>>>
>> That covers a lot of ground.
>>
>>>
>>> Given that CLDR data should be specifying the desired appearance
>>>>
>>> But CLDR is text, specifically, XML, and not glyphs…
>>>
>>
>> Sorry, I meant that CLDR should be specified in a way that lets the
>> user-expected "visual ordering" be determined, not "appearance" in the
>> sense of "glyphs".
>>
>> Just to sidestep a potential misunderstanding: I'm not suggesting that
>> the format be in visual order, only that some assumptions are made about
>> the context in which the Unicode string (once bidi-processed) will
>> result in the correct visual appearance.
>>
>> For example, if you assume that a string as stored displays correctly when
>> it is part of an RTL paragraph, then you should be able to compute what you
>> need to do to get the correct visual order when the text is part of an LTR
>> paragraph, part of an isolated embedding, etc.
>>
>> I haven't looked into the details, but I know that while you can
>> convert uniquely between some formats in a given direction, there are some
>> conversions (or directions) that are not unique. So the challenge would be
>> for the database to find some format that allows conversions to all the
>> bidi contexts (and capabilities) that are typically encountered.
>>
>> Storing things in visual order is a bad idea, because in the general
>> case, conversion to logical order is not unique.
>>
>> But, instead of picking some "random" logical order (based on an
>> assumption of what "might" be most needed) my suggestion is to carefully
>> pick a "universal" format for the string, one that allows mechanical
>> conversion to all the actual formats that people need, based on what
>> environment they want to embed their strings into, and what sorts of
>> embedding / isolation controls are actually supported.
>>
>> A./
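Asmus's example can be made concrete with a hypothetical stored form: the text plus the paragraph direction it was authored to display correctly in. Assuming a consumer with Unicode 6.3 isolate support, the conversion is then mechanical: if the target is a paragraph of the authored direction, nothing needs to be added; otherwise the text is wrapped in an isolate of the authored direction (LRI or RLI ... PDI), so it is laid out as its own run of that direction without interacting with neighbouring text. The names StoredBidiText, Dir and Target are illustrative only, not CLDR or ICU API.

```java
// Hypothetical "universal" stored form: text plus the paragraph direction
// it was authored to display correctly in.
final class StoredBidiText {
    enum Dir { LTR, RTL }

    /** Where the string will be placed (illustrative names). */
    enum Target { OWN_PARAGRAPH_LTR, OWN_PARAGRAPH_RTL, INLINE_RUNNING_TEXT }

    final String text;
    final Dir authoredFor;

    StoredBidiText(String text, Dir authoredFor) {
        this.text = text;
        this.authoredFor = authoredFor;
    }

    /** Returns plain text (assuming isolate support) that keeps the authored visual order. */
    String forTarget(Target target) {
        boolean sameParagraphDir =
            (target == Target.OWN_PARAGRAPH_LTR && authoredFor == Dir.LTR)
            || (target == Target.OWN_PARAGRAPH_RTL && authoredFor == Dir.RTL);
        if (sameParagraphDir) {
            return text; // already displays as authored; no controls needed
        }
        // Otherwise isolate it with the authored direction.
        char open = (authoredFor == Dir.RTL) ? '\u2067' : '\u2066'; // RLI or LRI
        return open + text + '\u2069';                              // closed by PDI
    }
}
```

Targets without isolate support are deliberately left out here; as noted above, some of those conversions are not unique.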
>>
>>
>>
>>> Steven
>>>
>>> On 4/28/16 7:30 PM, "CLDR-Users on behalf of Asmus Freytag (c)" <
>>> cldr-users-bounces at unicode.org on behalf of asmusf at ix.netcom.com>
>>> wrote:
>>>
>>> On 4/28/2016 3:44 PM, Peter Edberg wrote:
>>>>
>>>>> Dear CLDR users,
>>>>>
>>>> Peter,
>>>>
>>>> I think this is where a "one size fits all" solution isn't the answer.
>>>>
>>>> Ideally, I'll be able to use CLDR (and formatting tools depending on it)
>>>> to format date/time/number strings for a variety of consumers.
>>>>
>>>> These include plain text (pre-6.3), plain text with isolate support, and
>>>> plain text for embedding into markup (where I'll supply external markup
>>>> to isolate and otherwise prep the field).
>>>>
>>>> Given that CLDR data should be specifying the desired appearance (not
>>>> the bidi controls necessary to get to that) it should be possible to
>>>> provide mechanical conversion between these formats, rather than having
>>>> to make a single choice for the database.
>>>>
>>>> Not only will "pre-6.3" support be an issue for a long time to come, I
>>>> am confidently predicting that the need for multiple bidi flavors will
>>>> continue beyond the adoption of the isolates. Whether a string is part
>>>> of an (arbitrary) plain text stream or a separate data field (with its
>>>> scope determined by markup and with its own bidi styling) will continue
>>>> to call for somewhat different data.
>>>>
>>>> Given the correct choice of internal format for the database, it should
>>>> be possible to provide all of these flavors mechanically, thus avoiding
>>>> the full cost of duplication, while freeing users from having to make
>>>> those format translations themselves.
>>>>
>>>> A./
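One way the three flavors listed in this message could be generated from a single stored value, sketched under these assumptions: the stored field has first-strong semantics and contains no bidi controls of its own; the pre-6.3 flavor uses LRE/RLE ... PDF embeddings, with the direction resolved up front by a simple first-strong scan using Character.getDirectionality (no strong character means LTR, as with FSI); and the markup flavor carries no controls because the caller supplies the isolating markup. The class and enum names are hypothetical, and the embedding mapping is just one plausible choice. Embeddings do not shield the surrounding text the way isolates do, which is one reason the flavors will keep producing somewhat different data.

```java
// Hypothetical sketch: render one stored field in three bidi flavors.
// Assumes the field has first-strong semantics and no bidi controls of its own.
final class BidiFlavors {
    enum Flavor { PLAIN_PRE_6_3, PLAIN_WITH_ISOLATES, FOR_MARKUP }

    private BidiFlavors() {}

    /** First-strong scan: true if the first L, R or AL character is R or AL. */
    static boolean firstStrongIsRtl(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            byte d = Character.getDirectionality(cp);
            if (d == Character.DIRECTIONALITY_LEFT_TO_RIGHT) return false;
            if (d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) return true;
            i += Character.charCount(cp);
        }
        return false; // no strong character: treat as LTR, as FSI does
    }

    static String render(String field, Flavor flavor) {
        switch (flavor) {
            case PLAIN_WITH_ISOLATES:
                return "\u2068" + field + "\u2069"; // FSI ... PDI
            case PLAIN_PRE_6_3: {
                // No first-strong embedding exists, so resolve the direction here.
                String open = firstStrongIsRtl(field) ? "\u202B" : "\u202A"; // RLE or LRE
                return open + field + "\u202C"; // ... PDF
            }
            case FOR_MARKUP:
            default:
                return field; // caller supplies external markup, e.g. a dir attribute
        }
    }
}
```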
>>>> _______________________________________________
>>>> CLDR-Users mailing list
>>>> CLDR-Users at unicode.org
>>>> http://unicode.org/mailman/listinfo/cldr-users