Use of Unicode 6.3 bidi format chars in CLDR number formats?

Asmus Freytag (c) asmusf at ix.netcom.com
Thu Apr 28 23:59:33 CDT 2016


On 4/28/2016 9:37 PM, Steven Loomis wrote:
> Asmus:
>
>> Given the correct choice of internal format for the database,
>
> The internal format is a Unicode String, specifically, UTF-8.
That covers a lot of ground.
>
>> Given that CLDR data should be specifying the desired appearance
> But CLDR is text, specifically, XML, and not glyphs…

Sorry, I meant that CLDR should be specified in a way that lets the 
"visual ordering" the user expects be determined, not "appearance" as in 
"glyphs".

Just to sidestep a potential misunderstanding: I'm not suggesting that 
the format be in visual order. I'm only saying that the data rests on some 
assumptions about the context in which the Unicode string (once 
bidi-processed) will result in the correct visual appearance.

For example, if you assume that a string as stored displays correctly when 
it is part of an RTL paragraph, then you should be able to compute what 
you need to do to get the correct visual order when the text is part of 
an LTR paragraph, part of an isolated embedding, etc.
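A minimal Python sketch of that computation, under stated assumptions: the stored string is in logical order, it is assumed to display correctly in a paragraph of one known base direction, and the consumer supports the Unicode 6.3 directional isolates (LRI/RLI/PDI). The helper name `for_paragraph` is hypothetical, not a CLDR or ICU API. For its content, an isolate with base direction d behaves much like a paragraph with direction d, so the same function can be applied with `paragraph_dir` set to the direction of the enclosing isolate.

```python
# Explicit directional isolates, added in Unicode 6.3.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def for_paragraph(stored: str, assumed_dir: str, paragraph_dir: str) -> str:
    """Hypothetical helper: adapt a string assumed to display correctly in a
    paragraph of base direction `assumed_dir` ("ltr" or "rtl") for use in a
    paragraph (or isolate) of base direction `paragraph_dir`."""
    for d in (assumed_dir, paragraph_dir):
        if d not in ("ltr", "rtl"):
            raise ValueError(f"unknown direction: {d!r}")
    if assumed_dir == paragraph_dir:
        # Same base direction the data was written for: use it unchanged.
        return stored
    # Different base direction: isolate the string so its content is laid out
    # with the assumed base direction, while the isolate as a whole behaves
    # like a single neutral character in the surrounding paragraph.
    opener = RLI if assumed_dir == "rtl" else LRI
    return opener + stored + PDI
```

Note that for a pre-6.3 consumer the isolate pair would have to be replaced by something else (for example an embedding pair), which is exactly the kind of capability-dependent conversion discussed here.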

I haven't looked into the actualities, but I know that while you can 
convert uniquely between some formats in a given direction, there are 
some conversions (or directions) that are not unique. So the challenge 
would be for the database to find some format that allows conversions to 
all the bidi contexts (and capabilities) that are typically encountered.

Storing things in visual order is a bad idea, because in the general 
case, conversion to logical order is not unique.

But, instead of picking some "random" logical order (based on an 
assumption of what "might" be most needed) my suggestion is to carefully 
pick a "universal" format for the string, one that allows mechanical 
conversion to all the actual formats that people need, based on what 
environment they want to embed their strings into, and what sorts of 
embedding / isolation controls are actually supported.
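One possible shape for such a "universal" format, sketched in Python under loudly stated assumptions: the internal form is control-free logical-order text plus the base direction it is meant to be laid out in (the `BidiField` type and the flavor names "embeddings", "isolates" and "markup" are hypothetical, standing for the pre-6.3 plain text, isolate-capable plain text and markup consumers mentioned in this thread). `from_wrapped` only strips a single outer embedding or isolate pair using a simplified balance check; bidi marks or controls inside the string are left untouched, so this is not a general normalizer.

```python
import html
from dataclasses import dataclass

# Explicit bidi formatting characters.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # embeddings (pre-6.3)
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"  # isolates (6.3)

# Outer wrapper pairs this sketch knows how to strip: opener -> (closer, dir).
_WRAPPERS = {LRE: (PDF, "ltr"), RLE: (PDF, "rtl"),
             LRI: (PDI, "ltr"), RLI: (PDI, "rtl")}


@dataclass(frozen=True)
class BidiField:
    """Hypothetical internal form: control-free text plus base direction."""
    text: str
    direction: str  # "ltr" or "rtl"


def _balanced(inner: str, closer: str) -> bool:
    """Simplified check that the outer opener really matches the last char:
    openers and closers of the same family inside `inner` must balance."""
    openers = {LRE, RLE} if closer == PDF else {LRI, RLI, FSI}
    depth = 0
    for ch in inner:
        if ch in openers:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def from_wrapped(s: str, default_dir: str = "ltr") -> BidiField:
    """Strip one matched outer embedding/isolate pair and record its direction."""
    if len(s) >= 2 and s[0] in _WRAPPERS:
        closer, direction = _WRAPPERS[s[0]]
        if s[-1] == closer and _balanced(s[1:-1], closer):
            return BidiField(s[1:-1], direction)
    return BidiField(s, default_dir)


def render(field: BidiField, flavor: str) -> str:
    """Mechanically produce one consumer-specific flavor from the internal form."""
    rtl = field.direction == "rtl"
    if flavor == "embeddings":  # plain text for pre-6.3 consumers
        return (RLE if rtl else LRE) + field.text + PDF
    if flavor == "isolates":  # plain text for Unicode 6.3+ consumers
        return (RLI if rtl else LRI) + field.text + PDI
    if flavor == "markup":  # HTML: the <bdi> element does the isolating
        return f'<bdi dir="{field.direction}">{html.escape(field.text)}</bdi>'
    raise ValueError(f"unknown flavor: {flavor!r}")
```

Usage: `render(from_wrapped(s), "embeddings")` converts an isolate-wrapped string into its pre-6.3 equivalent. The design choice is that direction lives in the data rather than in control characters, so each flavor is derived instead of stored.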

A./

>
> Steven
>
> On 4/28/16 7:30 PM, "CLDR-Users on behalf of Asmus Freytag (c)" <cldr-users-bounces at unicode.org on behalf of asmusf at ix.netcom.com> wrote:
>
>> On 4/28/2016 3:44 PM, Peter Edberg wrote:
>>> Dear CLDR users,
>> Peter,
>>
>> I think this is where a "one size fits all" solution isn't the answer.
>>
>> Ideally, I'll be able to use CLDR (and formatting tools depending on it)
>> to format date/time/number strings for a variety of consumers.
>>
>> For example: plain text (pre-6.3), plain text with isolate support, and
>> plain text for embedding into markup (where I'll supply external markup
>> to isolate and otherwise prep the field).
>>
>> Given that CLDR data should be specifying the desired appearance (not
>> the bidi controls necessary to get to that) it should be possible to
>> provide mechanical conversion between these formats, rather than having
>> to make a single choice for the database.
>>
>> Not only will "pre 6.3" support be an issue for a long time to come, I
>> am confidently predicting that the need for multiple bidi flavors will
>> continue beyond the adoption of the isolates. Whether a string is part
>> of an (arbitrary) plain text stream or a separate data field (with its
>> scope determined by markup and with its own bidi styling) will continue
>> to call for somewhat different data.
>>
>> Given the correct choice of internal format for the database, it should
>> be possible to provide all of these flavors mechanically, thus avoiding
>> the full cost of duplication, while freeing users from having to make
>> those format translations themselves.
>>
>> A./
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org
>> http://unicode.org/mailman/listinfo/cldr-users
>
