Use of Unicode 6.3 bidi format chars in CLDR number formats?
Asmus Freytag (c)
asmusf at ix.netcom.com
Thu Apr 28 23:59:33 CDT 2016
On 4/28/2016 9:37 PM, Steven Loomis wrote:
> Asmus:
>
>> Given the correct choice of internal format for the database,
>
> The internal format is a Unicode String, specifically, UTF-8.
That covers a lot of ground.
>
>> Given that CLDR data should be specifying the desired appearance
> But CLDR is text, specifically, XML, and not glyphs…
Sorry, I meant that CLDR should be specified in a way that the
user-expected "visual ordering" can be determined, not "appearance" as in
"glyphs".
Just to sidestep a potential misunderstanding: I'm not suggesting that
the format be in visual order. Just that there are some assumptions made
about the context in which the Unicode string (when bidi processed) will
result in the correct visual appearance.
For example, if you assume that a string as stored displays correctly when
it is part of a RTL paragraph, then you should be able to compute what
you need to do to get the correct visual order when the text is part of
an LTR paragraph, part of an isolated embedding, etc.
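A minimal sketch of that kind of computation, in Python. The function names are hypothetical, the first-strong test is a simplification of the paragraph-level rules in the bidi algorithm, and wrapping in an embedding is only one possible way to carry the stored assumption into a different paragraph direction:

```python
import unicodedata

# Directional embedding controls (available before Unicode 6.3).
LRE, RLE, PDF = "\u202A", "\u202B", "\u202C"

def first_strong_direction(text, default="ltr"):
    """Return 'ltr' or 'rtl' based on the first strong character.

    Simplified: ignores any isolates or embeddings already in the text.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found

def for_paragraph(stored, stored_dir, paragraph_dir):
    """Hypothetical helper: adapt a stored string to the paragraph it joins.

    stored_dir is the direction the stored string was assumed to display in.
    """
    if stored_dir == paragraph_dir:
        return stored  # the stored assumption already matches the context
    opener = RLE if stored_dir == "rtl" else LRE
    return opener + stored + PDF
```

Here `first_strong_direction` could be used to guess `paragraph_dir` from the surrounding text when the caller does not know it.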
I haven't looked into the details, but I know that while you can
convert uniquely between some formats in a given direction, there are
some conversions (or directions) that are not unique. So the challenge
would be for the database to find some format that allows conversions to
all the bidi contexts (and capabilities) that are typically encountered.
Storing things in visual order is a bad idea, because in the general
case, conversion to logical order is not unique.
But, instead of picking some "random" logical order (based on an
assumption of what "might" be most needed) my suggestion is to carefully
pick a "universal" format for the string, one that allows mechanical
conversion to all the actual formats that people need, based on what
environment they want to embed their strings into, and what sorts of
embedding / isolation controls are actually supported.
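To illustrate the idea only, here is a sketch in Python of such a mechanical conversion. It assumes, purely for illustration, that the "universal" form is a control-free string plus its intended direction; the flavor names and the `render` function are hypothetical and not part of CLDR or of any library, and a real format would likely also need to handle marks inside the string:

```python
# Unicode bidi format characters: embeddings predate 6.3, isolates came with 6.3.
LRE, RLE, PDF = "\u202A", "\u202B", "\u202C"
LRI, RLI, PDI = "\u2066", "\u2067", "\u2069"

def render(stored, direction, flavor):
    """Convert a stored string plus its intended direction into one flavor.

    direction: 'ltr' or 'rtl'
    flavor:    'embedding' (plain text, pre 6.3),
               'isolate'   (plain text with isolates support),
               'bare'      (caller supplies external markup)
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    if flavor == "embedding":
        opener = RLE if direction == "rtl" else LRE
        return opener + stored + PDF
    if flavor == "isolate":
        opener = RLI if direction == "rtl" else LRI
        return opener + stored + PDI
    if flavor == "bare":
        # No controls: the surrounding markup is expected to isolate the field.
        return stored
    raise ValueError("unknown flavor: " + flavor)
```

The point of the sketch is that all three outputs derive from one stored record, so the data itself is not duplicated per flavor.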
A./
>
> Steven
>
> On 4/28/16 7:30 PM, "CLDR-Users on behalf of Asmus Freytag (c)" <cldr-users-bounces at unicode.org on behalf of asmusf at ix.netcom.com> wrote:
>
>> On 4/28/2016 3:44 PM, Peter Edberg wrote:
>>> Dear CLDR users,
>> Peter,
>>
>> I think this is where a "one size fits all" solution isn't the answer.
>>
>> Ideally, I'll be able to use CLDR (and formatting tools depending on it)
>> to format date/time/number strings for a variety of consumers.
>>
>> That means plain text (pre 6.3), plain text with isolates support, and
>> plain text for embedding into markup (where I'll supply external markup
>> to isolate and otherwise prep the field).
>>
>> Given that CLDR data should be specifying the desired appearance (not
>> the bidi controls necessary to get to that) it should be possible to
>> provide mechanical conversion between these formats, rather than having
>> to make a single choice for the database.
>>
>> Not only will "pre 6.3" support be an issue for a long time to come, I
>> am confidently predicting that the need for multiple bidi flavors will
>> continue beyond the adoption of the isolates. Whether a string is part
>> of an (arbitrary) plain text stream or a separate data field (with its
>> scope determined by markup and with its own bidi styling) will continue
>> to call for somewhat different data.
>>
>> Given the correct choice of internal format for the database, it should
>> be possible to provide all of these flavors mechanically, thus avoiding
>> the full cost of duplication, while freeing users from having to make
>> those format translations themselves.
>>
>> A./
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org
>> http://unicode.org/mailman/listinfo/cldr-users