Use of Unicode 6.3 bidi format chars in CLDR number formats?

Philippe Verdy verdy_p at wanadoo.fr
Fri Apr 29 03:01:33 CDT 2016


Also, apparently Mozilla browsers still have issues with those characters
(as well as with the CSS "isolate", still not supported except in recent
versions, and then only with the "-moz-" prefix).
On Android and Chrome, the "-webkit-" prefix is no longer necessary in recent
versions for CSS; however, I don't think many versions yet support the
isolate characters in their Bidi processing.

For now, most software still recognizes only embeddings, overrides, and
single-character marks (LRM/RLM), and browsers still map "bdi" elements only
to embeddings, not to isolates.

Directly inserting RLI/LRI/FSI or PDI will just produce ignored characters,
without even a minimal remapping to the embedding style.
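
To illustrate what such a minimal remapping could look like, here is a rough
Python sketch. It is only an assumption of how a consumer might degrade the
isolates for pre-6.3 software, not something CLDR or any library ships: the
function names are hypothetical, and the FSI handling is a simplified
first-strong scan. Embeddings are not equivalent to isolates (they still
interact with the surrounding text), so this is a fallback at best.

```python
import unicodedata

LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def first_strong_is_rtl(text, start):
    """Return True if the first strong character from `start` is R or AL.

    Scans up to the PDI closing the isolate opened just before `start`,
    skipping the contents of nested isolates (a simplification of how
    FSI picks its direction).
    """
    depth = 0
    for ch in text[start:]:
        if ch in (LRI, RLI, FSI):
            depth += 1
        elif ch == PDI:
            if depth == 0:
                break
            depth -= 1
        elif depth == 0:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                return False
            if bidi_class in ("R", "AL"):
                return True
    return False  # no strong character found: fall back to LTR


def isolates_to_embeddings(text):
    """Replace LRI/RLI/FSI ... PDI with LRE/RLE ... PDF (hypothetical helper)."""
    out = []
    open_count = 0
    for i, ch in enumerate(text):
        if ch == LRI:
            out.append(LRE)
            open_count += 1
        elif ch == RLI:
            out.append(RLE)
            open_count += 1
        elif ch == FSI:
            out.append(RLE if first_strong_is_rtl(text, i + 1) else LRE)
            open_count += 1
        elif ch == PDI:
            if open_count > 0:  # an unmatched PDI is simply dropped
                out.append(PDF)
                open_count -= 1
        else:
            out.append(ch)
    return "".join(out)
```

A formatter could run its output through isolates_to_embeddings() only when
the target is known not to support the 6.3 isolates, and leave the string
untouched otherwise.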

2016-04-29 9:56 GMT+02:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> Yes, but this is unnecessarily complex to edit in surveys, even if the XML
> or JSON exports insert these characters themselves, and even if
> libraries using the data can detect those characters and replace them with
> markup or style. Detection works when the characters are properly paired:
> that is not possible for RLM and LRM, and is overly complex for LRO/PDF and
> RLO/PDF, but it is possible for LRI/PDI, RLI/PDI and FSI/PDI (the last is
> probably the best mapping in HTML for the "bdi" element without
> dir="ltr/rtl").
> Do you expect that the survey will allow entering those controls easily?
> Can't there be helpers ?
>
> 2016-04-29 9:24 GMT+02:00 Mark Davis ☕️ <mark at macchiato.com>:
>
>> The number and currency formats can be used in a variety of contexts and
>> adjacent to a variety of text. The bidi isolate characters were designed
>> *precisely* to address this kind of need, without forcing people to jump
>> through hoops.
>>
>> The *only* question we have is whether the major platforms/systems that
>> use CLDR are all up to speed in terms of supporting the "new" (2013)
>> characters in their BIDI algorithms:
>>
>>   U+2066 LEFT-TO-RIGHT ISOLATE
>>   U+2067 RIGHT-TO-LEFT ISOLATE
>>   U+2068 FIRST STRONG ISOLATE
>>   U+2069 POP DIRECTIONAL ISOLATE
>>
>> Of course, anyone who is using the number formats in a richer format
>> (like HTML) is free to remap characters to markup when processing. That's
>> their choice.
>>
>> Mark
>>
>> On Fri, Apr 29, 2016 at 6:59 AM, Asmus Freytag (c) <asmusf at ix.netcom.com>
>> wrote:
>>
>>> On 4/28/2016 9:37 PM, Steven Loomis wrote:
>>>
>>>> Asmus:
>>>>
>>>> Given the correct choice of internal format for the database,
>>>>>
>>>>
>>>> The internal format is a Unicode String, specifically, UTF-8.
>>>>
>>> That covers a lot of ground.
>>>
>>>>
>>>> Given that CLDR data should be specifying the desired appearance
>>>>>
>>>> But CLDR is text, specifically, XML, and not glyphs…
>>>>
>>>
>>> Sorry, I meant that CLDR should be specified in a way that the
>>> user-expected "visual ordering" can be determined, not "appearance" as in
>>> "glyphs".
>>>
>>> Just to sidestep a potential misunderstanding: I'm not suggesting that
>>> the format be in visual order. Just that there are some assumptions made
>>> about the context in which the Unicode string (when bidi processed) will
>>> result in the correct visual appearance.
>>>
>>> For example, if you assume that a string as stored displays correctly when
>>> it is part of a RTL paragraph, then you should be able to compute what you
>>> need to do to get the correct visual order when the text is part of an LTR
>>> paragraph, part of an isolated embedding, etc.
>>>
>>> I haven't looked into the actualities, but I know that while you can
>>> convert uniquely between some formats in a given direction, there are some
>>> conversions (or directions) that are not unique. So the challenge would be
>>> for the database to find some format that allows conversions to all the
>>> bidi contexts (and capabilities) that are typically encountered.
>>>
>>> Storing things in visual order is a bad idea, because in the general
>>> case, conversion to logical order is not unique.
>>>
>>> But, instead of picking some "random" logical order (based on an
>>> assumption of what "might" be most needed) my suggestion is to carefully
>>> pick a "universal" format for the string, one that allows mechanical
>>> conversion to all the actual formats that people need, based on what
>>> environment they want to embed their strings into, and what sorts of
>>> embedding / isolation controls are actually supported.
>>>
>>> A./
>>>
>>>
>>>
>>>> Steven
>>>>
>>>> El 4/28/16 7:30 PM, "CLDR-Users en nombre de Asmus Freytag (c)" <
>>>> cldr-users-bounces at unicode.org en nombre de asmusf at ix.netcom.com>
>>>> escribió:
>>>>
>>>> On 4/28/2016 3:44 PM, Peter Edberg wrote:
>>>>>
>>>>>> Dear CLDR users,
>>>>>>
>>>>> Peter,
>>>>>
>>>>> I think this is where a "one size fits all" solution isn't the answer.
>>>>>
>>>>> Ideally, I'll be able to use CLDR (and formatting tools depending on it)
>>>>> to format date/time/number strings for a variety of consumers.
>>>>>
>>>>> These include plain text (pre-6.3), plain text with isolates support,
>>>>> and plain text for embedding into markup (where I'll supply external
>>>>> markup to isolate and otherwise prep the field).
>>>>>
>>>>> Given that CLDR data should be specifying the desired appearance (not
>>>>> the bidi controls necessary to get to that) it should be possible to
>>>>> provide mechanical conversion between these formats, rather than having
>>>>> to make a single choice for the database.
>>>>>
>>>>> Not only will "pre 6.3" support be an issue for a long time to come, I
>>>>> am confidently predicting that the need for multiple bidi flavors will
>>>>> continue beyond the adoption of the isolates. Whether a string is part
>>>>> of an (arbitrary) plain text stream or a separate data field (with its
>>>>> scope determined by markup and with its own bidi styling) will
>>>>> continue to call for somewhat different data.
>>>>>
>>>>> Given the correct choice of internal format for the database, it should
>>>>> be possible to provide all of these flavors mechanically, thus avoiding
>>>>> the full cost of duplication, while freeing users from having to make
>>>>> those format translations themselves.
>>>>>
>>>>> A./
>>>>> _______________________________________________
>>>>> CLDR-Users mailing list
>>>>> CLDR-Users at unicode.org
>>>>> http://unicode.org/mailman/listinfo/cldr-users
>>>>>
>