From roozbeh at unicode.org Tue May 3 15:38:19 2016
From: roozbeh at unicode.org (Roozbeh Pournader)
Date: Tue, 3 May 2016 13:38:19 -0700
Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats?
In-Reply-To:
References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com>
 <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com>
Message-ID:

The major barrier seems to be Java's Character#getDirectionality(). Apps
on Android and other platforms tend to use Java APIs for processing
strings, and it seems that Java still doesn't support the types associated
with these, so weird things will happen. There are probably other issues
with java.text.Bidi too, but I haven't checked.

On Fri, Apr 29, 2016 at 12:24 AM, Mark Davis wrote:

> The number and currency formats can be used in a variety of contexts and
> adjacent to a variety of text. The bidi isolate characters were designed
> *precisely* to address this kind of need, without forcing people to jump
> through hoops.
>
> The *only* question we have is whether the major platforms/systems that
> use CLDR are all up to speed in terms of supporting the "new" (2013)
> characters in their BIDI algorithms:
>
> U+2066 LEFT-TO-RIGHT ISOLATE
> U+2067 RIGHT-TO-LEFT ISOLATE
> U+2068 FIRST STRONG ISOLATE
> U+2069 POP DIRECTIONAL ISOLATE
>
> Of course, anyone who is using the number formats in a richer format (like
> HTML) is free to remap characters to markup when processing. That's their
> choice.
>
> Mark
>
> On Fri, Apr 29, 2016 at 6:59 AM, Asmus Freytag (c) wrote:
>
>> On 4/28/2016 9:37 PM, Steven Loomis wrote:
>>
>>> Asmus:
>>>
>>>> Given the correct choice of internal format for the database,
>>>
>>> The internal format is a Unicode String, specifically, UTF-8.
>>
>> That covers a lot of ground.
>>
>>>> Given that CLDR data should be specifying the desired appearance
>>>
>>> But CLDR is text, specifically, XML, and not glyphs?
>>
>> Sorry, I meant that CLDR should be specified in a way that the
>> user-expected "visual ordering" can be determined, not "appearance" as in
>> "glyphs".
>>
>> Just to sidestep a potential misunderstanding: I'm not suggesting that
>> the format be in visual order. Just that there are some assumptions made
>> about the context in which the Unicode string (when bidi processed) will
>> result in the correct visual appearance.
>>
>> For example, if you assume that a string as stored displays correctly when
>> it is part of a RTL paragraph, then you should be able to compute what you
>> need to do to get the correct visual order when the text is part of an LTR
>> paragraph, part of an isolated embedding, etc.
>>
>> I haven't looked into the actualities, but I know that while you can
>> convert uniquely between some formats in a given direction, there are some
>> conversions (or directions) that are not unique. So the challenge would be
>> for the database to find some format that allows conversions to all the
>> bidi contexts (and capabilities) that are typically encountered.
>>
>> Storing things in visual order is a bad idea, because in the general
>> case, conversion to logical order is not unique.
>>
>> But, instead of picking some "random" logical order (based on an
>> assumption of what "might" be most needed) my suggestion is to carefully
>> pick a "universal" format for the string, one that allows mechanical
>> conversion to all the actual formats that people need, based on what
>> environment they want to embed their strings into, and what sorts of
>> embedding / isolation controls are actually supported.
>>
>> A./
>>
>>> Steven
>>>
>>> On 4/28/16 7:30 PM, "CLDR-Users on behalf of Asmus Freytag (c)"
>>> <cldr-users-bounces at unicode.org on behalf of asmusf at ix.netcom.com>
>>> wrote:
>>>
>>>> On 4/28/2016 3:44 PM, Peter Edberg wrote:
>>>>
>>>>> Dear CLDR users,
>>>>
>>>> Peter,
>>>>
>>>> I think this is where a "one size fits all" solution isn't the answer.
>>>>
>>>> Ideally, I'll be able to use CLDR (and formatting tools depending on it)
>>>> to format date/time/number strings for a variety of consumers.
>>>>
>>>> Plain text (pre 6.3), plain text with isolates support, and plain text
>>>> for embedding into markup (where I'll supply external markup to isolate
>>>> and otherwise prep the field).
>>>>
>>>> Given that CLDR data should be specifying the desired appearance (not
>>>> the bidi controls necessary to get to that) it should be possible to
>>>> provide mechanical conversion between these formats, rather than having
>>>> to make a single choice for the database.
>>>>
>>>> Not only will "pre 6.3" support be an issue for a long time to come, I
>>>> am confidently predicting that the need for multiple bidi flavors will
>>>> continue beyond the adoption of the isolates. Whether a string is part
>>>> of an (arbitrary) plain text stream or a separate data field (with its
>>>> scope determined by markup and with its own bidi styling) will continue
>>>> to call for somewhat different data.
>>>>
>>>> Given the correct choice of internal format for the database, it should
>>>> be possible to provide all of these flavors mechanically, thus avoiding
>>>> the full cost of duplication, while freeing users from having to make
>>>> those format translations themselves.
>>>>
>>>> A./
>>>> _______________________________________________
>>>> CLDR-Users mailing list
>>>> CLDR-Users at unicode.org
>>>> http://unicode.org/mailman/listinfo/cldr-users

From verdy_p at wanadoo.fr Wed May 4 06:38:30 2016
From: verdy_p at wanadoo.fr (Philippe Verdy)
Date: Wed, 4 May 2016 13:38:30 +0200
Subject: Use of Unicode 6.3 bidi format chars in CLDR number formats?
In-Reply-To:
References: <91EC00B1-E0CC-48CF-B44C-D48C9BDC92FC@apple.com>
 <5c98059b-47c3-a62f-11a1-4370edc526e2@ix.netcom.com>
Message-ID:

Actually, I don't see any documented property value for RLI/LRI and PDI in
the Java Character class. Maybe it's DIRECTIONALITY_UNDEFINED, in which
case those three characters have to be processed separately. But I don't
see why the three additional properties could not be defined (given that
the enumeration type is in fact a Java "byte") and added to the static
members of the Character class (except that this class is "final", meaning
that we cannot derive an extended class from it). Java has already added
static properties and methods to this class (compared to older versions);
in this case only three static members are missing, and the
"Character.getDirectionality(char)" and "Character.getDirectionality(int)"
methods would remain the same.
This would require adding the three static members and optionally updating
the documentation for the values possibly returned by
Character.getDirectionality(...).

I don't think it is a real blocker anyway: the Bidi algorithm is not
located in the Character class itself but implemented in other classes
that can be updated even if the Character class is not extended with the
three static fields.

So the problem is not in that class but in the other libraries using it to
implement the Bidi algorithm: given that PDF (POP DIRECTIONAL FORMATTING)
is already supported, there is already internal support for the needed
stack of embedded states. Those libraries can just detect the three
controls in the DIRECTIONALITY_UNDEFINED case, more or less without
changing the rest, which is based on other documented values.

2016-05-03 22:38 GMT+02:00 Roozbeh Pournader:

> The major barrier seems to be Java's Character#getDirectionality(). Apps
> on Android and other platforms tend to use Java APIs for processing
> strings, and it seems that Java still doesn't support the types associated
> with these, so weird things will happen. There are probably other issues
> with java.text.Bidi too, but I haven't checked.
>
> On Fri, Apr 29, 2016 at 12:24 AM, Mark Davis wrote:
>
>> The number and currency formats can be used in a variety of contexts and
>> adjacent to a variety of text. The bidi isolate characters were designed
>> *precisely* to address this kind of need, without forcing people to jump
>> through hoops.
>>
>> The *only* question we have is whether the major platforms/systems that
>> use CLDR are all up to speed in terms of supporting the "new" (2013)
>> characters in their BIDI algorithms:
>>
>> U+2066 LEFT-TO-RIGHT ISOLATE
>> U+2067 RIGHT-TO-LEFT ISOLATE
>> U+2068 FIRST STRONG ISOLATE
>> U+2069 POP DIRECTIONAL ISOLATE
>>
>> Of course, anyone who is using the number formats in a richer format
>> (like HTML) is free to remap characters to markup when processing. That's
>> their choice.
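
For readers who want to try the workaround discussed above (recognizing
the isolate controls by code point instead of relying on the value
returned by Character.getDirectionality), here is a minimal sketch. The
class IsolateControls and its method names are hypothetical and are not
part of the JDK, ICU, or CLDR. The probe method assumes, as guessed in the
thread, that a runtime without isolate support reports
DIRECTIONALITY_UNDEFINED for these code points; that assumption is not
verified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not a JDK, ICU, or CLDR API. */
final class IsolateControls {
    static final int LRI = 0x2066; // LEFT-TO-RIGHT ISOLATE
    static final int RLI = 0x2067; // RIGHT-TO-LEFT ISOLATE
    static final int FSI = 0x2068; // FIRST STRONG ISOLATE
    static final int PDI = 0x2069; // POP DIRECTIONAL ISOLATE

    private IsolateControls() {}

    /** True for U+2066..U+2069, independent of the runtime's property tables. */
    static boolean isIsolateControl(int codePoint) {
        return codePoint >= LRI && codePoint <= PDI;
    }

    /**
     * Rough probe: does this runtime report some defined directionality for
     * LRI? Assumption: an unsupporting runtime returns
     * DIRECTIONALITY_UNDEFINED for the isolate controls.
     */
    static boolean runtimeClassifiesIsolates() {
        return Character.getDirectionality(LRI) != Character.DIRECTIONALITY_UNDEFINED;
    }

    /** Returns the char indices at which isolate controls occur in the text. */
    static List<Integer> findIsolateControls(CharSequence text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            if (isIsolateControl(cp)) {
                positions.add(i);
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return positions;
    }

    public static void main(String[] args) {
        // Print what this particular runtime reports for each control.
        for (int cp = LRI; cp <= PDI; cp++) {
            System.out.printf("U+%04X %s directionality=%d%n",
                    cp, Character.getName(cp), Character.getDirectionality(cp));
        }
        System.out.println("runtime classifies isolates: " + runtimeClassifiesIsolates());
    }
}
```

A bidi implementation could call isIsolateControl before consulting
Character.getDirectionality, so its behavior does not depend on which
Unicode version the runtime's tables were built from. The values printed
by main vary by Java version, so none are shown here.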