Numeric group separators and Bidi

Philippe Verdy via Unicode unicode at unicode.org
Tue Jul 9 15:43:06 CDT 2019


Well, my first feeling was that U+202F should work all the time, but I found
cases where it does not. So these must be bugs in those renderers.
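
For reference, the Bidi_Class values discussed in the quoted messages below can be checked with Python's standard `unicodedata.bidirectional()`. The sketch also applies a deliberately simplified form of rule W4 of UAX #9 (a single common separator, CS, between two European numbers, EN, becomes EN) to show why U+202F keeps "123 456" together while an ordinary space splits it into two digit runs, which a right-to-left paragraph can then display in reverse order. It is an illustration under that assumption, not a bidi implementation: `bidi_classes` and `digit_runs` are hypothetical helpers, and all other W, N and I rules are ignored.

```python
import unicodedata

# Candidate separators from this thread, plus the ASCII decimal marks.
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00A0",
    "FIGURE SPACE": "\u2007",
    "THIN SPACE": "\u2009",
    "NARROW NO-BREAK SPACE": "\u202F",
    "FULL STOP": ".",
    "COMMA": ",",
}


def bidi_classes():
    """Map each candidate name to (code point, Bidi_Class)."""
    return {
        name: (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
        for name, ch in CANDIDATES.items()
    }


def digit_runs(text):
    """Return the maximal runs of European-number (EN) characters after
    applying only UAX #9 rule W4: a single CS between two EN becomes EN.

    Simplified illustration (hypothetical helper): it ignores the other
    W, N and I rules (e.g. W2, which turns EN into AN after Arabic
    letters), so it is only meaningful for ASCII digits outside Arabic text.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    resolved = list(classes)
    for i in range(1, len(classes) - 1):
        if classes[i] == "CS" and classes[i - 1] == "EN" and classes[i + 1] == "EN":
            resolved[i] = "EN"
    runs, current = [], ""
    for ch, cls in zip(text, resolved):
        if cls == "EN":
            current += ch
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    for name, (cp, cls) in bidi_classes().items():
        print(f"{cp}  {cls:<3}  {name}")
    hebrew = "\u05E2\u05D1\u05E8\u05D9\u05EA"  # Hebrew letters, Bidi_Class R
    print(digit_runs(f"{hebrew} 123 456 {hebrew}"))       # space: two runs
    print(digit_runs(f"{hebrew} 123\u202F456 {hebrew}"))  # U+202F: one run
```

In this sketch U+00A0 and U+202F come out as CS, while U+0020, U+2007 and U+2009 come out as WS, which matches the list of CS characters referenced in the thread.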

And using Bidi controls (LRI/PDI) is absolutely not an option. These
controls are only intended to be used in pure plain-text files that have no
other way to specify the embedding, and whose content is entirely static
(not generated by templates that insert data from unspecified locales into
an unspecified locale).

Likewise, localizing each item is not possible. That's why I am looking for
a locale-neutral solution that is acceptable in all languages and does not
misrepresent the actual values of numbers (which can have different scales
or precisions, and may carry optional data that is not present in every item
of the list, for example annotations that should still be as locale-neutral
as possible).

So U+202F is supposed to be the solution, but I did not find any way to
properly present the decimal separator: it is only unambiguous as a decimal
separator (rather than a group separator) if a group separator is also
present in the number, and this is not always true! So I'm stuck with the
dot or comma, with no appropriate symbol that would not be confusable (maybe
the small vertical tick hanging from the baseline could replace both the
dot and the comma?).
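
The decimal-mark problem above can be made concrete with a small formatter. This Python sketch assumes a plain decimal string as input (no sign, exponent or rounding) and groups the integer part in threes with U+202F; `group_digits` and `separators_are_single_cs` are hypothetical helpers, not part of any library. Because U+202F, the full stop and the comma all have Bidi_Class CS, each separator it emits sits alone between two ASCII digits, which is the situation in which rule W4 of UAX #9 keeps the whole number in one European-number run.

```python
import unicodedata

NNBSP = "\u202F"  # NARROW NO-BREAK SPACE, Bidi_Class CS


def group_digits(value: str, decimal_mark: str = ".") -> str:
    """Group the integer part of a plain decimal string such as
    "1234567.891" in threes with U+202F, then append the fraction after
    `decimal_mark`. Hypothetical helper: no sign, exponent or rounding."""
    int_part, _, frac_part = value.partition(".")
    groups = []
    while len(int_part) > 3:
        groups.insert(0, int_part[-3:])
        int_part = int_part[:-3]
    groups.insert(0, int_part)
    result = NNBSP.join(groups)
    if frac_part:
        result += decimal_mark + frac_part
    return result


def separators_are_single_cs(formatted: str) -> bool:
    """True if every non-digit is Bidi_Class CS and sits alone between two
    ASCII digits, the situation in which UAX #9 rule W4 keeps the whole
    number in one European-number run."""
    digits = "0123456789"
    for i, ch in enumerate(formatted):
        if ch in digits:
            continue
        if unicodedata.bidirectional(ch) != "CS":
            return False
        if i == 0 or i == len(formatted) - 1:
            return False
        if formatted[i - 1] not in digits or formatted[i + 1] not in digits:
            return False
    return True


if __name__ == "__main__":
    for raw, mark in [("1234567.891", "."), ("1234.5", ","), ("123.456", ".")]:
        out = group_digits(raw, mark)
        print(repr(out), separators_are_single_cs(out))
```

With this sketch, `group_digits("1234.5", ",")` yields "1\u202F234,5", where the group separator lets the comma read as a decimal mark, whereas `group_digits("123.456")` returns "123.456" unchanged: with no group separator present, a reader cannot tell whether the dot is a decimal mark or a thousands separator, which is exactly the ambiguity described above.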



On Tue, Jul 9, 2019 at 22:10, Egmont Koblinger <egmont at gmail.com> wrote:

> Hi Philippe,
>
> What do you mean U+202F doesn't work for you?
>
> Whereas the logical string "hebrew 123<space>456 hebrew" does indeed show
> the number incorrectly as "456 123", this is not the case with U+202F
> instead of the space: the number then shows up as "123 456", as expected.
>
> I think you need to pick a character whose BiDi class is "Common
> Number Separator", see e.g.
> https://www.compart.com/en/unicode/bidiclass/CS for a list of such
> characters including U+00A0 no-break space and U+202F narrow no-break
> space. This suggests to me that U+202F is a correct choice if you need
> the look of a narrow space.
>
> Another possibility is to embed the number in a LRI...PDI block, as
> e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%"
> fragment of its default example.
>
> cheers,
> egmont
>
> On Tue, Jul 9, 2019 at 9:01 PM Philippe Verdy via Unicode
> <unicode at unicode.org> wrote:
> >
> > Is there a narrow space usable as a numeric group separator, and that
> also has the same bidi property as digits (i.e. neutral outside the span of
> digits and separators, but inheriting the implied directionality of the
> previous digit) ?
> >
> > I can't find a way to use narrow spaces instead of punctuation signs
> (dot or comma) in Arabic/Hebrew, for example to present tabular
> numeric data in a really language-neutral way. In Arabic/Hebrew we need to
> use punctuation as group separators because spaces don't work (not even
> the narrow non-breaking space U+202F used in French and recommended by
> ISO), but then these punctuation separators are interpreted differently
> (notably between French and English, where the interpretations of dot and
> comma are swapped).
> >
> > Note that:
> > - the "figure space" is not suitable (as it has the same width as digits
> and is used as a "filler" in tabular data; but it also does not have the
> correct bidi behavior, as it does not have the same bidi properties as
> digits).
> > - the "thin space" is not suitable (it is breakable)
> > - the "narrow non-breaking space" U+202F (used in French and currently
> in ISO) is not suitable, or maybe I'm wrong and its presence is still
> neutral between groups of digits, where it inherits the properties of the
> previous digit, but it still does not enforce the bidi direction of the
> whole span of digits.
> >
> > Can you tell me whether U+202F is really suitable? I made some tests with
> various text renderers, and some of them "break" the group of digits by
> reordering these groups, completely changing the rendered value (units
> become thousands or more, and thousands become units...). But maybe these
> are bugs in the renderers.
> >
>