CLDR TL;DR article
Jukka K. Korpela
jkorpela at cs.tut.fi
Wed Dec 24 06:49:21 CST 2014
2014-12-24, 13:55, Philippe Verdy wrote, commenting on an announcement
of http://perladvent.org/2014/2014-12-23.html :
> That article about the Locale::CLDR gives an example of bad usage with:
>
> fr: «foo», «bar» et «baz»
I agree, but I think it’s a more serious mistake to have
ur: ”foo“، ”bar“، اور ”baz“
As far as I know, Urdu is written right to left, so the order of words
is all wrong.
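Some context on the directionality point. The string itself is normally stored in logical (reading) order, and it is the renderer that must lay it out right to left. Below is a minimal Python sketch. Its assumptions are that the punctuation is the one seen in the article's sample output (”…“ quotes, ARABIC COMMA ، and اور for "and"), not values read from CLDR, and that `format_urdu_list` is a hypothetical helper. It builds the list in logical order and wraps it in RIGHT-TO-LEFT ISOLATE (U+2067) … POP DIRECTIONAL ISOLATE (U+2069), which is one plain-text way to ask a bidi-aware renderer for RTL layout inside left-to-right text.

```python
import unicodedata

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def format_urdu_list(items):
    """Hypothetical helper: join items in logical (reading) order, using the
    punctuation from the article's sample output, not CLDR data."""
    quoted = [f"\u201d{item}\u201c" for item in items]  # ”item“ as in the sample
    if len(quoted) < 2:
        return "".join(quoted)
    middle = "\u060c ".join(quoted[:-1])  # ARABIC COMMA + space between items
    return f"{middle}\u060c \u0627\u0648\u0631 {quoted[-1]}"  # "، اور" = ", and"

text = format_urdu_list(["foo", "bar", "baz"])

# The stored string stays in logical order; only the display needs RTL layout.
# Wrapping it in RLI ... PDI asks a bidi-aware renderer to lay it out right
# to left even when it sits inside left-to-right text.
isolated = RLI + text + PDI

# Bidirectional classes: Arabic letters are strong RTL (AL), while the
# comma (CS) and quotation marks (ON) take their direction from context.
for ch in ["\u0627", "\u060c", "\u201d", "f", RLI]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")
```

In HTML, the markup equivalents would be `dir="rtl"` on the containing element or a `<bdi>` element around the Urdu text.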
> In this case the quotation marks are not enough in French; there MUST
> also be some non-breaking whitespace (preferably the thin non-breaking
> space) after the opening quotation mark, and before the closing mark.
This is a longstanding issue with no clear solution so far. In plain
text, you can choose between SPACE, NO-BREAK SPACE, one of the
“fixed-width” spaces like THIN SPACE, and the NARROW NO-BREAK SPACE. The
“fixed-width” spaces (which largely aren’t fixed-width in reality) are
by definition compatibility equivalent to SPACE, with its line breaking
behavior. The NARROW NO-BREAK SPACE would seem ideal, but it was really
designed for a different purpose, and there is no reason to expect
that its width corresponds to that of the espace fine insécable (narrow
non-breaking space) of French typography; moreover, its availability in
fonts is limited, and it may still cause a symbol for an undisplayable
character to appear, which is surely worse than a space of any width, or
no space.
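To see what "compatibility equivalent to SPACE" means in the data, the Unicode decomposition mappings can be inspected with Python's standard `unicodedata` module. This is a minimal sketch that lists the four candidates named above, their decomposition tags, and their NFKC form. THIN SPACE carries a `<compat>` tag, while NO-BREAK SPACE and NARROW NO-BREAK SPACE carry `<noBreak>`. All of them fold to U+0020 under NFKC, so any process that applies compatibility normalization loses the no-break distinction. (Line-breaking classes themselves come from UAX #14 and are not exposed by `unicodedata`.)

```python
import unicodedata

# Candidate characters for the space just inside French guillemets
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "THIN SPACE": "\u2009",
    "NARROW NO-BREAK SPACE": "\u202f",
}

for label, ch in CANDIDATES.items():
    # Decomposition tag, e.g. "<compat> 0020" or "<noBreak> 0020"
    decomp = unicodedata.decomposition(ch) or "(none)"
    # NFKC applies compatibility mappings, folding these back to U+0020
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {label:<22} decomposition={decomp:<15} NFKC=U+{ord(nfkc):04X}")
```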
In rich text, there are many more ways to control the width and the
line breaking behavior, for example with CSS in HTML.
> Unfortunately the CLDR data only accepts 1 character for these marks
> when we should expect to find also the THINSP character
Is that so? In any case, a more fundamental problem is what string you
would put there. It should indicate spacing, but considerably less than
a normal space, and it should be non-breakable. I would make
non-breakability the main concern, and between no spacing and a full
space, I’d prefer no spacing. But…
The pages of l’Académie française use SPACE, so I guess it cannot be an
entirely wrong approach, even though it looks rather strange to me.
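The options discussed so far (no spacing, NARROW NO-BREAK SPACE, NO-BREAK SPACE, or a plain SPACE as on the Académie pages) can be compared side by side. The following minimal Python sketch is illustrative only. `format_french_list` is a hypothetical helper, and the joining patterns ", " and " et " are assumed from the article's sample output rather than read from CLDR data. The helper takes the inner spacing as a single string, which is essentially the question of what string CLDR would need to hold.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
NBSP = "\u00a0"   # NO-BREAK SPACE

def format_french_list(items, inner=""):
    """Hypothetical helper: quote each item with guillemets, putting `inner`
    just inside each mark, then join with ", " and a final " et " (patterns
    assumed from the article's sample output, not read from CLDR)."""
    quoted = [f"\u00ab{inner}{item}{inner}\u00bb" for item in items]
    if len(quoted) < 2:
        return "".join(quoted)
    return ", ".join(quoted[:-1]) + " et " + quoted[-1]

# Compare the options discussed in the thread; repr() shows the otherwise
# invisible no-break spaces as escapes.
for label, inner in [("none", ""), ("NNBSP", NNBSP), ("NBSP", NBSP), ("SPACE", " ")]:
    print(f"{label:<6} {format_french_list(['foo', 'bar', 'baz'], inner)!r}")
```

With `inner=" "` a line break may fall between a guillemet and the quoted word. The two no-break options prevent that, and `inner=""` avoids the problem by having no space at all.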
> (Note that on systems that cannot accept THINSP for French, the fallback
> can be NBSP, or a standard SPACE, but NEVER the absence of whitespace
> like in English).
Is that a rule that has officially been declared somewhere? When
reading, say,
« An deux mil » ou « an deux mille » ?
on the Academy pages, I find it confusing that it looks as if “ou” (“or”)
were a quoted string, and I would really prefer
«An deux mil» ou «an deux mille» ?
for reasons of clarity and typography. A punctuation mark isolated by
full spaces looks so lonely, though at the end of a sentence it might be
acceptable.
On the other hand, on a page that summarizes CLDR principles, I think
the example should reflect what CLDR actually has, rather than what it
should have. Although a note could be made about spacing issues in
French, I think the only mistake in this area on the page is the wrong
writing direction for Urdu—it might even be construed as claiming that
CLDR suggests or requires such directionality!
Yucca
More information about the CLDR-Users
mailing list