CLDR TL;DR article

Jukka K. Korpela jkorpela at cs.tut.fi
Wed Dec 24 06:49:21 CST 2014


2014-12-24, 13:55, Philippe Verdy wrote, commenting on an announcement 
of http://perladvent.org/2014/2014-12-23.html :

> That article about the Locale::CLDR gives an example of bad usage with:
>
>   *
>
>     fr: «foo», «bar» et «baz»

I agree, but I think it’s a more serious mistake to have

ur: ”foo“، ”bar“، اور ”baz“

As far as I know, Urdu is written right to left, so the order of words 
is all wrong.

> In this case the quotations marks are not enough in French, there MUST
> also be some non-breaking whitespace (preferably the thin non-breaking
> space) after the opening quotation mark, and before the closing mark.

This is a longstanding issue with no clear solution so far. In plain 
text, you can choose between SPACE, NO-BREAK SPACE, one of the 
“fixed-width” spaces like THIN SPACE, and the NARROW NO-BREAK SPACE. The 
“fixed-width” spaces (which largely aren’t fixed-width in reality) are 
by definition compatibility equivalent to SPACE, with its line breaking 
behavior. The NARROW NO-BREAK SPACE would seem ideal, but it was really 
designed for a different purpose, and there is no reason to expect 
that its width corresponds to that of espace fine insécable in French 
typography; moreover, its availability in fonts is limited, and it may 
still cause a symbol for an undisplayable character to appear, which is 
surely worse than a space of any width, or no space.
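
To make the comparison concrete, here is a minimal Python sketch that 
lists the four candidates with their compatibility decompositions. It 
assumes only the standard unicodedata module; the names CANDIDATES and 
describe are mine, purely for illustration. The module does not expose 
the line breaking property, so that part has to be checked against the 
Unicode data files.

```python
import unicodedata

# Candidate characters for the space inside French guillemets
# (an illustrative selection, not a list taken from CLDR).
CANDIDATES = ["\u0020", "\u00A0", "\u2009", "\u202F"]


def describe(ch):
    """Return (code point, name, compatibility decomposition, NFKC form)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),      # empty string if there is none
        unicodedata.normalize("NFKC", ch),  # what compatibility folding yields
    )


for ch in CANDIDATES:
    code, name, decomp, nfkc = describe(ch)
    print(f"{code}  {name:<22} {decomp or '(none)':<16} NFKC -> U+{ord(nfkc):04X}")
```

All three special spaces fold to SPACE under NFKC; only the tag in the 
decomposition (<compat> versus <noBreak>) records the difference in 
breaking behavior.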

In rich text, there are many things you can do to control the width and 
the line breaking behavior.

> Unfortunately the CLDR data only accepts 1 character for these marks
> when we should expect to find also the THINSP character

Is that so? In any case, a more fundamental problem is what string you 
would put there. It should indicate spacing, but considerably less than 
a normal space, and it should be non-breakable. I would make 
non-breakability the main concern, and between no spacing and a full 
space, I’d prefer no spacing. But…
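
As a rough sketch of what that choice amounts to in code, the inner 
spacing can be made a parameter. The function names quote_fr and 
format_list_fr are hypothetical; this is plain Python, not the 
Locale::CLDR API, and the list pattern is simply the French one quoted 
from the article.

```python
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE
NBSP = "\u00A0"   # NO-BREAK SPACE


def quote_fr(text, inner=""):
    """Wrap text in guillemets, with `inner` placed inside each mark.

    The default of no spacing reflects the preference stated above;
    pass NNBSP or NBSP to get a non-breaking space instead.
    """
    return f"«{inner}{text}{inner}»"


def format_list_fr(items, inner=""):
    """Join quoted items following the pattern '«foo», «bar» et «baz»'."""
    quoted = [quote_fr(item, inner) for item in items]
    if len(quoted) < 2:
        return "".join(quoted)
    return ", ".join(quoted[:-1]) + " et " + quoted[-1]


print(format_list_fr(["foo", "bar", "baz"]))
print(format_list_fr(["foo", "bar", "baz"], inner=NNBSP))
```

Keeping the spacing out of the quotation mark data and in a separate 
parameter at least makes it easy to switch once a better answer exists.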

The pages of l’Académie française use SPACE, so I guess it cannot be an 
entirely wrong approach, even though it looks rather strange to me.

> (Note that on systems that cannot accept THINSP for French, the fallback
> can be NBSP, or a standard SPACE, but NEVER the absence of whitespace
> like in English).

Is that a rule that has officially been declared somewhere? When 
reading, say,
« An deux mil » ou « an deux mille » ?
on the Academy pages, I find it confusing that it looks as if “ou” were a 
quoted string, and I would really prefer
«An deux mil» ou «an deux mille» ?
for reasons of clarity and typography. A punctuation mark isolated by 
full spaces looks so lonely, though at the end of a sentence, it might be 
acceptable.

On the other hand, on a page that summarizes CLDR principles, I think 
the example should reflect what CLDR actually has, rather than what it 
should have. Although a note could be made about spacing issues in 
French, I think the only mistake in this area on the page is the wrong 
writing direction for Urdu—it might even be construed as claiming that 
CLDR suggests or requires such directionality!

Yucca


