A last missing link for interoperable representation

Marcel Schneider via Unicode unicode at unicode.org
Mon Jan 14 15:42:40 CST 2019


On 14/01/2019 08:26, Julian Bradfield via Unicode wrote:
> On 2019-01-13, Marcel Schneider via Unicode <unicode at unicode.org> wrote:
[…]
>> These statements make me fear that the font you are using might unsupport
>> the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between
> 
> It displays as a space. As one would expect - I use fixed width fonts
> for plain text.

It’s mainly that I suspected you could be using Courier New in the terminal.
It’s the default for plain text in the main browsers, and there are devices
whose copy of Courier New shows a .notdef box for U+202F. That’s at least what
I understood from the feedback, and a test in my browser showed the same.

> 
>> these pointy brackets, please let us know. Because then, You’re unable to
>> read interoperably usable French text, too, as you’ll see double punctuation
>> (eg "?!") where a single mark is intended, like here !
> 
> I see "like here !".

That’s fine, your font has support for <NNBSP>. Thanks for reporting.

The reason why I’m anxious to see that checked is that the impact on
implementations of <NNBSP> as the group separator is being assessed.
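
To make concrete what is at stake, here is a minimal Python sketch of a
number grouped with <NNBSP>. The function name group_digits is made up for
this illustration; it only swaps the comma produced by Python's own format
specifier and does not reflect CLDR data or any library's locale support.

```python
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

def group_digits(n: int, sep: str = NNBSP) -> str:
    """Group the digits of n in threes, separated by sep (hypothetical helper)."""
    # Python's "," format option groups by thousands; replace the comma with sep.
    return f"{n:,}".replace(",", sep)

if __name__ == "__main__":
    # repr() makes the otherwise invisible separator show up as \u202f.
    print(repr(group_digits(1234567)))
```

Any implementation that parses or renders such a string has to treat U+202F
as part of the number, which is the kind of impact being assessed.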

> French text does not need narrow spacing any more than science does.
> When doing typography, fifty centimetres is $50\thinspace\mathrm{cm}$;
> in plain text, 50cm does just fine.

By “plain text” you probably mean *draft style*. I’m thinking that
because "$50\thinspace\mathrm{cm}$" is no less plain text than "50cm".

Indeed, I was an idiot not to understand that sooner, naively
believing that all Unicode List members use Unicode terminology.
It turns out that this cannot be taken for granted, any more than
someone who is not French can be expected to know how French people
prefer French text to be displayed:

1. Most French people prefer that big punctuation be spaced off from
    the word it pertains to.

2. Most French people strongly dislike punctuation cut off by a line
    break, but cannot fix it because:
    a) the ordinary keyboard layout has no non-breaking spaces;
    b) the <NBSP> readily available on particular keyboard layouts
       is buggy in most e-mail composers and ends up breakable.

3. A significant share of French people strongly dislike angle quotes
    that are spaced off too far, as happens when using <NBSP>.

> Likewise, normal French people writing email write "Quel idiot!", or
> sometimes "Quel idiot !".

Normal people using normal keyboard layouts are writing with the
readily available characters most of the time. This is why (to pick
another example) French people abbreviate “numéro” to "n°", while
on a British English or an American English keyboard layout we
can’t normally expect anything other than "no", or "#", for “number”.

We’re not trying to keep people from writing fast and in draft style.
What every locale is expected to achieve in the Unicode era is to
enable normal users to get an accurate, interoperable representation
of their language while typing fast, as opposed to coding in TeX,
which is like using InDesign with system spaces instead of Unicode.
System spaces are not interoperable, nor is LaTeX \thinspace if that
is non-breakable in LaTeX, which it obviously is, since it is used
to represent the thin space between a number and a measurement unit.

In Unicode, as we know it, U+2009 THIN SPACE is breakable, and the
worst thing here is that its duplicate encoding U+2008 PUNCTUATION
SPACE is breakable too, instead of being non-breakable like U+2007
FIGURE SPACE. That is why there was a need to add U+202F NARROW
NO-BREAK SPACE later. (More details in the cited CLDR ticket.)
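
The difference between these spaces can be glimpsed from Python. This is
only a rough sketch: as far as I know the standard unicodedata module does
not expose the Line_Break property of UAX #14, so the code looks at the
<noBreak> tag of the compatibility decomposition in UnicodeData as a proxy.
The helper name has_no_break_tag is made up for this illustration.

```python
import unicodedata

# NBSP, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE, NNBSP
SPACES = ["\u00a0", "\u2007", "\u2008", "\u2009", "\u202f"]

def has_no_break_tag(ch: str) -> bool:
    """True if the character's decomposition mapping carries the <noBreak> tag."""
    return unicodedata.decomposition(ch).startswith("<noBreak>")

if __name__ == "__main__":
    for ch in SPACES:
        tag = "<noBreak>" if has_no_break_tag(ch) else "no <noBreak> tag"
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {tag}")
```

The decomposition tag is not the line breaking class itself, so the
authoritative source for breaking behaviour remains LineBreak.txt.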

> 
> If you google that phrase on a few French websites, you'll see that
> some (such as Larousse, whom one might expect to care about such
> things) use no space before punctuation,

Thanks for catching that; the flaw will be reported with a link to
your email.

You may also wish to look up this page:
https://communaute.lerobert.com/forum/LE-ROBERT-CORRECTEUR/LE-ROBERT-CORRECTEUR-CORRECTION-D-ORTHOGRAPHE-DICTIONNAIRES-ET-GUIDES/Espace-entre-le-meotet-le-point-d-interrogation/2918628/398261

reading: “Le logiciel Le Robert correcteur justement signale les
espaces fines insécables si elles ne sont pas présentes sur le texte
et propose la correction.” (“Le Robert spellchecker does report
the lack of narrow no-break spaces and proposes to fix it.”)

> while others (such as some
> random T-shirt company) use an ASCII space.
> 
> The Académie Française, which by definition knows more about French
> orthography than you do, uses full ASCII spaces before ? and ! on its
> front page. Also after opening guillemets, which looks even more
> stupid from an Anglophone perspective.

(See point 3 above.) That is a very good point. Indeed this website is
reasonably expected to be an example and a template for correctly
typesetting a French website. There are several reasons why it actually
is not. The main reason is that it is not the work of the A.F. itself,
but of web designers, webmasters and content managers, who are normal
people, as on any other website. They just haven’t got an appropriate
keyboard layout yet, and that is ultimately my fault because in the
nineties and later I didn’t care about computers and keyboard layouts.
That may sound crazy but it isn’t really. French needs such a
particular keyboard layout to make its representation functional, useful
and interoperable without slowing down typists that numerous
preconditions and much time were needed to design it.

Among the preconditions, Unicode did not have the needed
non-breakable thin space when keyboarding was under way in France.

French typesetters were aware of the thin space needed with big
punctuation marks (sometimes called tall or double punctuation).
The style manual of the Imprimerie Nationale is unambiguous, and
where it isn’t, its actual practice is to be followed. That leaves
the colon as the only mark set with <NBSP> rather than <NNBSP>. I
cannot post a scan or photo of the table on page 149, nor of the
examples as they are typeset in the print book, because it’s
copyrighted material, but you’re welcome to purchase your copy if you
haven’t already. That guide is more or less quoted by the A.F. when it
comes to determining whether capital letters should carry diacritics.

Philippe Verdy reported in 2015 on this List that in France,
the colon too is widely typeset with <NNBSP>, and that the
Imprimerie Nationale conforms to the specs of its clients.
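
The spacing convention discussed above can be sketched in Python. This is a
naive illustration, not a rendering of the style manual: the function name
french_spacing and its colon_space parameter are made up here, the default
puts <NBSP> before the colon, and passing NNBSP gives the other practice.
It ignores code, times of day and other contexts a real tool would have to
handle.

```python
import re

NBSP = "\u00a0"   # U+00A0 NO-BREAK SPACE
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE
_SP = "[ \u00a0\u202f]*"  # run of ordinary, no-break or narrow no-break spaces

def french_spacing(text: str, colon_space: str = NBSP) -> str:
    """Normalize spaces around French big punctuation (hypothetical helper)."""
    # ; ! ? (or a run such as "?!") get exactly one NNBSP in front.
    text = re.sub(r"(?<=\S)" + _SP + r"([;!?]+)", NNBSP + r"\1", text)
    # A colon followed by whitespace or end of text gets colon_space in front;
    # colons inside URLs ("https://") are left alone.
    text = re.sub(r"(?<=\S)" + _SP + r":(?=\s|$)", colon_space + ":", text)
    # Angle quotes are spaced off from their content with NNBSP.
    text = re.sub("«" + _SP, "«" + NNBSP, text)
    text = re.sub(_SP + "»", NNBSP + "»", text)
    return text

if __name__ == "__main__":
    # repr() makes the inserted spaces visible as escapes.
    print(repr(french_spacing("« Quel idiot ! »")))
```

Calling french_spacing(text, colon_space=NNBSP) sets the colon with <NNBSP>
as well.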

> 
>> Aiming at extending the subset of environments supporting correct typesetting
> 
> There are many fine programs, including TeX, for doing good
> typesetting. Unicode is not about typesetting, it's about information
> exchange and preservation.

Yes, and TeX converts our code to Unicode, so that we have
several formats to choose from when considering exchange and
preservation.

The point in having an interoperable digital representation of
all natural languages is that normal people are not forced to
use draft style when just writing their language on a computer.

Best regards,

Marcel

