French Superscript Abbreviations Fit Plain Text Requirements

Philippe Verdy verdy_p at wanadoo.fr
Thu Dec 29 02:35:54 CST 2016


I agree. Even for the abbreviation "N<sup>os</sup>" or "n<sup>os</sup>",
there's no ambiguity, thanks to the grammar: in a sentence the abbreviation
would be preceded by an article ("les nos 2 et 3") or a noun ("les articles
nos 2 et 3") and followed by numerals, so it cannot be analyzed as the
possessive "nos", which cannot appear after an article or noun.

If you want to represent only plain text, the typographic superscripts could
still be replaced by inserting an abbreviation dot ("les n.os 2 et 3") or by
not abbreviating at all ("les numéros 2 et 3"). These superscripts are
presentational only. The same applies to other abbreviations such as "Mgr"
("Monseigneur", which can be typeset as "M<sup>gr</sup>"), "Bd"
("Boulevard", typeset as "B<sup>d</sup>"), "Mlle" ("Mademoiselle", typeset
as "M<sup>lle</sup>") and many, many other abbreviations that keep the last
letters of a word as a suffix. These are preferably typeset using
superscripts, but they are still normal Latin letters, including letters
with accents (notably "é", which is frequent at the end of French
participles or nouns and which has no encoded superscript variant).
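
The two plain-text fallbacks described above (dropping the style, or
inserting an abbreviation dot) can be sketched in a few lines of Python.
This is only an illustration: the function name strip_sup and its dotted
parameter are hypothetical, and the regular expression only handles the
simple "letters<sup>letters</sup>" pattern used in this message, not
general HTML.

```python
import re

# Matches a word immediately followed by a <sup>...</sup> suffix,
# e.g. "n<sup>os</sup>" or "M<sup>lle</sup>". Not a general HTML parser.
SUP = re.compile(r"(\w+)<sup>(\w+)</sup>")

def strip_sup(text, dotted=False):
    """Flatten superscript suffixes to plain text.

    dotted=False drops the style ("n<sup>os</sup>" -> "nos").
    dotted=True inserts an abbreviation dot ("n<sup>os</sup>" -> "n.os").
    """
    sep = "." if dotted else ""
    return SUP.sub(lambda m: m.group(1) + sep + m.group(2), text)

print(strip_sup("les n<sup>os</sup> 2 et 3"))               # les nos 2 et 3
print(strip_sup("les n<sup>os</sup> 2 et 3", dotted=True))  # les n.os 2 et 3
print(strip_sup("M<sup>lle</sup> Dupont"))                  # Mlle Dupont
```

Either output stays in ordinary Latin letters; only the presentation is
lost.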

Adding superscript variants (or other typographic variants) in Unicode for
that use would mean reencoding thousands of letters in many scripts and in a
dozen stylistic variants. This is not the way to go.
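
To see how patchy the existing superscript coverage is, one can scan the
Unicode character database for characters whose compatibility
decomposition is tagged <super>. The sketch below uses Python's standard
unicodedata module; the helper name build_superscript_map is hypothetical,
and the exact set of letters found depends on the Unicode version bundled
with the interpreter.

```python
import sys
import unicodedata

def build_superscript_map():
    """Map each base character to the characters that decompose to it
    with a <super> compatibility tag (single-character mappings only)."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == "<super>":
            base = chr(int(parts[1], 16))
            table.setdefault(base, []).append(chr(cp))
    return table

SUPERSCRIPTS = build_superscript_map()

# "o" and "s" have superscript-like characters (used in 'nᵒˢ') ...
print(SUPERSCRIPTS.get("o"))
print(SUPERSCRIPTS.get("s"))
# ... but precomposed "é" (U+00E9) has none.
print(SUPERSCRIPTS.get("\u00e9"))  # None

# Compatibility normalization folds the modifier letters back anyway.
print(unicodedata.normalize("NFKC", "nᵒˢ"))  # nos
```

The last line shows that any process applying NFKC turns 'nᵒˢ' back into
"nos", so the modifier letters do not protect against the collision with
the possessive.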

Plain text documents have their constraints: if clarity is needed, they
must be modified with additional text. Converting rich text to plain text
by dropping all styles is destructive and may cause ambiguity in some rare
cases, but language semantics and grammar most often resolve it and give
sense to the text, so abbreviations in plain text will still be readable in
most cases.

2016-12-28 22:47 GMT+01:00 Asmus Freytag <asmusf at ix.netcom.com>:

> On 12/28/2016 7:25 AM, Marcel Schneider wrote:
>
> Applied to the French abbreviation of “numéros” (numbers), that means that the
> abbreviationʼs final letters 'os' **must not** be formatted as superscript: Since
> “the extra information in rich text can always be stripped away to reveal the
> ‘pure’ text underneath” (TUS, ibid.), 'n^{os}' would end up as 'nos' (“our”,
> with a plural noun). Consequently, best practice is to represent it using the
> Unicode superscript “modifier letters”: 'nᵒˢ'.
>
> This is seriously overstating the plain text principle.
>
> There are many places where formatting affects the reading (and not just
> the presentation) of the text. In some cases, it is appropriate to encode
> characters for that, in other places the conclusion is simply that plain
> text is not sufficient.
>
> In English, superscript is used for ordinal numbers. The fallback without
> superscript tends to be functional, because of the alternation between
> digits and letters, but there's nothing "pure" about it.
>
> Some sentences in English can be parsed ambiguously; the convention in
> print has been to italicize the word intended to take the stress. Here, the
> plain-text fallback is less functional, as it re-introduces the ambiguity.
>
> There is no rule that says that *all* content information *must* be
> expressible on the plain text level. Some edge cases exist, where other
> layers, by necessity, participate.
>
> Mathematical notation is a good example of such a mixed case: while
> ordinary variables can be expressed in plain text with the help of
> mathematical alphabets, the proper display of formulas requires markup.
> Even Murray Sargent's plain text math is markup, albeit a very clever one
> that re-uses conventions used for the inline presentation of mathematical
> expressions. (Where that is insufficient, it introduces additional
> conventions, clearly extraneous to the content, and hence markup).
>
> The encoding conventions (principles) chosen by Unicode stipulate that for
> ordinary text (not notations) any part of the content that requires
> alternate presentation (italics, superscript, etc.) is to be supplied via
> styles, not coded characters. In contrast, for scholarly or technical
> notation, that requirement is relaxed.
>
> As long as French is ordinary text, the abbreviations require styled
> (rich) text.
>
> A./
>