A sign/abbreviation for "magister"

Marcel Schneider via Unicode unicode at unicode.org
Tue Oct 30 10:52:47 CDT 2018


Rather than a dozen individual e-mails, I’m sending this omnibus reply 
for the record. Even if everything has already been discussed and fixed 
here and in CLDR (SurveyTool forum and Trac), it still needs to be 
acknowledged so that nothing fails to be followed up on in the upcoming 
surveys, the next of which starts in 30 days.

First here: On 29/10/2018 at 12:43, Dr Freytag via Unicode wrote:

[…]
> The use of superscript is tricky, because it can be optional in some
> contexts; if I write "3rd" in English, it will definitely be
> understood no different from "3rd". 

[Note that this second instance was actually intended to read "3ʳᵈ", 
but it was formatted using a higher-level protocol.]
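For readers who want to see what the difference amounts to at the code point level, here is a minimal sketch using only Python's standard `unicodedata` module. It assumes the preformatted form is built from U+02B3 MODIFIER LETTER SMALL R and U+1D48 MODIFIER LETTER SMALL D; the helper name `describe` is hypothetical, chosen for illustration.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Hypothetical helper: return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

plain = "3rd"
preformatted = "3\u02b3\u1d48"  # "3ʳᵈ": the superscript letters are distinct code points

for line in describe(preformatted):
    print(line)

# NFKC (compatibility normalization) maps each superscript letter to its
# base letter, so the superscripting is lost under NFKC.
print(unicodedata.normalize("NFKC", preformatted) == plain)  # True
```

The plain-text string thus carries the superscript itself, with no markup, but any process that applies NFKC will flatten it back to "3rd".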

[…]
> In TeX the two transition fluidly. If I was going to transcribe such
> texts in TeX, I would construct a macro […]
[…]
> Nevertheless, I think the use of devices like combining underlines
> and superscript letters in plain text are best avoided.

While most other scripts, from Arabic to Duployan, are generously granted 
everything they need for accurate representation, from preformatted 
superscripts to superscripting and subscripting format controls, Latin 
script is often quite deliberately held back in order to make it unusable 
outside high-end DTP software, from TeX to Adobe InDesign, with the 
notable exception of the sparsely and parsimoniously encoded preformatted 
characters for phoneticists and medievalists. In Arabic script, for 
example, superscript is considered worth encoding and using without any 
caveat, whereas when Latin script is concerned, superscripts are thrown 
into the same cauldron as underscoring.

Obviously Unicode does not apply to Latin script the same principle it 
applies to all other scripts, i.e. making preformatted letters available 
where they are part of a standard representation and, in some cases, 
needed to ensure unambiguity. Mediterranean locales had preformatted 
ordinal indicators even in the Latin-1-only era, although "1a" and "2o" 
may be understood no differently from "1ª" and "2º". The degree sign, 
which is on French keyboards, is systematically hijacked to represent 
the "n°" abbreviation, unless a string is limited to ASCII. Several 
Latin-script locales have standard representations using superscripts 
and strong user demand for them; instead of being satisfied at the 
Unicode level, as would be done for any other of the world’s scripts, 
these demands are obstinately rebuffed unless the superscripts are 
intended for phonetics or, in some cases, palaeography.
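The Latin-1 situation described above can be checked with a minimal Python sketch relying only on the standard library. It decodes the ISO/IEC 8859-1 bytes for the two ordinal indicators and the degree sign and prints their decomposition mappings; it is illustrative only and says nothing about how any particular keyboard or application inserts these characters.

```python
import unicodedata

# Bytes 0xAA, 0xBA and 0xB0 in ISO/IEC 8859-1 (Latin-1): ª, º and °
chars = bytes([0xAA, 0xBA, 0xB0]).decode("latin-1")

for ch in chars:
    # decomposition() returns "" when a character has no decomposition mapping
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)!r}")

# "n°" typed with the DEGREE SIGN is a different string from "nº"
print("n\u00b0" == "n\u00ba")                     # False
# Only the ordinal indicator has a <super> mapping that NFKC folds to "o"
print(unicodedata.normalize("NFKC", "n\u00ba"))  # no
print(unicodedata.normalize("NFKC", "n\u00b0"))  # n°
```

The ordinal indicators are thus encoded as superscript letters (their mappings are tagged `<super>`), whereas the degree sign used as a stand-in for "º" is an unrelated symbol.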

I didn’t dig any deeper to find out who the UTC members are who 
regularly and aggressively contradict ballot comments about encoding 
palaeographic Latin letters, while proving unable to sustain any open 
and honest discussion on this List or elsewhere. Referring to what 
Dr Everson via Unicode wrote on 28/10/2018 at 21:49:

> I like palaeographic renderings of text very much indeed, and in fact
> remain in conflict with members of the UTC (who still, alas, do NOT
> communicate directly about such matters, but only in duelling ballot
> comments) about some actually salient representations required for
> medievalist use.


That said: On 29/10/2018 at 09:09, James Kass via Unicode wrote:
[…]
> If I were entering plain text data from an old post card, I'd try
> to keep the data as close to the source as possible. Because that
> would be my purpose. Others might have different purposes. 
> As you state, it depends on the intention. But, if there were an
> existing plain text convention I'd be inclined to use it. 
> Conventions allow for the possibility of interchange, direct
> encoding would ensure it.

The goal of discouraging Latin superscripts is obviously to ensure 
that reliable document interchange remains limited to PDF. 

If Unicode were to issue an official recommendation to use preformatted 
superscripts in Latin script, too, then font designers would implement 
comprehensive support for combining diacritics, and any plain text 
containing superscripted abbreviations could use the preformatted 
characters, achieving the interoperability that Unicode was designed 
for. Referring to what Dr Verdy via Unicode wrote on 28/10/2018 at 19:01:

[…]
> However it is still not very elegant if we stil need to use only
> the limited set of superscript letters (this still reduces the
> number of abbreviations, such as those commonly used in French
> that needs a superscript "é")

Combining diacritics, which can be applied to preformatted superscripts 
as well, are also the reason why Unicode limits encoding support to base 
letters, even for preformatted superscript letters: the rule that no 
*new* precomposed letters with an acute accent are encoded anymore 
applies to superscripts too. A Unicode-conformant way to represent such 
abbreviations would, IMO, use U+1D49 MODIFIER LETTER SMALL E followed by 
U+0301 COMBINING ACUTE ACCENT: "ᵉ́". Other representations may require 
OpenType support, which in Latin script is often turned off, supposedly 
in order to shift to higher-level protocols what Unicode makes available 
in plain text.
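The proposed sequence can be examined with Python's standard `unicodedata` module. This minimal sketch shows that canonical normalization (NFC) leaves U+1D49 + U+0301 as two code points, since no precomposed superscript "é" exists, whereas compatibility normalization (NFKC) folds it to the ordinary "é" (U+00E9). The abbreviation string `abbrev` is a hypothetical example, not taken from any source.

```python
import unicodedata

# U+1D49 MODIFIER LETTER SMALL E followed by U+0301 COMBINING ACUTE ACCENT
sup_e_acute = "\u1d49\u0301"

# NFC composes only canonical pairs; no precomposed superscript é exists,
# so the sequence stays as two code points.
nfc = unicodedata.normalize("NFC", sup_e_acute)
print([unicodedata.name(c) for c in nfc])

# NFKC first applies the <super> compatibility mapping (U+1D49 -> "e"),
# then composes "e" + U+0301 into U+00E9, so the superscript is lost.
print(unicodedata.normalize("NFKC", sup_e_acute) == "\u00e9")  # True

# Hypothetical abbreviation built the same way: "C" + superscript t + superscript é
abbrev = "C\u1d57\u1d49\u0301"
print(unicodedata.normalize("NFKC", abbrev))  # Cté
```

Whether such a sequence displays acceptably depends on the font positioning the accent correctly over the modifier letter, which is exactly the font-support question raised here.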
Referring to what Dr Kass wrote on 29/10/2018 at 01:05:

[…]
> "Mr͇" for display purposes may look as daft as "/italics/", but
> it captures the elements of the text of the original manuscript. 
> And it would allow preservation of abbreviations such as for 
> "constitutionalité" → "Ct͇é͇".

Using superscripts plus combining diacritics might be a way to 
address the limitations Dr Verdy mentioned on 30/10/2018 at 02:56:

[…]
> Obviously the Latin script should not use any kind of visual
> encoding, and even the superscript letters (initially introduced
> for something else, notably as distinct symbols for IPA) was not
> the correct path (it also has limitation because the superscript
> letters are quite limited; […]

But for font designers to implement support for combining diacritics 
on preformatted superscripts, Unicode needs to explicitly allow or 
recommend the use of preformatted superscripts in abbreviations.

This use case differs from the one that led to the submission of 
the L2/18-206 proposal, cited by Dr Ewell on 29/10/2018 at 20:29:

[…]
> The abbreviation in the postcard, rendered in plain text, is "Mr".
> Bringing U+02B3 or U+036C into the discussion just fuels the
> recurring demands for every Latin letter (and eventually those
> in other scripts) to be duplicated in subscript and superscript,
> à la L2/18-206.

IMO this proposal collapses once one considers that the preformatted 
characters are supposed to be inserted by the application rather 
than typed directly via keyboard drivers. 

The document L2/18-206 seems to originate from the observation 
of poor fonts and rendering engines in low-end document editing 
software. As previously mentioned, the fix is already available 
in high-end DTP software. That is sustainable as long as no 
locales are affected. This thread, however, is about a digitally 
interoperable representation of actual languages. Small caps, for 
example, are out of scope: the postcard writer did not write the 
names in small caps, which in Latin script are merely a stylistic 
convention for scientific publications and the like, whereas 
Cyrillic script routinely uses “small caps” to write lowercase.

Cyrillic also uses the № sign, which is mapped to the second level 
of key E03 (the "3" key) on the Russian and other Cyrillic keyboards.
Russian keyboard layout:
https://docs.microsoft.com/en-us/globalization/keyboards/kbdru.html
Bulgarian (phonetic traditional) keyboard layout:
https://docs.microsoft.com/en-us/globalization/keyboards/kbdbgph1.html

Perhaps the Numero sign came to be used in Cyrillic after it had been 
encoded for East Asian character sets, as Dr Wallace via Unicode hinted 
on 28/10/2018 at 21:20:

[…]
> AIUI, № was encoded as a compatibility character because it appears
> in some East Asian character sets

Yet № is also encoded in ISO/IEC 8859-5 (Latin/Cyrillic), at 0xF0.
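This is easy to confirm with Python's built-in `iso8859_5` codec. The minimal sketch below, assuming only the standard library, decodes byte 0xF0 and also shows that U+2116 NUMERO SIGN carries a `<compat>` decomposition to the two letters "No".

```python
import unicodedata

# Byte 0xF0 in ISO/IEC 8859-5 (Latin/Cyrillic)
numero = bytes([0xF0]).decode("iso8859_5")
print(f"U+{ord(numero):04X} {unicodedata.name(numero)}")  # U+2116 NUMERO SIGN

# Its compatibility decomposition is the two letters N and o
print(unicodedata.decomposition(numero))      # <compat> 004E 006F
print(unicodedata.normalize("NFKC", numero))  # No

# Round trip: encoding back to ISO/IEC 8859-5 yields the same single byte
print(numero.encode("iso8859_5") == b"\xf0")  # True
```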

Further, Dr Whistler via Unicode stated on 30/10/2018 at 05:54:

[…]
> The mere fact that some visual aspect of graphic representation on a 
> page of paper can be implemented via a mechanical typewriter does not, 
> ipso facto, mean that particular feature is plain text. The fact that I 
> could also implement superscripting and subscripting on a mechanical 
> typewriter via turning the platen up and down half a line, also does not 
> make *those* aspects of text styling plain text. either.

The reverse is true, too: the fact that some language representation 
required tweaking the typewriter did not make that representation any 
less plain text. For example, LATIN CAPITAL LETTER C WITH CEDILLA could 
not be typed by holding Shift and hitting "ç" (key E09, the "9" key) on 
a French keyboard. Nevertheless it is required for legibility when "ç" 
occurs at the start of a sentence or in all caps. 
The workaround was to type a COMMA over a LATIN CAPITAL LETTER C. 

Likewise, SUPERSCRIPT TWO was available on French (France) typewriters, 
and Belgian French ones had SUPERSCRIPT THREE, too. Also, as mentioned, 
the character now encoded as MODIFIER LETTER SMALL O was, and still is, 
emulated using the DEGREE SIGN (on level 2 of key E11). The fact that 
other superscript letters required turning the platen does not make 
them rich text today.

It’s as Dr Kass via Unicode put it on 30/10/2018 at 10:09 when replying 
to Dr Whistler via Unicode (above):

[…]
> If the typist didn't intend to put a superscript "r" on that page with a 
> double underline, the typist wouldn't have bothered with all that jive.
>
> It's about the importance one places on respecting authorial intent.
>
[…]
> […] Underscoring might be stripped without messing with the legibility,
> but so could tatweels and lots of other stuff. […]

If Unicode intends to discriminate between Arabic script and Latin 
script, that would be worth mentioning in the Standard. 

Since Unicode claims interoperability and unambiguous representation 
for all of the world’s scripts, it is expected to deliver them for 
Latin, too.

Dr Bień via Unicode wrote on 29/10/2018 at 06:40:

> > […] It's a matter of opinion, and opinions often differ.
> 
> Well said, but I make the claim stronger; it depends on the purpose of
> the encoding and intended applications.

Dr Everson via Unicode replied to Dr Karocki on 28/10/2018 at 22:55:
> 
> I think that it is the _superscription_ that indicates the fact that
> it is an abbreviation.

Hence Unicode is expected to fully support plain-text superscripts for 
those locales that use superscript as an abbreviation indicator, in the 
same role in which other locales use a colon or a period, a usage that 
Dr Dürst via Unicode mentioned on 29/10/2018 at 08:04, responding to 
Dr Everson’s e-mail of 05:42 the same day:

[…]
> I think this may depend on actual writing practice. In German at least, 
> it is customary to have dots (periods) at the end of abbreviations, and 
> using any other symbol, or not using the dot, would be considered an error.

Likewise, in some locales, French among them, not using superscript 
should be considered an error. It’s just that the perception of a 
superscript-less abbreviation that normally uses superscript is biased 
by the computer keyboard layouts still in use (though these will 
hopefully soon give way to more complete layouts).

So is Unicode taking its cue from typewriting practice when designing 
the encoding of Latin script, unlike what it does for potentially all 
other scripts?

Dr Bradfield just added on 30/10/2018 at 14:21 something that I didn’t 
know when replying to Dr Ewell on 29/10/2018 at 21:27:

[…]
> The English abbreviation Mr was also frequently superscripted in the
> 15th-17th centuries, and that didn't mean anything special either - it
> was just part of a general convention of superscripting the final
> segment of abbreviations, probably inherited from manuscript practice.

So English dropped the superscript requirement for common abbreviations 
in the 17ᵗʰ or 18ᵗʰ century, keeping it only for ordinals. Should Unicode 
now follow the example of English and hold back the representation of 
French? Fortunately it does not: the French ordinal indicators are now 
part of CLDR, consistent with what the French national body intended 
when it relaunched the design process for a locale-conformant keyboard.

The remaining superscript abbreviation letters should follow in CLDR 
once browsers use correct fonts for displaying the data.

Recall that The Unicode Standard explicitly specifies that the glyphs 
of all superscript or modifier letters of a script shall be harmonized.
No ransom note effect is allowed in Unicode-conformant fonts (except for 
the purpose of artwork, as in Apple’s former San Francisco typeface).


Best regards,

Marcel


