A sign/abbreviation for "magister" (was: Re: second attempt)

William_J_G Overington via Unicode unicode at unicode.org
Wed Oct 31 10:45:21 CDT 2018


Many years ago, in the Bytext Report, Bernard Miller proposed introducing eight arrow parenthesis characters.

They were stateful: one character meant that effectively everything following is superscript until told otherwise, and another meant that everything following is no longer superscript until told otherwise.

There were also pairs for subscripts, for the upper limit of an integral, and for the lower limit of an integral, and those latter two pairs could also be used with the capital sigma sign that expresses the summation of a mathematical series.

Now, I appreciate that the statefulness of those suggested characters may still rule them out for implementation in plain text. Yet maybe an arrow parenthesis, or something like it, could be encoded that is like a combining accent character but has the effect of making the one character that it follows a superscript, with another similar character for subscripts. That would mean that any Unicode character could be used as a superscript or a subscript in plain text. Maybe another two, or another four, such characters could be added so that the limits of integrals and summations could also be expressed in plain text by this method.

These new characters could have a visible glyph as a fallback display, yet not be displayed at all if, as a result of glyph substitution for the two-character sequence, a superscript or subscript version of the first character of the sequence were displayed.
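
To make the idea concrete, here is a minimal sketch, in Python, of how a renderer might process such two-character sequences. The format character itself is purely hypothetical, so a Private Use Area code point (U+E000) stands in for it, and the table maps a few pairs onto preformatted characters that already exist:

    # Hypothetical "superscript the preceding character" format character.
    # Nothing of the kind is encoded; a Private Use Area code point stands in.
    SUP = "\uE000"

    # Pairs for which a preformatted superscript already exists; for any
    # other base character the pair is kept, so the fallback glyph shows.
    PREFORMATTED = {
        "r": "\u02B3",  # MODIFIER LETTER SMALL R
        "e": "\u1D49",  # MODIFIER LETTER SMALL E
        "1": "\u00B9",  # SUPERSCRIPT ONE
    }

    def render(text):
        out, i = [], 0
        while i < len(text):
            if i + 1 < len(text) and text[i + 1] == SUP:
                # Substitute the two-character sequence when possible.
                out.append(PREFORMATTED.get(text[i], text[i] + SUP))
                i += 2
            else:
                out.append(text[i])
                i += 1
        return "".join(out)

    print(render("Mr" + SUP))  # -> "Mʳ"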

William Overington

Wednesday 31 October 2018


----Original message----
From : unicode at unicode.org
Date : 2018/10/31 - 14:57 (GMTST)
To : unicode at unicode.org
Subject : Re: A sign/abbreviation for "magister" (was: Re: second attempt)

On 31/10/2018 at 11:21, Asmus Freytag via Unicode wrote:
>
> On 10/31/2018 2:38 AM, Julian Bradfield via Unicode wrote:
>
> > You could use the various hacks
> > you've discussed, with modifier letters; but that is not "encoding",
> > that is "abusing Unicode to do markup". At least, that's the view I
> > take!
>
> +1

There seems to be widespread confusion about what plain text is, and what 
Unicode is for. From a US-QWERTY point of view, a current mental representation 
of plain text may be ASCII-only. UK-QWERTY (not extended) adds vowels with acute 
accents. Unicode grants every language its own plain-text representation. If 
superscript acts as an abbreviation indicator in a given language, it is part of 
the plain-text representation of that language. 

So far, so good. The core problem is now to determine whether superscript is 
mandatory and baseline is a fallback, or superscript is optional and decorative 
and baseline is correct. That may be a matter of opinion, as has been suggested. 
However, we now know of a list of languages where superscript is mandatory and 
baseline is a fallback. Setting English aside, these languages need the UTC to 
grant them the use of preformatted superscript letters.

From the beginning, when early Unicode set up the Standard, superscript was 
ruled out of plain text, except where there was strong lobbying of some sort, 
as when the Vietnamese precomposed letters were added. Phoneticians have a strong 
lobby, so they got some ranges of preformatted letters. To make sure nobody 
dares use them in running text elsewhere, all *new* superscript letters got names 
on a MODIFIER LETTER basis, while subscript letters got straightforward names 
with SUBSCRIPT in them. Additionally, strong caveats were published in TUS.

And the trick worked: most of the time, people now refer to the superscript 
letters using the “modifier letter” label that Unicode has decked them out with.

That is why, today, any discussion whose outcome would allow some languages to 
use their traditional abbreviation indicators, in an already encoded and 
implemented form, is at risk of strong biases. Fortunately the front has begun 
to move, as the CLDR TC has granted ordinal indicators to the French locale 
as of v34. 

Ordinal indicators are one category of abbreviation indicators. Consistently, 
the ordinal indicators already encoded in ISO/IEC 8859-1 and now in Unicode are 
also used in titles like "Sª" and "Nª Sª", as found in the navigation pane of:
http://turismosomontano.es/en/que-ver-que-hacer/lugares-con-historia/monumentos/iglesia-de-la-asuncion-peralta-de-alcofea

I’m not sure whether some people would still argue that such a string is 
understood no differently from "Na Sa".
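
For what it’s worth, the distinction is machine-checkable: the ordinal indicator 
is a character in its own right, related to the plain letter only by a 
compatibility mapping. A quick check in Python (my own illustration):

    import unicodedata

    title = "N\u00AA S\u00AA"  # "Nª Sª", U+00AA FEMININE ORDINAL INDICATOR
    print(title == "Na Sa")                      # False: distinct code points
    print(unicodedata.normalize("NFKC", title))  # "Na Sa" after folding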

> In general, I have a certain sympathy for the position that there is no universal
> answer for the dividing line between plain and styled text; there are some texts
> where the conventional division of plain test and styling means that the plain
> text alone will become somewhat ambiguous.

That is why phonetics needs preformatted superscripts and subscripts, and so do 
languages relying on superscript as an abbreviation indicator.

> We know that for mathematics, a different dividing line meant that it is possible
> to create an (almost) plain text version of many (if not most) mathematical
> texts; the conventions of that field are widely shared -- supporting a case for
> allowing a standard encoding to support it.

Referring to Murray Sargent’s UnicodeMath, a Nearly Plain Text Encoding of Mathematics, 
https://www.unicode.org/notes/tn28/
is always worthwhile in this discussion. UnicodeMath uses the full range of 
superscript digits, because that range is complete. It does not use superscript 
letters, because their range is incomplete. Hence, if superscript digits had 
stopped at the legacy range "¹²³", only measurement units like the metric 
equivalents of sq ft and cu ft could be written with superscripts, and that much 
is already allowed according to TUS. I don’t know why superscript one was added 
to ISO/IEC 8859-1, though. Anyway, since phonetics needs a full range of 
superscript and subscript digits, these were added to Unicode, and therefore 
they are used in UnicodeMath.
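
To make the completeness of the digit range concrete, here is a small folding 
table of my own (not taken from UnicodeMath): the legacy Latin-1 characters 
cover only "¹²³", while U+2070 and U+2074..U+2079 complete the set.

    # Map ASCII digits to their preformatted superscript counterparts.
    SUPERSCRIPT_DIGITS = str.maketrans({
        "0": "\u2070", "1": "\u00B9", "2": "\u00B2", "3": "\u00B3",
        "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
        "8": "\u2078", "9": "\u2079",
    })

    print("m2".translate(SUPERSCRIPT_DIGITS))   # m²  (fits within Latin-1)
    print("x10".translate(SUPERSCRIPT_DIGITS))  # x¹⁰ (needs the U+2070 block)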

Likewise, phonetics needs a nearly full range of superscript letters, so these 
were added to Unicode, and therefore they are used in the digital representation 
of natural languages.

> However, it stops short of 100% support for edge cases, as does the ordinary
> plain text when used for "normal" texts. I think, on balance, that is OK.

That is not clear as long as “ordinary plain text” is not defined for the 
purposes of this discussion. Since I have superscript small letters on live 
keys, with the superscript "ᵉ" even doubled on the same level as the digits 
(which it turns into ordinals, for most of them), my French keyboard layout 
driver allows the OS to output ordinary plain text consisting of various signs, 
including superscript small Latin letters. 

Now, does Unicode make a distinction between “plain text” and “ordinary plain text”?
There are various ways to “clean up” the UCS: first removing presentation forms, 
then historic letters, then mathematical symbols, then why not emoji, and somewhere 
in between, phonetic letters, among which the superscripts. The result would then 
be “ordinary plain text”, but to what purpose? Possibly so that all documents must 
be written up using TeX. Following that logic to its end would mean that composed 
letters should be removed too, given that they are accurately represented using 
escape sequences like "\'e" for "é".
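
Incidentally, that equivalence is already built into Unicode as canonical 
normalization; a short demonstration in Python:

    import unicodedata

    composed = "\u00E9"     # "é", LATIN SMALL LETTER E WITH ACUTE
    decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT
    print(composed == decomposed)                                # False
    print(unicodedata.normalize("NFC", decomposed) == composed)  # True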

> If there were another important notational convention, widely shared, 
> reasonably consistent and so on, then I see no principled objection to considering
> whether it should be supported (minus some edge cases) in its own form of
> plain text (with appropriate additional elements encoded).

I’m pleased to read that. Given that the use of superscript in French is 
important, widely shared, and reasonably consistent, we need to know what else 
it should be. Certainly: supported by the local keyboard layout. Hopefully it 
will be, soon.

> The current case, transcribing a post-card to make the text searchable, for
> example, would fit the use case for ordinary plain text, with the warning against
> simulated effects of markup.

Issuing such a warning would require first sorting out whether a given 
representation is best encoded using plain text or using markup. If it’s plain 
text, then nothing is being simulated. The reverse is true: markup simulates 
accurate plain text. Searchability is ensured by equivalence classes. Google 
Search has the most comprehensive equivalence classes, indexing even all the 
mathematical preformatted Latin letters as plain ASCII.
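
One readily available equivalence class is compatibility normalization, which 
folds preformatted superscripts, ordinal indicators, and mathematical letters 
back to their base letters. A sketch of such index folding in Python (the 
sample characters are my own choice):

    import unicodedata

    for ch in ("\u1D49",       # ᵉ MODIFIER LETTER SMALL E
               "\u00AA",       # ª FEMININE ORDINAL INDICATOR
               "\U0001D41A"):  # 𝐚 MATHEMATICAL BOLD SMALL A
        print(ch, "->", unicodedata.normalize("NFKC", ch))
    # all three fold to the plain letters "e" or "a"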

> All other uses are better served by markup, whether
> SGML / XML style to capture identified features, or final-form rich text like PDF
> just preserving the appearance.

Agreed.

Best regards,

Marcel




