A sign/abbreviation for "magister"
Marcel Schneider via Unicode
unicode at unicode.org
Fri Nov 2 11:37:21 CDT 2018
On 31/10/2018 at 19:34, Asmus Freytag via Unicode wrote:
> On 10/31/2018 10:32 AM, Janusz S. Bień via Unicode wrote:
> > Let me remind what plain text is according to the Unicode glossary:
> > Computer-encoded text that consists only of a sequence of code
> > points from a given standard, with no other formatting or structural
> > information.
> > If you try to use this definition to decide what is and what is not a
> > character, you get a vicious circle.
> > As mentioned already by others, there is no other generally accepted
> > definition of plain text.
Being among those who argued that the “plain text” concept cannot, and
therefore must not, be used per se to disallow the use of a more or less
restricted or extended set of characters in what is called “ordinary text”,
I'll add the following in case it is of interest:
> This definition becomes tautological only when you try to invoke it in making
> encoding decisions, that is, if you couple it with the statement that only
> "elements of plain text" are ever encoded.
I don’t think that Janusz S. Bień’s concern is about this definition
being “tautological”. AFAICS the Unicode definition of “plain text” is
quoted to back the assumption that it’s hard to use that concept to argue
against the use of a given Unicode character in a given context, or to
use it to kill a proposal for characters significant in natural languages.
The reasoning is that calling for character X not to be used in plain text, while X is
a legal Unicode character whose use is not discouraged for technical reasons,
is as if “ordinary people” (a scare-quoted derivative of “ordinary text”) were
told that X is not a Unicode character. That discourse is a “vicious circle” in
that there is no limit to it until the Latin script is pulled down to plain ASCII.
As is well known, diacritics are handled by the rendering system and don't
need to be displayed as such in the plain-text backbone. I don't believe that
the same applies to other scripts, but these are often not considered when the
encoding of Latin preformatted letters is fought, given that superscripting seems
to be proper to the Latin script, having originated in long-standing medieval practice and …
> For that purpose, you need a number of other definitions of "plain text".
> Including the definition that plain text is the "backbone" to which you apply
> formatting and layout information. I personally believe that there are more
> 2D notations where it's quite obvious to me that what is "placed" is a text
> element. More like maps and music and less like a circuit diagram, where the
> elements are less text like (I deliberately include symbols in the definition
> of text, but not any random graphical line art).
All the two-dimensional notations mentioned here (outside the parenthetical) use
higher-level protocols; maps and diagrams are often vector graphics. But Unicode has
striven to encode all needed plain-text elements, such as symbols for maritime and
weather maps. Even arrows of many possible shapes, including 3D-looking ones, have
been encoded. While freehand (rather than “any random”) graphical art is out of scope,
we have a lot of box drawing, used with appropriate fonts to draw e.g. layouts of
keyboards above the relevant source code in plain-text files (examples in XKB).
As a side note: box drawing, while useful, is unduly neglected at the font level,
even in the Code Charts, where the advance width (usually half an em) is
inconsistent between different sorts of elements belonging to the same block.
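As an aside, such plain-text keyboard diagrams really can be built from nothing but Box Drawing characters; here is a minimal sketch (the two-key layout is my own invention, not taken from XKB):

```python
import unicodedata

# A two-key fragment of a keyboard layout, drawn entirely with
# characters from the Box Drawing block (U+2500..U+257F).
diagram = "\n".join([
    "┌─────┬─────┐",
    "│  A  │  B  │",
    "└─────┴─────┘",
])
print(diagram)

# Every frame character really belongs to the Box Drawing block:
for ch in "┌─┬┐│└┴┘":
    assert unicodedata.name(ch).startswith("BOX DRAWINGS")
```

With a monospaced font whose box-drawing glyphs have a consistent advance width, the frame lines connect seamlessly.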
> Another definition of plain text is that which contains the "readable content"
> of the text.
As already discussed on this List, many PDF documents have hard-to-read plain-text
backbones (even misleading Google Search) for the purpose of handling special
glyphs (and, in some eras, even special characters).
> As we've discussed here, this definition has edge cases; some
> content is traditionally left to styling.
Many pre-Unicode traditions are found out there that stay in use, partly for
technical reasons (mainly for lack of updated keyboard layouts), partly for
consistency with accustomed ways of doing things. “Traditionally left to styling”
is all the less convincing. Even the letter that became LATIN SMALL LIGATURE OE
(Unicode 1.0) was composed on typewriters using the half-backspace, and was supposed
to be _left to styling_ when it was pulled out of the draft of ISO/IEC 8859-1 through
the fault of a Frenchman (name undisclosed for privacy). And we've been told on this
List that the tradition of using styling (a special font) to display the additional
Latin letters used to write Bambara has survived.
> Example: some of the small words in
> some Scandinavian languages are routinely italicized to disambiguate their
Other languages use titlecase to achieve the same disambiguation. E.g. French
titlecases the noun "Une", which means the front page, not the indefinite article,
and German did the same when "Ein(e)" is a numeral, but today other means,
including italics, are more common.
> Other languages use accents for this purpose - sometimes without
> recognizing either the accented letter as part of the alphabet, or the accented
> form as a dictionary entry.
This refers to the Dutch stress-marking acute, discussed earlier on this List.
> Which nicely shows that this level of disambiguation
> is intuitively viewed as less orthographic, something that applies to the cases
> where italics are used for the same purpose.
Another Unicode-conformant means of noting stress would be adding an emoji. :-|
If stress is close to emotion, it could be represented in a similar way.
Strictly speaking, that is off-topic in this thread, which is about representing
abbreviations in a legible rather than merely decipherable way, in plain text.
If stress is not represented, you can still read the sentence without stumbling.
That is not always true when abbreviations are not superscripted. I remember an
ASCII-only environment localized in French, where "no centre mess" stood for
"numéro du centre de messagerie", "dial number of the message platform". Being
unfamiliar with it, I did stumble before completing and understanding it: "nᵒ centre mess."
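For the record, the superscript in "nᵒ" is an encoded character, not styling; a short Python check (nothing here is specific to the environment described above):

```python
import unicodedata

# "nᵒ", the French abbreviation of "numéro", written with U+1D52
# MODIFIER LETTER SMALL O as a preformatted superscript.
abbr = "n\u1D52"
assert unicodedata.name("\u1D52") == "MODIFIER LETTER SMALL O"

# Compatibility normalization folds the superscript back to a
# plain "o", so searching and matching still work:
assert unicodedata.normalize("NFKD", abbr) == "no"
print(abbr, "->", unicodedata.normalize("NFKD", abbr))
```

So the abbreviation stays legible on screen while remaining searchable as "no" after compatibility folding.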
> In some contexts (Western Math) the scope of readable content is different than
> that of ordinary text. Therefore, this definition of "plain text" isn't universal.
> In principle, you could argue that your definition of readable content should apply;
> however, as a standard, Unicode will insist on limiting the encoding to text elements
> required by some common, widely shared and reasonably agreed-upon definition of
> plain text -- corresponding to a particular division between text elements and styling.
> So far, we have ordinary text, math and phonetics,
Thanks for the clarification. Nevertheless, that partition of roles is somewhat
arbitrary as long as abbreviation indicators are excluded from the scope of ordinary
text. That is, that policy is applied and promoted without being well designed. It
collapses from the outset, given that the feminine and masculine ordinal indicators
pre-date Unicode and are living proof of the importance of preformatted superscripts.
Instead of drawing the borderline between usages only, Unicode drew it between natural
languages, stating that Italian, Portuguese and Spanish are entitled to use superscript
ordinal indicators, whereas English and French are not. In the same vein, Italian,
Portuguese and Spanish are granted the right to compose titles and some other
abbreviations with preformatted superscript letters, as long as the set doesn't exceed
a and o, but other languages are not, when using other or more letters, or when not
accustomed to underlining as an additional abbreviation indicator.
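The asymmetry can be made concrete; a small sketch contrasting the Latin-1 ordinal indicators with the modifier letters other languages would need (the English example is my own):

```python
import unicodedata

# The two pre-Unicode ordinal indicators, inherited from Latin-1:
assert unicodedata.name("\u00AA") == "FEMININE ORDINAL INDICATOR"   # ª
assert unicodedata.name("\u00BA") == "MASCULINE ORDINAL INDICATOR"  # º
print("1\u00AA 2\u00BA")  # Spanish/Portuguese "1ª 2º"

# English "1st" has no dedicated indicator; in plain text a
# superscript needs modifier letters instead:
first = "1\u02E2\u1D57"  # "1ˢᵗ"
assert unicodedata.name("\u02E2") == "MODIFIER LETTER SMALL S"
assert unicodedata.name("\u1D57") == "MODIFIER LETTER SMALL T"
print(first)
```

Both forms are valid plain text today; the debate is about which the Standard recommends.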
Fortunately that is no longer true, so the point is actually to revise the relevant
paragraphs in TUS, if only for consistency with CLDR.
Contributions are hopefully welcome.