A sign/abbreviation for "magister"
Marcel Schneider via Unicode
unicode at unicode.org
Thu Nov 1 15:42:02 CDT 2018
On 01/11/2018 01:21, Asmus Freytag via Unicode wrote:
> On 10/31/2018 3:37 PM, Marcel Schneider via Unicode wrote:
>> On 31/10/2018 19:42, Asmus Freytag via Unicode wrote:
>>> It is a fallacy that all text output on a computer should match the convention
>>> of "fine typography".
>>> Much that is written on computers represents an (unedited) first draft. Giving
>>> such texts the appearance of texts, which in the day of hot metal typography,
>>> was reserved for texts that were fully edited and in many cases intended for
>>> posterity is doing a disservice to the reader.
>> The disconnect is in many people believing the user should be <del>disabled to write</del>
>> [prevented from writing]
Thank you for correcting.
>> his or her language without disfiguring it by lack of decent keyboarding, and
>> that such input should be considered standard for user input. Making such text
>> usable for publishing needs extra work, that today many users cannot afford,
>> while the mass of publishing has increased exponentially over the past decades.
>> The result is garbage, following the rule of “garbage in, garbage out.”
> No argument that there are some things that users cannot key in easily and that the common
> fallbacks from the days of typewritten drafts are not really appropriate in many texts that
> otherwise fall short of being "fine typography".
The goal I wanted to reach by discussing and invalidating the biased and misused concept
of “fine typography” was to rid this thread of it, but I have clearly been unsuccessful.
It is hard for you to understand that relegating abbreviation indicators to the realm of
“fine typography” reminds me of what I was told (undisclosed for privacy) when asking that
the French standard keyboard layouts (plural) support punctuation spacing with
NARROW NO-BREAK SPACE, and that this is closely related to the issue about social media that
you pointed out below.
Don’t worry about users not being able to “key in easily” what is needed for the digital
representation of their language, as long as:
1. Unicode has encoded what is needed;
2. Unicode does not prohibit the use of the needed characters.
The rest is up to keyboard layout designers. Keying in anything else has not been an issue so far.
>> The real
>> disservice to the reader is not to enable the inputting user to write his or her
>> language correctly. A draft whose backbone is a string usable as-is for publishing
>> is not a disservice, but a service to the reader, paying the reader due respect.
>> Such a draft is also a service to the user, enabling him or her to streamline the
>> workflow. Such streamlining brings monetary and reputational benefit to the user.
> I see a huge disconnect between "writing correctly" and "usable as-is for publishing". These
> two things are not at all the same.
> Publishing involves making many choices that simply aren't necessary for more "rough & ready"
> types of texts. Not every twitter or e-mail message needs to be "usable as-is for publishing", but
> should allow "correctly written" text as far as possible.
Not every message needs that, especially not those whose readers expect a quick response.
The reverse is true of new messages (tweets, thread launchers, requests, invitations).
As already discussed, there are several levels of correctness. We’re talking only about
the accurate digital representation of human languages, which includes correct punctuation.
E.g. in languages using letter apostrophe, hashtags made of a word including an apostrophe
are broken when ASCII or punctuation apostrophe (close quote) is used, as we’ve been told.
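The hashtag point can be illustrated with a minimal Python sketch (the pattern `#\w+` is a simplified stand-in for real hashtag tokenizers, not any platform's actual rule):

```python
import re

# Simplified hashtag pattern: '#' followed by word characters.
# Real platforms use more elaborate rules; this is only an illustration.
HASHTAG = re.compile(r"#\w+")

# U+2019 RIGHT SINGLE QUOTATION MARK is punctuation (General_Category Pf),
# so it terminates the \w+ run and truncates the hashtag.
print(HASHTAG.findall("#c\u2019est"))   # ['#c']

# U+02BC MODIFIER LETTER APOSTROPHE is a letter (General_Category Lm),
# so the whole word survives as one hashtag.
print(HASHTAG.findall("#c\u02bcest"))   # ['#cʼest']
```

The difference is purely a matter of which character is chosen for the apostrophe: a letter apostrophe keeps the word in one piece for any word-boundary-based tokenizer.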
Presumably part of this discussion could be streamlined if one could experience how easy
it can be to type one’s language’s accurate digital representation. But it is better
to be told what is going on, and what “strawmen” we are being confused with, since, again,
informed discussion brings advancement.
> When "desktop publishing" as it was called then, became available, too many people started to
> obsess with form over content. You would get these beautifully laid out documents, the contents
> of which barely warranted calling them a first draft.
Typing one’s language’s accurate digital representation is not obsessing over form rather
than content, provided that appropriate keyboarding is available. E.g. the punctuation
apostrophe is on level 1, where the ASCII apostrophe sits when digits are locked on level 1,
on the French keyboard layout I use; otherwise, digits are on level 3, where superscript e
is also found for ready input of most of the ordinals (except 1ᵉʳ/1ʳᵉ, 2ⁿᵈ for ranges, and plural with ˢ):
2ᵉ 3ᵉ 4ᵉ 5ᵉ 6ᵉ 7ᵉ 8ᵉ 9ᵉ 10ᵉ 11ᵉ 12ᵉ. Hopefully that demo makes clear what is intended.
Users not needing accurate representation in a given string are free to type otherwise.
The goal of this discussion is that Unicode allow accurate representation, not impose it.
Actually, Unicode is still imposing inaccurate representation on some languages, because TUS
prohibits the use of precomposed superscript letters in text representing human languages
with standard orthography, which is what “ordinary text” seems to boil down to.
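The ordinal demo above can be sketched programmatically. The mapping below uses the preformatted Unicode modifier letters; the function name and mapping are my own illustration, not an established API:

```python
# Hypothetical mapping from plain letters to the preformatted Unicode
# superscript (modifier) letters used in the ordinal demo above.
SUPERSCRIPT = {
    "e": "\u1d49",  # MODIFIER LETTER SMALL E            -> ᵉ
    "r": "\u02b3",  # MODIFIER LETTER SMALL R            -> ʳ
    "n": "\u207f",  # SUPERSCRIPT LATIN SMALL LETTER N   -> ⁿ
    "d": "\u1d48",  # MODIFIER LETTER SMALL D            -> ᵈ
    "s": "\u02e2",  # MODIFIER LETTER SMALL S            -> ˢ
}

def superscript_ending(ordinal: str) -> str:
    """Convert the trailing ASCII letters of an ordinal like '2e' to
    their preformatted superscript counterparts, e.g. '2e' -> '2ᵉ'."""
    head = ordinal.rstrip("abcdefghijklmnopqrstuvwxyz")
    tail = ordinal[len(head):]
    return head + "".join(SUPERSCRIPT.get(c, c) for c in tail)

print(superscript_ending("2e"))    # 2ᵉ
print(superscript_ending("1er"))   # 1ᵉʳ
```

This is exactly the kind of transform a keyboard layout's dead key or an autocorrect entry can perform, once the characters themselves are allowed.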
>> That disconnect seems to originate from the time where the computer became a tool
>> empowering the user to write in all of the world’s languages thanks to Unicode.
> No, this has nothing to do with Unicode / multi-script support.
Why not? Accurate, interoperable digital representation of French was totally impossible
before version 3.0 of Unicode (which brought the *new* NARROW NO-BREAK SPACE); before that,
the Standard was prevented from having such a character by the misdefined line-break
property of U+2008 PUNCTUATION SPACE, which has the right width and serves no purpose
solely because, unlike the related U+2007 FIGURE SPACE (but not U+2012 FIGURE DASH, mistakenly
added to the list in my previous e-mail), it is not non-breakable. Useful punctuation
spacing was dismissed as too “fine” a typography to be universally available
and interoperable, while the opposite is true: it is the only way of writing French
without the risk of conveying an impression of poor craftsmanship (see below).
>> The concept of “fine typography” was then used to draw a borderline between what
>> the user is supposed to input, and what he or she needs to get for publication.
> This same dividing line applies in English (or any of the other individual languages).
Yes, of course. The four lines above were only intended to set the scene. AFAICS, the
disconnect of an encoding standard designed for accuracy and interoperability, whose use and
usefulness are intentionally throttled in order to get non-accurate and
non-interoperable digital representations of some languages, is unprecedented, and it
originates from the time the Unicode Standard was set up. Spacing has been fixed,
ordinal indicators are being fixed, and now other abbreviation indicators still need to be fixed.
>> In the same move, that concept was extended in a way that it should include the
>> quality of the string, additionally to what _fine typography_ really is: fine
>> tuning of the page layout, such as vertical justification, slight variations in
>> the width of non-breakable spaces, and of course, discretionary ligatures.
> Certain elements of styling are also part of fine typography. In some cases, readying a "string"
> for publication also means applying spelling conventions or grammatical conventions (for those
> cases where there are ambiguities in the common language, or applying preferred word choices
> or ways of formulating things that may be particular to individual publishers or types of publications.
None of these is a reason not to be able to input abbreviation indicators in plain text.
As for the rest, I cannot see that applying a style guide’s orthography is part of fine
typography; it is just part of publishing. These parameters are at the discretion of the management.
That does not preclude the input of superscript on a keyboard, and as a side note,
publishers mostly accept at least rich text or another markup convention, most
commonly TeX (for scientific publications). But Unicode promises accurate, interoperable
representation of all of the world’s languages in plain text. Hence, authors are advised
that a good way to make TeX more human-readable is to use more Unicode.
> Using HYPHEN-MINUS instead of "EN DASH" or "HYPHEN" is perfectly OK for early stages of
> drafting a text. Attempting to follow those and similar conventions during that phase forces
> the author to pay attention to the wrong thing - his or her focus should be on the ideas and
> the content, not the form of the document.
There is a good point in that. But a close look at just these two conventions
significantly lessens the advantage of not using accurate punctuation in one’s drafts.
1. HYPHEN-MINUS vs EN DASH or, it should be added, EM DASH: that is not possible in locales
using no spacing around EM DASH. True, SPACE, HYPHEN-MINUS, SPACE is easily replaced with
SPACE, EN DASH, SPACE or any other dashing convention at a later stage. But not using
the correct dash out of U+2013, U+2014 and U+2015 is hardly worthwhile if all three are
on level 2 of three digit keys (1, 2, 3 or another range). Additionally, that brings the
advantage of being able to differentiate while thinking about the content. Nobody else can
do that job later with comparable efficiency.
2. HYPHEN-MINUS vs HYPHEN: that is largely a non-starter. As already discussed in
detail on this List, HYPHEN is a useless duplicate encoding of HYPHEN-MINUS, which
in almost all fonts has the glyph of HYPHEN and is used as the system hyphen for
automated hyphenation when a .docx is exported as a .pdf file. Using fonts
designed otherwise requires either a special keyboard layout or weird replacements,
because the HYPHEN-MINUS in URLs and e-mail addresses must not be replaced. So using
HYPHEN-MINUS everywhere a HYPHEN is intended is OK even in publishing. Only some
fonts may need fixing (I know of no more than a single one).
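The later-stage replacement described in point 1 can be sketched as a trivial substitution (a sketch only, with a name of my own choosing; a real tool would need to skip code spans, URLs, and the like):

```python
import re

def en_dash_spaced(text: str) -> str:
    """Replace the typewriter convention SPACE HYPHEN-MINUS SPACE with
    SPACE EN DASH (U+2013) SPACE, one possible later-stage touch-up.
    Hyphens inside words and compounds are left untouched."""
    return re.sub(r" - ", " \u2013 ", text)

print(en_dash_spaced("drafts - like this one - abound"))
# drafts – like this one – abound
```

The point made above still stands: such a blanket substitution cannot decide between EN DASH, EM DASH and HORIZONTAL BAR; only the author, choosing at input time, can.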
>> Producing a plain text string usable for publishing was then put out of reach
>> of most common mortals, by using the lever of deficient keyboarding, but also
>> supposedly by an “encoding error” (scare quotes) in the line break property of
>> U+2008 PUNCTUATION SPACE, that should be non-breakable like its siblings
>> U+2007 FIGURE SPACE (still—as per UAX #14—recommended for use in numbers) <del>and
>> U+2012 FIGURE DASH</del> to gain the narrow non-breaking space needed to space the
[corrected, see above]
>> triads in numbers using space as a group separator, and to space big punctuation
>> in a Latin script using locale, where JTC1/SC2/WG2 had some meetings for the UCS:
> Those details should be handled in a post-processing phase for documents that are intended
> for publication.
Not at all, as already stated above. Making a mess of any text file that is not print-ready
is an insult to the reader. And any *French* text not spacing punctuation with NNBSP is at
risk of ending up as a mess.
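As an illustration of what such NNBSP spacing amounts to in practice, here is a minimal Python sketch (the function name is mine; real French spacing rules have exceptions, e.g. times like 10:30, that this deliberately ignores):

```python
import re

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

def space_big_punctuation(text: str) -> str:
    """Normalize the space before the French 'big' punctuation marks
    (; : ! ?) to NARROW NO-BREAK SPACE. A sketch only: a production
    tool would need to leave URLs, times, emoticons, etc. untouched."""
    # Replace an existing space (SP or NBSP) before the mark...
    text = re.sub(r"[ \u00a0]+([;:!?])", NNBSP + r"\1", text)
    # ...or insert one where the mark directly follows a word character.
    text = re.sub(r"(\w)([;:!?])", r"\1" + NNBSP + r"\2", text)
    return text

print(repr(space_big_punctuation("Vraiment ? Oui !")))
# 'Vraiment\u202f? Oui\u202f!'
```

A keyboard layout can perform the same insertion at input time, which is the point argued throughout this thread.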
> One of the big problem in current architectures is that things like "autocorrect"
> which attempt to overcome the limitations of the current keyboards,
That is another disconnect, already pointed out repeatedly. Current keyboards have no
intrinsic “limitations”, and treating outdated keyboard layouts as an inevitability is
disconnected from reality, since all OS vendors offer facilities to complete,
enhance or change the keyboard layout.
> are applied at input time
> only; and authors need to constantly interact with these helpers to make sure they don't mis-
Correct; that is also the origin of what was called “the apostrophe catastrophe.”
> Much text that is laboriously prepared this way, will not survive future revisions during
> the editing process needed to get the *content* to publication quality.
That only applies to files fed into an editing process. Many people publish directly
out-of-the-keyboard, and that is where complete and readily available Unicode support matters
most. Anything else can be made up by the rendering engine, as you already noted. Since the
strength of Unicode is interoperability and data exchange, I can see no technical reason not to
type Unicode on one’s keyboard, including abbreviation indicators of any kind.
> All because users have no convenient tool to "touch-up" these dashes, quotes, and spaces
> in a later phase; at the same time they apply copy-editing, for example.
Because once you are in a WYSIWYG environment, you cannot simply transfer the text to
your text editor to apply regexes, so people need to write macros in VBA, I suppose,
to get things done. Autocorrect is consistent with WYSIWYG. People not interested in seeing
what they’re typing may wish to use LaTeX, where they can see it in another window.
What I cannot see is why these important issues should preclude users from typing
preformatted superscripts on their keyboard, e.g. via a ‹superscript› dead key.
Such a dead key is already standardized, but again, Karl Pentzlin’s proposal to
encode the missing characters was rejected, while in this thread we could
see there is an interest in what could be called a UnicodeChem notation, a
nearly plain-text encoding of chemical elements, compounds and processes.
>> For everybody having beneath his or her hands a keyboard whose layout driver is
>> programmed in a fully usable way, the disconnect implodes. At encoding and input
>> levels (the only ones that are really on-topic in this thread) the sorcery called
>> fine typography sums then up to nothing else than having the keyboard inserting
>> fully diacriticized letters, right punctuation, accurate space characters, and
>> superscript letters as ordinal indicators and abbreviation endings, depending
>> on the requirements.
> In the days of typewritten manuscripts you had to follow certain conventions that allowed the
> typesetter to select the intended symbols and styled letters. I'm not arguing that we should
> return to where such fallbacks are used. And certainly not arguing that we should be using
> ASCII fallbacks for letters with diacritics, such as "oe" for "ö".
> But many issues around selecting the precise type of space or dash are not so much issues
> of correct content but precisely issues of typography.
That is true insofar as the French national printing office recommends using NBSP with the
colon, while the industry widely uses NNBSP for the colon too, as Philippe Verdy reported on this
List. It also states that the same should be done for angle quotation marks, but does not do so itself.
Here is indeed matter for fine-tuning, but as stated above and below, NBSP does not work in
every environment, not even in most of the most common ones where users are typing text. I
still call a string publication-ready where big punctuation is spaced with NNBSP uniformly.
> Some occupy an intermediate level, where it would be quite appropriate to apply them to
> many automatically generated texts. (I am aware of your efforts in CLDR to that effect).
Thank you for the occasion to invite everyone to join in and contribute to the oncoming
surveys of Unicode’s Common Locale Data Repository. Much needs to be done in French and
in many locales already present, even if the stress should naturally be on adding *new*
locales still not in CLDR.
> But I still believe that they have no place in content focused writing.
That is only the effect of an error of perception, widely fueled by deficient
keyboard designs not supporting automated punctuation spacing for French. See the ticket in Trac.
>> Now was I talking about “all text output on a computer”? No, I wasn’t.
>> The computer is able to accept input of publishing-ready strings, since we have
>> Unicode. Precluding the user from using the needed characters by setting up
>> caveats and prohibitions in the Unicode Standard seems to me nothing else than
>> an outdated operating mode. U+202F NARROW NO-BREAK SPACE, encoded in 1999 for
>> Mongolian, has been readily ripped off by the French graphic industry.
>> In 2014, TUS started mentioning its use in French; in 2018, it put it on
>> top.
>> That seems to me a striking example of how things encoded for other purposes
>> are reused (or following a certain usage, “abused”, “hacked”, “hijacked”) in
>> locales like French. If it wasn’t an insult to minority languages, that
>> language could be called, too, “digitally disfavored” in a certain sense.
>>> On the other hand, I'm a firm believer in applying certain styling attributes
>>> to things like e-mail or discussion papers. Well-placed emphasis can make such
>>> texts more readable (without requiring that they pay attention to all other
>>> facets of "fine typography".)
>> The parenthesized sidenote (that is probably the intended main content…) makes
>> this paragraph wrong. I’d buy it if either the parenthesis is removed or if it
>> comes after the following.
> Now you are copy-editing my e-mails. :)
> I don't read or write French on the level that I can evaluate your contention that the language
> is digitally disadvantaged.
It was heavily disadvantaged until U+202F NARROW NO-BREAK SPACE was encoded and widely
implemented. Implementation would have been speedy and straightforward if only it had
been present from the beginning, as U+2008 PUNCTUATION SPACE. Even the character name
would have matched the purpose. Perhaps the French people involved were hindered in fixing
that bug while being aware of its gravity.
Then it was still disadvantaged by the lack of ordinal indicators, but that is now fixed
thanks to the CLDR Technical Committee, this past summer. Many thanks.
Ultimately, it is one of the languages using superscript as the abbreviation indicator,
and it is not allowed by Unicode to use even the already encoded superscript letters. That was
not fixed in CLDR for v34 because the browsers used to display the data, notably in the
Survey Tool implemented as a web interface, still do not use decent fonts with
Unicode-conformant glyphs for all superscript letters and even digits, as seen in some
webmail interfaces. The resulting ransom-note effect made it impossible to responsibly
back the use of those letters in natural languages as abbreviation indicators, because
unlike phonetics, which uses these letters in isolation, natural languages may have abbreviation
endings encompassing more than the final letter.
For the abbreviation of Magister like on the Polish postcard, that is not a problem.
> To some extent, software will always reflect the biases of its creators, and in some subtle ways
> these will end up in conflict with conventions in other languages. In some cases, conventions
> applied by human typesetters cannot easily be duplicated by software that cannot recognize
> the meaning of the text,
Very good point. That is exactly the reason why the author should be enabled to take full control
over his or her text, and that is best and most universally done by correctly programming the
layout driver of the keyboard used.
> and in some cases we have seen languages abandoning these
> conventions in recent reforms in favor of a set of rules that are a bit more "mechanistic"
> if you will.
> In German, it used to be necessary to understand the word division to know whether or not
> to apply a ligature. Some of the rules for combining words into compounds were changed
> and that may have made that process more regular as well.
That is a fine step forward for good typography.
> But still, forcing all users to become typesetters was one of the wrong turns taken during the
> early development of publishing on computers.
I don’t think so at all. Users were not “forced” to do anything. If the autocorrect facilities
compensating for deficient keyboarding were not welcome, they could easily be turned off. And
professional typesetters always remained active, moving to the computer in due course.
I have myself experienced being able, thanks to Microsoft’s word processor, to do
professional-looking typesetting. (As I was responsible for the content anyway, it didn’t make
a difference.) But first I had to add some entries to Word’s autocorrect to tweak the keyboard.
> You seem to revel in knowing all the little
> details in French usage,
Not at all. That knowledge is a sheer necessity, and fortunately it is so narrow that
you don’t need to know that much to digitally typeset French. But you do need to know the
relevant points. The fact that NARROW NO-BREAK SPACE is narrow doesn’t make it a little
detail, but it misleads people into classifying it under “fine typography”, even more so
in French, where (as found in TUS, in French in the text) it’s called an “espace fine insécable”.
> but I bet not even all educated French people reach your level.
Precisely on this point, perhaps not; but that point is relevant mainly to those
programming and documenting keyboard layouts. After that, punctuation spacing is
automated on level 2 (just press Shift) and easily turned off by several means.
I hope that will be welcome, as almost everyone in France is very careful to
always space the big punctuation marks by the means available so far,
and to always superscript the ordinal indicators and other abbreviation indicators,
at least while handwriting.
> The best keyboard drivers won't help.
Why do you assume they won’t help?
> So the idea that every string is supposed to be
> "publication-ready" remains a fallacy. However, there shouldn't be encoding obstacles
> to creating publication-ready strings. (Whether created by copy-editors, typesetters, or
> advanced tools that post-process draft texts).
What I’d mainly like to see is that Unicode (supposing that you are writing on behalf
of the Consortium) not impose a division of the workflow. Everybody should be able
to apply to any task the most appropriate process, no matter how many parts it
consists of. If a subset of end-users wish to input strings that won’t need to be modified
in detail for publishing (except headings), Unicode is here to empower them to do so.
Can that be taken for granted?
> If an Twitter message uses spaces around punctuation that are not the right width, who
As pointed out in the paragraph of my previous e-mail just below, the main issue
around punctuation spacing in French in non-justifying layout is not the width of
the space characters but their line-breaking property. Believe it or not, U+00A0
NO-BREAK SPACE is breakable in those environments, which therefore make a mess of
spaced punctuation unless the space used is U+202F NARROW NO-BREAK SPACE. Or
U+2007 FIGURE SPACE, but if we have to use an extra space character, we may as well
pick the right one, given that FIGURE SPACE is not fit for publishing, while NNBSP is.
> but if your copy-editor can't prepare a manuscript for publication because of software
> limitations, that's a different can of worms.
My copy-editor is me. I wrote in my previous (perhaps too long, but I couldn’t help it) e-mail:
“Making such text usable for publishing needs extra work, that today many users cannot
afford”, and: “Such a draft is also a service to the user, enabling him or her to
streamline the workflow. Such streamlining brings monetary and reputational benefit to
the user.” The working scheme used with TeX or regexes is not interoperable, and the
drafts are not all-purpose. A publishing-ready draft is, in my opinion, a plain text string
that can be copy-pasted as-is — or typed directly — into a blog post composer form while
being sure that all punctuation and punctuation spacing is fully operational. I don’t
currently do this, but many people do, and they are doing word processing where the same
applies, given that autocorrect doesn’t use the up-to-date space and can hardly guess in
every case what the user intends to type, as you pointed out.
>> With due respect, I need to add that the disconnect in that is visible only to
>> French readers. Without NNBSP, punctuation à la française in e-mails is messed
>> up because even NBSP is ignored (I don’t know what exactly happens at backend;
>> anyway at frontend it’s like a normal space in at least one e-mail client and
>> in several if not all browsers, and if pasted in plain text from MS Word, it’s
>> truly replaced with SP. All that makes e-mails harder to read. Correct spacing
>> with punctuation in French is often considered “fine-tuning”, but only if that
>> punctuation spacing is not supported by the keyboard driver, and that’s still
>> almost always the case, except on the updated version 1.1 of the bépo layout
>> (and some personal prototypes not yet released).