Preformatted superscript in ordinary text, paleography and phonetics using Latin script (was: Re: A sign/abbreviation for "magister" - third question summary)

Marcel Schneider via Unicode unicode at unicode.org
Wed Nov 7 13:49:38 CST 2018


On 06/11/2018 12:04, Janusz S. Bień via Unicode wrote:
> 
> On Sat, Oct 27 2018 at 14:10 +0200, Janusz S. Bień via Unicode wrote:
>> Hi!
>>
>> On the over 100 years old postcard
>>
>> https://photos.app.goo.gl/GbwNwYbEQMjZaFgE6
>>
>> you can see 2 occurrences of a symbol which is explicitly explained (in
>> Polish) as meaning "Magister".
>>
> 
> [...]
> 
>> The third and the last question is: how to encode this symbol in
>> Unicode?
> 
> 
> A constructive answer to my question was provided quickly by James Kass:
> 
> On Sat, Oct 27 2018 at 19:52 GMT, James Kass via Unicode wrote:
>> Mr͇ / M=ͬ
> 
> I answered:
> 
> On Sun, Oct 28 2018 at 18:28 +0100, Janusz S. Bień via Unicode wrote:
> 
> [...]
> 
>> For me only the latter seems acceptable. Using COMBINING LATIN SMALL
>> LETTER R is a natural idea, but I feel uneasy using just EQUALS SIGN as
>> the base character. However, for lack of a better solution I can live
>> with it :-)
>>
>> An alternative would be to use SMALL EQUALS SIGN, but looks like fonts
>> supporting it are rather rare.
> 
> and Philippe Verdy commented:
> 
> On Sun, Oct 28 2018 at 18:54 +0100, Philippe Verdy via Unicode wrote:
> 
> [...]
> 
>>
>> There's a third alternative, that uses the superscript letter r,
>> followed by the combining double underline, instead of the normal
>> letter r followed by the same combining double underline.
> 
> Some comments were made also by Michael Everson:
> 
> On Sun, Oct 28 2018 at 20:42 GMT, Michael Everson via Unicode wrote:
> 
> [...]
> 
>> I would encode this as Mʳ if you wanted to make sure your data
>> contained the abbreviation mark. It would not make sense to encode it
>> as M=ͬ or anything else like that, because the “r” is not modifying a
>> dot or a squiggle or an equals sign.  The dot or squiggle or equals
>> sign has no meaning at all. And I would not encode it as Mr͇, firstly
>> because it would never render properly and you might as well encode it
>> as Mr. or M:r, and second because in the IPA at least that character
>> indicates an alveolar realization in disordered speech. (Of course it
>> could be used for anything.)
> 
> FYI, I decided to use the encoding proposed by Philippe Verdy (if I
> understand him correctly):
> 
> Mʳ̳
> 
> i.e.
> 
> 'LATIN CAPITAL LETTER M' (U+004D)
> 'MODIFIER LETTER SMALL R' (U+02B3)
> 'COMBINING DOUBLE LOW LINE' (U+0333)
> 
> for purely pragmatic reasons: it is rendered quite well in my
> Emacs. According to the 'fc-search-codepoint' script, the sequence is
> supported on my computer by almost 150 fonts, so I hope to find in due
> time a way to render it correctly also in XeTeX. I'm also going to add
> it to my private named sequences list
> (https://bitbucket.org/jsbien/unicode4polish).
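
For readers who want to check the sequence quoted above, here is a minimal
sketch using only Python's standard unicodedata module. The names MAGISTER
and describe are made up for this illustration; they come neither from the
thread nor from any library. The sketch lists the three code points and
shows one practical consequence of using a modifier letter: NFC leaves the
sequence untouched, while NFKC folds U+02B3 to a plain "r".

```python
import unicodedata

# Sequence quoted above: M + MODIFIER LETTER SMALL R + COMBINING DOUBLE LOW LINE.
# MAGISTER is a hypothetical name chosen for this sketch.
MAGISTER = "\u004D\u02B3\u0333"


def describe(text):
    """Return (code point label, character name) pairs, one per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, name in describe(MAGISTER):
    print(label, name)

# Canonical normalization (NFC) keeps the sequence exactly as entered.
nfc = unicodedata.normalize("NFC", MAGISTER)

# Compatibility normalization (NFKC) replaces the modifier letter with a
# plain "r", so the superscript distinction is lost in NFKC-normalized data.
nfkc = unicodedata.normalize("NFKC", MAGISTER)

print(nfc == MAGISTER)
print([f"U+{ord(ch):04X}" for ch in nfkc])
```

If the transcription is later processed by tools that apply NFKC (some
search indexes do), the superscript distinction would not survive, which
may matter when deciding where to store such data.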
> 
> The same post contained a statement which I don't accept:
> 
> On Sun, Oct 28 2018 at 20:42 GMT, Michael Everson via Unicode wrote:
> 
> [...]
> 
>> The squiggle in your sample, Janusz, does not indicate anything; it is
>> only a decoration, and the abbreviation is the same without it.
> 
> One of the reasons I disagree was described by me in the separate thread
> "use vs mention":
> 
> https://unicode.org/mail-arch/unicode-ml/y2018-m10/0133.html
> 
> There were also some other statements which I find unacceptable:
> 
> On Mon, Oct 29 2018 at 12:20 -0700, Doug Ewell via Unicode wrote:
> 
> [...]
> 
>> The abbreviation in the postcard, rendered in plain text, is "Mr".
> 
> He was supported by Julian Bradfield in his mail on Wed, Oct 31 2018 at
> 9:38 GMT (and earlier in a private mail).
> 
> I understand that by "plain text" both of them mean Unicode.
> 
> 
> On 10/31/2018 2:38 AM, Julian Bradfield via Unicode wrote:
> 
>>   You could use the various hacks you've discussed, with modifier
>> letters; but that is not "encoding", that is "abusing Unicode to do
>> markup". At least, that's the view I take!
> 
> and was supported by Asmus Freytag on Wed, Oct 31 2018 at  3:12
> -0700.
> 
> The latter elaborated his view later and I answered:
> 
> On Fri, Nov 02 2018 at 17:20 +0100, Janusz S. Bień via Unicode wrote:
>> On Fri, Nov 02 2018 at  5:09 -0700, Asmus Freytag via Unicode wrote:
> 
> [...]
> 
>>> All else is just applying visual hacks
>>
>> I don't mind hacks if they are useful and serve the intended purpose,
>> even if they are visual :-)
> 
> [...]
> 
>>> at the possible cost of obscuring the contents.
>>
>> It's for the users of the transcription to decide what is obscuring the
>> text and what, to the contrary, makes the transcription more readable
>> and useful.
> 
> Please note that it's me who makes the transcription, it's me who has a
> vision of the future use and users, and in consequence it's me who makes
> the decision which aspects of text to encode. Accusing me of "abusing
> Unicode" will not stop me from doing it my way.
> 
> I hope that at least James Kass understands my attitude:
> 
> On Mon, Oct 29 2018 at  7:57 GMT, James Kass via Unicode wrote:
> 
> [...]
> 
>> If I were entering plain text data from an old post card, I'd try to
>> keep the data as close to the source as possible. Because that would
>> be my purpose. Others might have different purposes.
> 
> Some ideas were also presented which I would call "futuristic":
> introducing a new combining character and using variation sequences.
> These ideas should be discussed in separate threads, which seems to
> be happening now.

Thank you for the summary. So far I am pleased to infer that the outcome
you outline meets with general agreement.

It is probably safe to conjecture that the case of the Polish abbreviation
for "magister" is becoming a textbook example of how the Unicode policy on
superscripts discussed here is received.

Best regards,

Marcel

