Encoding <combining abbreviation mark> (was: Re: A sign/abbreviation for "magister")

Sun Nov 4 11:42:22 CST 2018

Sorry, I didn’t truncate the subject line; my mail client did.

On 04/11/2018 17:45, Philippe Verdy wrote:
> 
> Note that I actually propose not just one rendering for the
> <combining abbreviation mark> but two possible variants (that would
> be equally valid, without preference). Use it after any base cluster
> (including one with diacritics if needed, like combining underlines).
> 
> - the first one can be to render the previous cluster as superscript
> (very easy for any text renderer to implement synthetically)
> 
> - the second one can be to render it as an abbreviation dot (also
> very easy to implement)
> 
> Fonts can provide their own mapping (e.g. to offer alternate glyph
> forms or kerning for the superscript; they can also reuse the letter
> forms used for other existing and encoded superscript letters, or
> position the abbreviation dot with negative kerning, for example
> after a T), in which case the renderer does not have to synthesize
> the rendering; synthesis is needed only for a combining sequence not
> mapped in the font.
> 
> Allowing this variation from the start will:
> 
> - allow renderers to support it fast (and so allow rapid adoption for
> encoding texts in human languages, instead of the few legacy
> superscript letters).
> 
> - allow font designers to develop and provide reasonable mappings if
> needed (to adjust the position or size of the superscript) in updated
> fonts (no requirement for them to add new glyphs if it's just to map
> the same glyphs used by existing superscript letters)
> 
> - also prohibit the abuse of this mark for every text that one
> would like to write in superscript (these cases can still use the few
> existing superscript letters/digits/signs that are already encoded),
> so this is not suitable, for example, for marking mathematical
> exponents (e.g. "x²", if it's encoded as <x,2,combining abbreviation
> mark>, could validly be rendered as "x2."): exponents must use a true
> superscript (either the already encoded ones, or external styles
> as in HTML/CSS, or LaTeX, which uses the notation "x^2"), which
> carries both the style and the intended semantics of an exponent,
> and certainly not the intended semantics of an abbreviation.
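
The two fallback renderings described in the quoted proposal can be
sketched roughly as follows. This is only an illustration under stated
assumptions: the mark is not encoded, so a Private Use Area code point
(U+E000) stands in for it; the names CAM, SUPERSCRIPTS and render are
hypothetical; and the "base cluster" is simplified to a single code
point, so diacritics on the base are not handled.

```python
# Hypothetical stand-in: the proposed mark has no code point, so a
# Private Use Area character is used here purely for illustration.
CAM = "\uE000"

# A few already encoded superscript forms (not an exhaustive table).
SUPERSCRIPTS = {
    "r": "\u02B3",  # MODIFIER LETTER SMALL R
    "o": "\u1D52",  # MODIFIER LETTER SMALL O
    "e": "\u1D49",  # MODIFIER LETTER SMALL E
    "2": "\u00B2",  # SUPERSCRIPT TWO
}


def render(text, style):
    """Replace each <base, CAM> pair using one of the two fallbacks.

    style == "superscript": swap the base for an encoded superscript
    form when one is in the table, otherwise fall back to the dot.
    Any other style: keep the base and append an abbreviation dot.
    """
    out = []
    for ch in text:
        if ch == CAM and out:
            base = out.pop()  # simplification: base is one code point
            if style == "superscript" and base in SUPERSCRIPTS:
                out.append(SUPERSCRIPTS[base])
            else:
                out.append(base + ".")
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, "Mr" followed by the stand-in mark comes out either
with a superscript r or as "Mr.", and "x2" followed by the mark may
come out as "x2.", which is the exponent ambiguity the quoted text
warns about.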

Unicode aims, at least in principle, at versatility, making characters
reusable and repurposable, whereas the combining abbreviation mark does
not solve the problem of representing chemical notation better in plain
text, as raised in the parent thread, for example. I don’t advocate
this use case, as I’m only lobbying for natural languages’ support as
specified in the Standard,* but it shouldn’t be forgotten: there is
some point in not disfavoring chemistry relative to mathematics, which
is already widely favored over chemistry in the symbol blocks, while
chemistry is denied three characters because they are subscript forms
of already encoded letters.

Beyond that, the problem with *COMBINING ABBREVIATION MARK is that it
needs OpenType support to work, whereas directly encoding preformatted
superscripts and using them as abbreviation indicators, for an
interoperable digital representation of natural languages, does not.
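
As a small illustration of what "preformatted" means here, the sketch
below inspects a few superscript letters that are already encoded,
using Python's standard unicodedata module. The helper names describe
and fold_superscripts are only illustrative. Each of these is an
ordinary character with its own name, so it survives in plain text
without any font-level substitution.

```python
import unicodedata


def describe(ch):
    """Return the character's name and its decomposition mapping."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)


def fold_superscripts(text):
    """Apply compatibility normalization (NFKC) to the text."""
    return unicodedata.normalize("NFKC", text)


for ch in ("\u02B3", "\u1D52"):
    print(describe(ch))
```

Note that compatibility normalization folds these letters back to
their ordinary forms, so the superscript distinction is kept only as
long as the text is not passed through NFKC or NFKD.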

Best regards,

Marcel

* As already repeatedly stated, I’m taking the one bit where TUS states
that all natural languages shall be given a semantically unambiguous (i.e.
not introducing new ambiguity) and interoperable digital representation.