Encoding <combining abbreviation mark>

Philippe Verdy via Unicode unicode at unicode.org
Sun Nov 4 13:51:33 CST 2018

Note also that some other scripts have their own dedicated "abbreviation
mark" encoded, but as distinct punctuation marks or modifier letters: they
are NOT combining. I do not advocate changing these scripts at all.

Likewise, I don't propose instructing authors to use an <Asian abbreviation
mark> after Latin/Greek/Letters/Arabic/Hebrew letters used in
abbreviations. This would be nonsense, including visually, even if you can
infer some semantics from it, namely that this is effectively an
abbreviation for text processing. It is still nonsense because it breaks the
existing segregation of scripts, the delimitation of clusters, line-breaking
opportunities, and so on. The approach would also break because these
<Asian abbreviation mark> characters can legally occur in isolation, without
necessarily being attached to the previous cluster to modify it: the
cluster before the <Asian abbreviation mark> could be, for example, a
whitespace or a quotation mark.
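
As an illustration of "encoded as punctuation, NOT combining", here is a small Python sketch using the standard unicodedata module. The two code points are examples picked for this sketch (U+0970 and U+055F); the message itself does not name specific characters, so treat the choice as an assumption.

```python
import unicodedata

# Example abbreviation marks that are encoded as ordinary punctuation.
# Illustrative choices only; they are not named in the message above.
MARKS = ["\u0970", "\u055F"]

def describe(ch):
    """Return (name, general category, canonical combining class) for ch."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

def is_combining(ch):
    """True if ch is a mark (Mn, Mc or Me), i.e. it attaches to a preceding base."""
    return unicodedata.category(ch).startswith("M")

for ch in MARKS:
    print(describe(ch), is_combining(ch))
```

Both characters report a punctuation category and combining class 0, so a text processor treats them as standalone characters, not as modifiers of the preceding cluster.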

I don't propose the <combining abbreviation mark> as suitable for
mathematical exponents and chemical notations. They still need something
else to allow their superscripts and subscripts to stack one below the
other, and the variation of <combining abbreviation mark> that explicitly
permits it to be rendered as a dot or another suitable mark, depending on
the base character of the combining sequence, is NOT suitable for these
mathematical or chemical notations.
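
What "stacking" means here can be sketched in LaTeX. The formulas are generic examples chosen for illustration, not taken from this thread: a superscript and a subscript attached to the same base sit one above the other, and exponents can nest, neither of which a linear run of characters expresses.

```latex
% Superscript and subscript stacked on the same base (chemistry).
\[ \mathrm{SO}_{4}^{2-} \qquad {}^{14}_{6}\mathrm{C} \]
% Nested exponents (mathematics): each level is raised above the previous one.
\[ 2^{2^{2}} = 2^{4} = 16 \]
```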

Once again, you need something else for these technical notations: NOT
the proposed <combining abbreviation mark>, and NOT EVEN the existing
"modifier letters" <superscript letter X>. Those were in fact first
introduced only for IPA lowercase symbols. Some of them then became
"plain lowercase letters" in the alphabets of some natural languages
that have recently been romanized by borrowing IPA symbols, notably in
Africa, where the initial letters borrowed from IPA, or some new specific
letter variants with additional hooks, openings or strokes, were then
followed by the addition of separate capital letters. These letters do NOT
convey any abbreviation semantics, and this is also NOT the case
for their usage as IPA symbols.

There's NO interoperability at all when **abusively** taking the existing
"modifier letters" <superscript letter X> or <superscript digit> for use in
abbreviations (or even in technical notations in maths or chemical
formulas, where they DON'T work the way they should when used with
subscripts, and cannot represent multiple layers of superscripts or
subscripts, e.g. for expressions like "2^2^2" in LaTeX for maths). Keep
these "modifier letters" or <superscript digit> or <superscript punctuation>
for use as plain letters or plain digits or plain punctuation or plain
symbols (including IPA) in natural languages. Anything else is abusive and
should be considered only as "legacy" encoding, not recommended at all in
natural languages.
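
A short Python sketch of why this is fragile for text processing, using only the standard unicodedata module. The sample strings and the helper names compat_tag and fold are assumptions made for illustration. The superscript characters carry a <super> compatibility decomposition, so NFKC normalization (which some text-processing pipelines apply) folds them into plain letters and digits, losing both the abbreviation and any nesting.

```python
import unicodedata

SUPERSCRIPT_TWO = "\u00B2"   # SUPERSCRIPT TWO
MODIFIER_SMALL_E = "\u1D49"  # MODIFIER LETTER SMALL E

def compat_tag(ch):
    """Return the compatibility tag of ch's decomposition (e.g. '<super>'), or ''."""
    decomp = unicodedata.decomposition(ch)
    return decomp.split()[0] if decomp.startswith("<") else ""

def fold(text):
    """Apply NFKC compatibility normalization to text."""
    return unicodedata.normalize("NFKC", text)

# General category and compatibility tag of the modifier letter.
print(unicodedata.category(MODIFIER_SMALL_E), compat_tag(MODIFIER_SMALL_E))
# An abbreviation written with a modifier letter.
print(fold("2" + MODIFIER_SMALL_E))
# An attempt at nested exponents written with superscript digits.
print(fold("2" + SUPERSCRIPT_TWO + SUPERSCRIPT_TWO))
```

After folding, the abbreviation is indistinguishable from the plain string "2e", and the attempted nested exponent becomes the plain digits "222".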

On Sun, 4 Nov 2018 at 20:19, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> On Sun, 4 Nov 2018 at 18:34, Marcel Schneider <charupdate at orange.fr>
> wrote:
>> On 04/11/2018 17:45, Philippe Verdy wrote:
>> Marcel
>> * As already repeatedly stated, I’m taking the one bit where TUS states
>> that all natural languages shall be given a semantically unambiguous (i.e.
>> not introducing new ambiguity) and interoperable digital representation.
> I also support the semantically unambiguous digital representation of all
> natural languages.
> Interoperability is always limited, even for existing scripts (including
> Latin); that's why text renderers (and fonts) constantly need new
> developments (but that does not mean that these developments will be
> deployed).
> That's why we have to document reasonable fallbacks for rendering on
> limited platforms, each time this is possible (and in this case this is
> clearly possible with extremely low effort).
> Even the mere fallback of rendering the <combining abbreviation mark> as a
> dotted circle (total absence of support) will not completely block reading
> the abbreviation:
> * you'll see "2e◌" (which is still better than only "2e", with minimal
> impact) instead of
> * "2◌" (which is worse! This is what already happens when you use
> the legacy encoded <superscript e>, which is also semantically ambiguous for
> text processing), or
> * "2e." (which is acceptable for rendering but semantically ambiguous for
> text processing)
> So compare things fairly: the solution I propose is EVEN MORE INTEROPERABLE
> than using <superscript Latin letters> (which is also impossible for
> noting all abbreviations, as it is limited to just a few letters, and most
> of the time limited to only the few lowercase IPA symbols). It puts an end
> to the pressure to encode superscript letters.
> If you want to support other notations (e.g. chemical or
> mathematical notations, where both superscript and subscript must be present
> and stack together, and where the allowed variation using a dot or similar
> is not suitable), you need another encoding, and the existing legacy
> <superscript Latin letters> are not suitable either.
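
The three fallback cases in the quoted list can be simulated with a short Python sketch. Everything in it is an assumption made for illustration: the proposed mark has no code point, so the private-use character U+E000 stands in for it, and render is a toy renderer that treats every non-ASCII character as unsupported.

```python
CAM = "\uE000"               # hypothetical stand-in for <combining abbreviation mark>
DOTTED_CIRCLE = "\u25CC"     # shown by the toy renderer for unsupported characters
MODIFIER_SMALL_E = "\u1D49"  # legacy encoded superscript e

def render(text, cam_fallback=None):
    """Toy renderer: ASCII always renders, anything else becomes a dotted circle.

    If cam_fallback is given, the stand-in mark is replaced by it instead.
    """
    out = []
    for ch in text:
        if ch == CAM and cam_fallback is not None:
            out.append(cam_fallback)   # documented fallback, e.g. "."
        elif ch.isascii():
            out.append(ch)             # plain ASCII is assumed to always render
        else:
            out.append(DOTTED_CIRCLE)  # no glyph available
    return "".join(out)

print(render("2e" + CAM))                    # proposed mark, no support at all
print(render("2" + MODIFIER_SMALL_E))        # legacy superscript e, no glyph
print(render("2e" + CAM, cam_fallback="."))  # proposed mark, dot fallback
```

With the proposed mark the letter "e" survives in the unsupported case, whereas with the legacy superscript letter it is the letter itself that is lost.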