A sign/abbreviation for "magister"
Philippe Verdy via Unicode
unicode at unicode.org
Sat Nov 3 15:02:23 CDT 2018
Likewise, the separate encoding of mathematical variants could have been
completely avoided (we know that this encoding is not sufficient, so much so
that even LaTeX renderers simply don't need or use it!).
We could have just encoded a single <combining mathematical symbol> to use
after any base cluster, and the whole set would have been covered!
The additional distinction of visual variants (monospace, bold, italic...)
would have been encoded using variation selectors after the <combining
mathematical symbol>: the semantics as mathematical symbols would still have
been preserved, including the additional semantics needed to distinguish
some symbols in math notation like "f(f)=f", where the 3 "f" must be
distinguished (the function, belonging to a set of functions; the argument,
belonging to one set of values or being a variable; and the result,
belonging to another set, which may be a value or a variable).
Once again, this would have covered all the needs without this duplicate
encoding (which was NEVER needed for round-trip compatibility with legacy
non-UCS charsets).
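The "duplicate encoding" point can be checked against the UCD itself: each styled mathematical letter carries a `<font>` compatibility decomposition to its plain letter, so NFKC folds bold, italic and monospace "f" back to U+0066. A minimal Python sketch using only the standard `unicodedata` module (results depend on the UCD version bundled with the interpreter; `VARIANTS` and `describe` are illustrative names, not part of any proposal):

```python
import unicodedata

# Three styled variants of "f", each a separate code point in the
# Mathematical Alphanumeric Symbols block (U+1D400..U+1D7FF).
VARIANTS = [
    "\U0001D41F",  # MATHEMATICAL BOLD SMALL F
    "\U0001D453",  # MATHEMATICAL ITALIC SMALL F
    "\U0001D68F",  # MATHEMATICAL MONOSPACE SMALL F
]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (character name, raw decomposition mapping, NFKC form)."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )


if __name__ == "__main__":
    for ch in VARIANTS:
        name, decomposition, folded = describe(ch)
        print(f"U+{ord(ch):04X} {name}: {decomposition!r} -> NFKC {folded!r}")
```

All three lines report the same `<font> 0066` mapping, so the bold/italic/monospace distinction is exactly what compatibility normalization throws away.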
All I ask is reasonable: just a SINGLE code point to encode the combining
mark itself, semantically, NOT visually.
The visual appearance can be controlled by an additional variation selector
to cancel the effect of the glyph variations allowed for ALL characters in
the UCS, where there's just a **non-mandatory** form generally used by
default in fonts, more or less matching the "representative glyph" shown in
the Unicode and ISO 10646 charts. Those charts cannot show all allowed
variations (if they need to be detailed, Unicode offers the possibility to
request registration of known "variation sequences", which can feed a
supplementary chart showing more representative glyphs, one for each
accepted "variation sequence", without even needing to modify the
"representative glyph" shown in the base chart).
Note that even though Unicode requires variation sequences to be registered
before they are used, the published charts still omit the additional charts
(just below the existing base chart) that would show representative glyphs
for accepted sequences, with one small chart per base character listing
them simply ordered by "VSn" value. All that Unicode publishes is a mere
data list with some names (not enough for most users to be aware that
variations can be encoded explicitly and compliantly).
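The per-base-character listing suggested here can be derived mechanically from the published data file. A minimal Python sketch, assuming the semicolon-separated layout of StandardizedVariants.txt (code points; description; optional shaping environments; `#` comment) and using the DIGIT ZERO line quoted in this thread as sample input. `parse_standardized_variants` and `vs_label` are hypothetical helpers, and the sketch only produces a text listing, not glyph images:

```python
from collections import defaultdict


def parse_standardized_variants(text: str) -> dict[int, list[tuple[int, str]]]:
    """Group variation sequences by base code point, sorted by selector.

    Assumes lines shaped like those in StandardizedVariants.txt:
    "<base> <selector>; <description>; [<shaping environments>] # <name>"
    """
    by_base: dict[int, list[tuple[int, str]]] = defaultdict(list)
    for raw in text.splitlines():
        data = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        code_points = [int(cp, 16) for cp in fields[0].split()]
        if len(code_points) != 2:
            continue  # not a <base, selector> pair
        base, selector = code_points
        description = fields[1] if len(fields) > 1 else ""
        by_base[base].append((selector, description))
    # One group per base character, each ordered by selector value.
    return {base: sorted(seqs) for base, seqs in sorted(by_base.items())}


def vs_label(selector: int) -> str:
    """Name a selector as VS1..VS16 (U+FE00..) or VS17..VS256 (U+E0100..)."""
    if 0xFE00 <= selector <= 0xFE0F:
        return f"VS{selector - 0xFE00 + 1}"
    if 0xE0100 <= selector <= 0xE01EF:
        return f"VS{selector - 0xE0100 + 17}"
    return f"U+{selector:04X}"


if __name__ == "__main__":
    # Sample line copied from the thread; the full file could be passed instead.
    SAMPLE = "0030 FE00; short diagonal stroke form; # DIGIT ZERO"
    for base, seqs in parse_standardized_variants(SAMPLE).items():
        for selector, description in seqs:
            print(f"U+{base:04X} {vs_label(selector)}: {description}")
```

For the sample line it prints `U+0030 VS1: short diagonal stroke form`; run over the whole file, it would yield one group per base character, ordered by "VSn" value.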
On Sat, Nov 3, 2018 at 20:41, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
>
>
> On Fri, Nov 2, 2018 at 20:01, Marcel Schneider via Unicode <
> unicode at unicode.org> wrote:
>
>> On 02/11/2018 17:45, Philippe Verdy via Unicode wrote:
>> [quoted mail]
>> >
>> > Using variation selectors is only appropriate for these existing
>> > (preencoded) superscript letters ª and º so that they display the
>> > appropriate (underlined or not underlined) glyph.
>>
>> And it is used to force the display of DIGIT ZERO with a short stroke:
>> 0030 FE00; short diagonal stroke form; # DIGIT ZERO
>> https://unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt
>>
>> From that, it becomes unclear why the same isn’t applied to 4, 7, z and
>> Z, mentioned in this thread, to display them open or with a short bar.
>>
>> > It is not a solution for creating superscripts on arbitrary letters
>> > and marking that they should be rendered as superscripts (notably, the
>> > base letter to be turned into a superscript may also have its own
>> > combining diacritics, which must be encoded explicitly; and if you use
>> > the variation selector, it should allow variation in the presence or
>> > absence of the underline, which must then be encoded explicitly as a
>> > combining character).
>>
>> I totally agree that superscript used as an abbreviation indicator
>> should not be encoded using variation selectors; as already stated, I
>> don’t favor that.
>> >
>> > So finally what we get with variation selectors is: <baseline letter,
>> > variation selector, combining diacritic> and <baseline letter
>> > precombined with the diacritic, variation selector>, which are NOT
>> > canonically equivalent.
>>
>> That seems to me like a flaw in canonical equivalence. Variations must
>> be canonically equivalent, and the variation selector position should
>> be handled or parsed accordingly. Personally I’m unaware of this rule.
>> >
>> > Using a combining character avoids this caveat: <baseline letter,
>> > combining diacritic, combining abbreviation mark> and <baseline letter
>> > precombined with the diacritic, combining abbreviation mark> ARE
>> > canonically equivalent. And this explicitly states the semantics
>> > (something that is lost if we are forced to use presentational
>> > superscripts in a higher-level protocol like HTML/CSS for rich text
>> > and one then extracts just the plain text; collation will not help
>> > at all, unless collators are built with preprocessing that first
>> > infers the presence of a <combining abbreviation mark> to insert
>> > after each combining sequence of the plain text enclosed in an italic
>> > style).
>>
>> That exactly outlines my concern with calls for relegating superscript
>> as an abbreviation indicator to higher level protocols like HTML/CSS.
>>
>
> That's exactly my concern: this relegation to HTML/CSS should NOT occur
> at all! It's really not the solution; HTML/CSS styles have NO semantics
> at all (I demonstrated it in the message you are quoting).
>
>
>> > There's little risk: if the <combining abbreviation mark> is not
>> > mapped in fonts (or not recognized by text renderers to create
>> > synthetic superscripts from existing recognized clusters), it
>> > will render as a visible .notdef (tofu). But normally text renderers
>> > recognize the basic properties of characters in the UCD and can see
>> > that <combining abbreviation mark> has a combining-mark general
>> > category (they also know that it has combining class 0, so
>> > canonical equivalences are not broken) to render a better symbol
>> > than the .notdef "tofu": ideally a dotted circle.
>> > Even if this tofu or dotted circle is rendered, it still explicitly
>> > marks the presence of the abbreviation mark, so there's less
>> > confusion about what precedes it (the combining sequence that was
>> > supposed to be superscripted).
>>
>> The problem with the <combining abbreviation mark> you are proposing
>> is that it contradicts streamlined implementation as well as easy
>> input of current abbreviations like ordinal indicators in French and,
>> optionally, in English. Preformatted superscripts are already widely
>> implemented, and coding "4ᵉ" only needs two characters, typed
>> with only three fingers in two steps (thumb on AltGr, press key
>> E04 then E12) with an appropriately programmed layout driver. I’m
>> afraid that the solution with <combining abbreviation mark> would be
>> much less straightforward.
>>
>
> This is not a real concern: this is a legacy practice that should no
> longer be recommended, as it is ambiguous (nothing says that "4ᵉ" is an
> abbreviated ordinal; it could just as well be 4 raised to the power e, or
> various other things).
>
> Also, which keys to press on a keyboard is absolutely not a concern: the
> same key presses you propose can just as well generate the letter followed
> by the combining abbreviation mark. In fact, what you propose is even less
> practical, because it uses complex input for all characters and requires
> mapping keys for the whole alphabet (so it uses precious space on the key
> layout). It's just simpler for everyone to press "4", "e", followed by a
> combination (like AltGr+".") to produce the <combining abbreviation mark>!
>
> And these legacy superscript characters are still not guaranteed to have
> no underline (the variation may as well be significant), and there will
> never be enough superscript characters for the many superscript notations
> (not just abbreviations), which should still be encoded with the normal
> letters (including in clusters, with diacritics, ligatures and so on):
> Unicode will never accept re-encoding all existing letters (plus the
> infinite set of clusters that can be formed with them) just to turn them
> into superscript/subscript variants. These encodings, which found their
> way in from the need for round-trip compatibility with legacy charsets
> (before the UCS), should never have occurred at all: they should not even
> have been tolerated for IPA symbols or for mathematical symbols
> (monospace, bold, italic...).
>
> The variation selector solution is also not suitable when the intent is
> only to add semantics to the encoded text and not to drive the choice
> between glyph variants (when the default glyph without the variation
> selector can FREELY vary into forms that are UNACCEPTABLE in some
> contexts, the variation does not really encode the semantics but the
> visual rendering intent: it is too easily abused to do something else).
> But a single *semantic* combining mark does not encode any visual
> rendering intent the way variation selectors do. It still allows glyphic
> variations as long as the semantics are kept, and it has the correct
> fallbacks (the encoding of the clusters to which the semantic combining
> mark applies is not obscured: they are still part of the same general
> encoding as normal letters, and rendering an abbreviation mark does not
> necessarily mean that the base cluster MUST be rendered differently from
> normal letters: it is equally permitted to render the combining mark, for
> example, as a dot, or as a true diacritic on top of the letters). And if
> needed, the following can control the visual appearance:
>
>> >
>> > The <combining abbreviation mark> can also have its own <variation
>> > selector> to select other styles when they are optional, such as
>> > adding underlines to the superscripted letter, or rendering the
>> > letter instead as an underscript, or as a small baseline letter with a
>> > dot after it: this is still an explicit abbreviation mark, and the
>> > meaning of the plain text is still preserved. The variation selector
>> > is only suitable for altering the rendering of a cluster when it
>> > effectively has several variants and the default rendering is not
>> > universal, notably across font styles initially designed for specific
>> > markets with their own local preferences: the variation selector
>> > still allows the same fonts to map all known variants distinctly,
>> > independently of the initial arbitrary choice of the default glyph
>> > used when the variation selector is missing.
>>
>>
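The canonical-equivalence argument in the exchange above can be checked with Python's standard `unicodedata` module. Since no <combining abbreviation mark> is encoded, this sketch assumes a private-use code point (U+E000, combining class 0, no decomposition) as a hypothetical stand-in; it models normalization behaviour only, not rendering:

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT (canonical combining class 230)
VS1 = "\ufe00"    # VARIATION SELECTOR-1 (canonical combining class 0)
# Hypothetical stand-in for the proposed <combining abbreviation mark>:
# no such character exists, so a private-use code point (ccc 0, no
# decomposition) is used purely to model its normalization behaviour.
ABBR_MARK = "\ue000"


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# <baseline letter, variation selector, combining diacritic>
# vs <letter precombined with the diacritic, variation selector>
vs_decomposed = "e" + VS1 + ACUTE
vs_precomposed = "\u00e9" + VS1

# <baseline letter, combining diacritic, combining abbreviation mark>
# vs <letter precombined with the diacritic, combining abbreviation mark>
mark_decomposed = "e" + ACUTE + ABBR_MARK
mark_precomposed = "\u00e9" + ABBR_MARK

if __name__ == "__main__":
    print("ccc:", unicodedata.combining(VS1), unicodedata.combining(ACUTE))
    print("variation selector:", canonically_equivalent(vs_decomposed, vs_precomposed))
    print("abbreviation mark:", canonically_equivalent(mark_decomposed, mark_precomposed))
```

It prints `ccc: 0 230`, then `False` for the variation-selector pair and `True` for the mark pair: a selector with combining class 0 placed between the letter and the diacritic prevents canonical reordering from bringing them back together, whereas a class-0 mark placed after the diacritic leaves the canonical decomposition of the cluster intact.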