Encoding italic

Philippe Verdy via Unicode unicode at unicode.org
Mon Feb 11 05:47:54 CST 2019

On Sun, Feb 10, 2019 at 16:42, James Kass via Unicode <unicode at unicode.org>
wrote:

> Philippe Verdy wrote,
>  >> ...[one font file having both italic and roman]...
>  > The only case where it happens in real fonts is for the mapping of
>  > Mathematical Symbols which have a distinct encoding for some
>  > variants ...
> William Overington made a proof-of-concept font using the VS14 character
> to access the italic glyphs which were, of course, in the same real
> font.  Which means that the developer of a font such as Deja Vu Math TeX
> Gyre could set up an OpenType table mapping the Basic Latin in the font
> to the italic math letter glyphs in the same font using the VS14
> characters.  Such a font would work interoperably on modern systems.
> Such a font would display italic letters both if encoded as math
> alphanumerics or if encoded as ASCII plus VS14.  Significantly, the
> display would be identical.
>  > ...[math alphanumerics]...
>  > These were allowed in Unicode because of their specific contextual
>  > use as distinctive symbols from known standards, and not for general
>  > use in human languages
> They were encoded for interoperability and round-tripping because they
> existed in character sets such as STIX.  They remain Latin letter form
> variants.  If they had been encoded as the variant forms which
> constitute their essential identity it would have broken the character
> vs. glyph encoding model of that era.  Arguing that they must not be
> used other than for scientific purposes is just so much semantic
> quibbling in order to justify their encoding.
> Suppose we started using the double struck ASCII variants on this list
> in order to note Unicode character numbers such as ��+�������� or
> ��+��������?  Hexadecimal notation is certainly math and Unicode can be
> considered a science.  Would that be “math abuse” if we did it?  (Is
> linguistics not a science?)
>  > (because these encodings are defective and don't have the necessary
>  > coverage, notably for the many diacritics,
> The combining diacritics would be used.
Not for the many precomposed characters used in Latin: do you intend to
propose that they be reencoded in all the same variants encoded for maths?
Or to allow diacritics to be added to the maths symbols? (Hint: this does
not work correctly with the specific mathematical conventions on
diacritics and their specific stacking rules: they are NOT reorderable
through canonical equivalence, the order is significant in maths, so you
would also need to use CGJ to fix the expected logical, semantic and visual
stacking order.)
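
To make the normalization point concrete, here is a minimal sketch using
Python's standard unicodedata module. The constant and function names are
my own, chosen only for illustration, and the results depend on the Unicode
data bundled with the interpreter. It checks that a mathematical italic
letter has no precomposed accented form, that marks of different combining
classes are reordered by normalization, and that a CGJ placed between them
keeps the typed order.

```python
import unicodedata

MATH_ITALIC_A = "\U0001D44E"  # MATHEMATICAL ITALIC SMALL A
ACUTE = "\u0301"              # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"          # COMBINING DOT BELOW, combining class 220
CGJ = "\u034F"                # COMBINING GRAPHEME JOINER, combining class 0


def codepoints(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Plain Latin "a" + acute has a precomposed form; the math letter has none.
latin_nfc = unicodedata.normalize("NFC", "a" + ACUTE)
math_nfc = unicodedata.normalize("NFC", MATH_ITALIC_A + ACUTE)

# Marks with different combining classes are sorted by normalization...
reordered = unicodedata.normalize("NFD", MATH_ITALIC_A + ACUTE + DOT_BELOW)

# ...unless a CGJ sits between them and blocks the reordering.
blocked = unicodedata.normalize("NFD", MATH_ITALIC_A + ACUTE + CGJ + DOT_BELOW)

for label, text in [("latin NFC", latin_nfc), ("math NFC", math_nfc),
                    ("reordered", reordered), ("blocked", blocked)]:
    print(label, codepoints(text))
```

Marks that share a combining class (two accents above, for instance) are
never reordered, so the CGJ only matters when the marks come from different
classes.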

>  > case mappings,
> Adjust them as needed.

Not so easy: case mappings cannot be fixed. They are stabilized in Unicode.
You would need special casing rules under a specific "locale" for maths.
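
As a small illustration of the current state, the following sketch (again
plain Python with the standard unicodedata module; the helper name is
hypothetical) shows that the mathematical italic letters are categorized as
cased letters but carry no case mappings, so upper() and lower() leave them
unchanged.

```python
import unicodedata

MATH_ITALIC_SMALL_A = "\U0001D44E"    # MATHEMATICAL ITALIC SMALL A
MATH_ITALIC_CAPITAL_A = "\U0001D434"  # MATHEMATICAL ITALIC CAPITAL A


def case_report(ch):
    """Collect the name, general category and simple case results for ch."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "upper": ch.upper(),
        "lower": ch.lower(),
    }


small = case_report(MATH_ITALIC_SMALL_A)
capital = case_report(MATH_ITALIC_CAPITAL_A)
ascii_a = case_report("a")

for report in (small, capital, ascii_a):
    print(report)
```

Any "math casing" would therefore have to come from a tailoring layered on
top of the standard mappings, which is the locale-like special casing
mentioned above.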

Really, maths is a specific script, even if it borrows some symbols from
Latin, Greek or Hebrew, but only in specific glyph variants. These symbols
should not even be considered part of the script they originate from
(just as Latin A is not the same as Cyrillic A or Greek Alpha, which all
have the same forms and the same origin).

I can argue the same thing about IPA notations: they are NOT the Latin
script and also borrow some letter forms from Latin and Greek, but without
any case mappings (only lowercase is used), and also with specific glyph

Both examples are technical notations which do not obey the linguistic
rules and normal processing of the script they originate from. They are
specific "writing systems", unfortunately conflated with "Unicode
scripts", and then abused.

Note that some Latin letters have been borrowed from IPA too, for use in
African languages, and then case mappings were needed. These should have
been reencoded as plain letter pairs with a basic case mapping (not the
special case mapping rules now needed for African languages). Examples are
open o, which looks much like the mirrored c from Roman numerals, and open
e, which was borrowed from the lowercase Greek epsilon but does not use the
uppercase Greek Epsilon and uses another shape instead, meaning that the
Latin open e should have been encoded as a plain letter pair, distinct from
the Greek epsilon; but IPA already used the epsilon-like symbol...
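
For reference, this short sketch prints how the Unicode data bundled with
Python pairs these letters today. It only reports the default case
mappings; the variable names are my own, and it says nothing about any
language-specific tailoring.

```python
import unicodedata

OPEN_O = "\u0254"         # LATIN SMALL LETTER OPEN O
OPEN_E = "\u025B"         # LATIN SMALL LETTER OPEN E
GREEK_EPSILON = "\u03B5"  # GREEK SMALL LETTER EPSILON

# Map each lowercase letter to the character names of itself and its
# default uppercase counterpart.
pairs = {
    ch: (unicodedata.name(ch), unicodedata.name(ch.upper()))
    for ch in (OPEN_O, OPEN_E, GREEK_EPSILON)
}

for ch, (lower_name, upper_name) in pairs.items():
    print(f"U+{ord(ch):04X} {lower_name} -> "
          f"U+{ord(ch.upper()):04X} {upper_name}")
```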

In the end these exceptions just cause many inconsistencies and
complexities. Applications and libraries cannot adapt easily and are not
downward compatible, because stable properties are immutable and specific
tailorings are needed each time in applications: the more of these
exceptions we add, the harder the standard is to adapt and the more
difficult compatibility is to preserve. In summary, I don't like at all the
dual encodings, or encodings of additional letters that cannot use the
normal stable properties (and this remark is also true for emojis: what a
mess! Full of exceptions and different, incoherent encoding models!)