Encoding italic

Philippe Verdy via Unicode unicode at unicode.org
Sun Feb 10 07:18:38 CST 2019

On Sun, Feb 10, 2019 at 05:34, James Kass via Unicode <unicode at unicode.org>
wrote:

> Martin J. Dürst wrote,
>  >> Isn't that already the case if one uses variation sequences to choose
>  >> between Chinese and Japanese glyphs?
>  >
>  > Well, not necessarily. There's nothing prohibiting a font that includes
>  > both Chinese and Japanese glyph variants.
> Just as there’s nothing prohibiting a single font file from including
> both roman and italic variants of Latin characters.

Maybe, but such a font would not work as intended for displaying both styles
distinctly under the common use of the italic style: it would have to make a
default choice, and you would then need either a special text encoding or an
OpenType feature (if using the OpenType font format) to select the other
style in a non-standard, custom way.

The only case where this happens in real fonts is the mapping of the
Mathematical Symbols, which have a distinct encoding for some variants (only
for a basic subset of the Latin alphabet, as well as some basic Greek and a
few other letters from other scripts). This is typically done only in
symbol fonts that contain other mathematical symbols, and only because of the
specific encoding for such mathematical use. We also have the variants
registered in Unicode for IPA usage (only lowercase letters, treated as
symbols and not case-paired).

These were allowed in Unicode because of their specific contextual use as
distinctive symbols from known standards, and not for general use in human
languages (because these encodings are defective and don't have the
necessary coverage, notably for the many diacritics, case mappings, and
other linguistic, segmentation and layout properties).
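
To make the coverage gap concrete, here is a minimal sketch in Python. The
helper to_math_italic is hypothetical (it is not part of any library); it
assumes only the standard unicodedata module and the code point layout of the
Mathematical Alphanumeric Symbols block. It illustrates the point above and is
not a recommended way to style text.

```python
import unicodedata

def to_math_italic(text: str) -> str:
    """Hypothetical helper: map ASCII letters to MATHEMATICAL ITALIC letters.

    Anything that is not an ASCII letter is returned unchanged, because the
    block has no styled counterpart for it.
    """
    out = []
    for ch in text:
        if ch == "h":
            # The italic small h slot (U+1D455) is reserved; the letter was
            # already encoded as U+210E PLANCK CONSTANT.
            out.append("\u210E")
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

italic_a = to_math_italic("a")
print(unicodedata.name(italic_a))

# No case mapping: upper() leaves the styled letter as it is.
print(italic_a.upper() == italic_a)

# No styled precomposed letters: "é" passes through untouched.
print(to_math_italic("é") == "é")

# Compatibility normalization folds the styled letters back to plain ASCII.
print(unicodedata.normalize("NFKC", to_math_italic("cafe")))
```

A word such as "café" therefore comes out with three styled letters and one
plain one, unless the accent is written with a combining mark on a styled base
letter.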

The same can be said about the superscript/subscript variants, bold variants,
and monospace variants: they have specific uses and are not made for
general-purpose text in human languages with their common orthographic
conventions. Latin is a large script and one of the most complex, and it's
quite normal that there are some deviating usages for specific purposes,
provided they are bounded in scope and use.

But what you would like is to extend the whole Latin script (and why not
Greek, Cyrillic, and others) with multiple reencodings for a lot of stylistic
variants, and each time a new character or diacritic is encoded it would
have to be encoded multiple times. You would break the character encoding
model and complicate the implementation even more, and you would also create
new security issues with a lot of new confusables that every user of Unicode
would then have to take into account; every application or library would
need to be updated and would have to include large data tables to handle
them.
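
As a rough illustration of how many look-alikes already exist for a single
letter, the sketch below lists every code point that NFKC normalization folds
to a plain "a". The function name compatibility_variants is hypothetical, and
the exact list depends on the Unicode version bundled with your Python; treat
it as an approximation of the confusables problem, not as the official
confusables data.

```python
import unicodedata

def compatibility_variants(base: str) -> list[str]:
    """Hypothetical helper: code points whose NFKC form is exactly `base`."""
    variants = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, which are not characters
        ch = chr(cp)
        if ch != base and unicodedata.normalize("NFKC", ch) == base:
            variants.append(ch)
    return variants

variants_of_a = compatibility_variants("a")
for ch in variants_of_a:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
print(len(variants_of_a), "compatibility variants of 'a'")
```

Each of these entries is something a spoofing check or search function has to
fold; reencoding every letter and diacritic per style would multiply such
tables accordingly.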

It would also create many conflicts if we used the "VARIATION SELECTOR n"
characters, or we would need to permanently assign specific ones to specific
styles; we would then rapidly run out of "VARIATION SELECTOR n" selectors in
Unicode: we only have 256 of them, and only one is more or less permanently
dedicated.
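
For reference, the 256 selectors can be enumerated with a short Python sketch.
The two code point ranges are those of the Variation Selectors and Variation
Selectors Supplement blocks; the variable names are mine, and the heart
example only shows how a sequence is spelled, not how any particular font
renders it.

```python
import unicodedata

# VS1..VS16 live in U+FE00..U+FE0F, VS17..VS256 in U+E0100..U+E01EF.
VS_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]
selectors = [chr(cp) for lo, hi in VS_RANGES for cp in range(lo, hi + 1)]

print(len(selectors))
print(unicodedata.name(selectors[0]))
print(unicodedata.name(selectors[-1]))

# A variation sequence is just base character + selector.
heart_text = "\u2764\uFE0E"   # HEAVY BLACK HEART + VS15 (text presentation)
heart_emoji = "\u2764\uFE0F"  # HEAVY BLACK HEART + VS16 (emoji presentation)
for seq in (heart_text, heart_emoji):
    print([unicodedata.name(ch) for ch in seq])
```

Dedicating one selector per style (italic, bold, bold italic, and so on) to
every letter would consume this fixed pool quickly.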

[VS16 is now almost completely reserved for the distinction between
normal/linguistic and emoji/colorful variants. The emoji subset in Unicode
is an open set which could expand in the future to tens of thousands of
symbols, and will likely cause a large work overhead in the CLDR project just
to describe them. This is one reason why I think that emoji character data in
CLDR should be separated into a distinct translation project, with its own
versioning and milestones, and not maintained in sync with the rest of the
CLDR data: emojis have flooded the CLDR survey discussions, while this subset
has many known issues and inconsistencies and still no viable encoding model,
like the "character encoding model", to make it more consistent and updatable
separately from the rest of the Unicode UCD releases. In my opinion the
emojis in Unicode are still an alpha project in development, and it's too
soon to describe them as a "standard" when there are many other possible ways
to handle them. These emojis are just there now to remain as "legacy"
mappings, but they won't withstand an expected new formal standard about them
replacing the current mess they create.]