Encoding italic (was: A last missing link)

Philippe Verdy via Unicode unicode at unicode.org
Thu Jan 17 07:40:22 CST 2019


If encoding italics means re-encoding the letters of normal linguistic
usage, the answer is no! We already have the nightmares caused by the
partial encoding of Latin and Greek (plus a few Hebrew characters) for maths
or IPA notations, but those are restricted to a well-delimited scope of use
and subset, and at least they have relevant scientific sources and auditors
for what is needed in serious publications (these subsets may continue to
evolve, but very slowly).
We could have exceptions added for chemical or electrical notations, if
there are standard bodies supporting them.
But for linguistic usage, there's no universal agreement and no single
authority. Characters are added according to common use (shown by
statistical survey, or because some national standards promote them,
sometimes making their use mandatory with defined meanings, sometimes
legally binding).
For everything else, languages are not constrained and users around the
world invent their own letterforms and styles: there's no limit at all, and
if we start accepting such re-encoding, the situation would in fact be worse
in terms of interoperability, because no one can support zillions of
variants unless they are explicitly encoded separately as surrounding
styles, or as scoping characters if needed (using contextual characters, or
possibly variation selectors if these variants are most often isolated).
But italics encoded as variation selectors would just pollute everything,
and anyway "italic" is not a single universal convention and does not apply
equally to all scripts. The semantics attached to italic styles also vary
from document to document, the same semantics also have different
typographic conventions depending on the author, and there's no agreed
meaning for the distinctions they encode.
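
To make the "pollution" concrete, here is a minimal Python sketch. It is
only an illustration under loud assumptions: Unicode defines no italic
variation sequence, so VS1 (U+FE00) is used purely as a stand-in marker,
and the helper name is hypothetical.

```python
# Hypothetical: no italic variation sequence exists in Unicode.
# VS1 (U+FE00) is used here purely as a stand-in marker.
VS_STAND_IN = "\uFE00"

def mark_with_selector(text):
    """Append the stand-in selector after every letter of `text`."""
    return "".join(ch + VS_STAND_IN if ch.isalpha() else ch for ch in text)

plain = "Jackie Brown"
marked = mark_with_selector(plain)

# Code point counts before and after marking.
print(len(plain), len(marked))

# A naive substring search no longer matches, because a selector now
# sits between every pair of adjacent letters.
print("Brown" in marked)

# Removing the selectors restores the original string.
print(marked.replace(VS_STAND_IN, "") == plain)
```

Every process that searches, compares, or counts such text would have to
learn to skip the selectors, which is the interoperability cost at issue.
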
For this reason "italic/oblique/cursive/handwriting..." should remain in
styles. Note also that even the italic transform can be variable: it could
later become a matter of user preference, where people may want to adjust
the degree of slanting according to their reading preferences, or its
orientation if they are left-handed, to match how they write themselves, or
if the writer is a native RTL writer. The context of use (in BiDi) may also
affect this slanting orientation: e.g. inserting some Latin in Arabic could
present the Latin italic letters slanted backward, to better match the
slanting of Arabic itself and avoid collisions of Latin and Arabic glyphs
at BiDi boundaries.
One can still propose a contextual control character, but it would still be
insufficient for correctly representing the many possible stylistic
variants: we have better languages to do that now, and CSS (or even HTML)
is better for it, including for accessibility requirements. Note that
there's no way to translate these italics correctly for Braille readers,
for example: Braille or audio readers attempt to infer a heuristic to reduce
the number of contextual words or symbols they need to insert between
characters, and using VSn characters would complicate that, whereas they
already process the standard HTML/CSS conventions to do that much more
simply.
Direct native encoding of italic characters for linguistic use would fail
if it only covers English: it would worsen the language coverage if people
are then told to remove the essential diacritics common in their language,
only because of the partial coverage of their alphabet.
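
The existing Mathematical Italic block already shows this coverage
problem. The sketch below is illustrative only (the function name is
hypothetical): it maps A-Z and a-z to their Mathematical Italic code
points and reports any letter that has no such counterpart.

```python
import unicodedata

def to_math_italic(text):
    """Map A-Z/a-z to Mathematical Italic; return (result, unmapped letters)."""
    out, unmapped = [], []
    for ch in text:
        if ch == "h":
            # U+1D455 is unassigned; the italic h is U+210E PLANCK CONSTANT.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        else:
            # No italic counterpart: keep the character as it is.
            out.append(ch)
            if ch.isalpha():
                unmapped.append(ch)
    return "".join(out), unmapped

styled, missing = to_math_italic("caf\u00E9")  # "cafe" with e-acute

# Character names of the result: three italic letters, one upright one.
print([unicodedata.name(c) for c in styled])

# Letters that could not be italicized.
print(missing)

# Compatibility normalization folds the italic letters back to plain ones.
print(unicodedata.normalize("NFKC", styled) == "caf\u00E9")
```

The accented letter stays upright, so a French word ends up half italic
unless the writer drops the diacritic, and NFKC discards the distinction
anyway.
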
I don't think this is worth the effort. It would in fact cause a lot of
maintenance and would severely complicate the addition of new missing
letters. And let's not forget common ligatures and correct typographic
features like kerning, which would no longer be supported and would render
ugly text if many new kerning pairs are missing in fonts. Many fonts used
today would no longer work properly, we would have a reduction of stylistic
options and fewer usable fonts, and we would fall into the trap of
proprietary solutions with a single provider. It would be too difficult for
any font designer to start defining a usable font sellable on various
markets: these fonts would be reduced to niches, and would no longer find a
way to be economically developed and maintained at reasonable cost.
Consider the problems orthogonally: if you use CSS/HTML styles in the
document encoding (rather than the plain-text character encoding), you can
also supply the additional semantics clearly in that document, encode the
intent of the author, or supply enough information to permit alternate
renderings (for accessibility, for technical reasons such as small font
sizes on devices with low resolution, or for people with limited vision).
The same applies to color, whose meaning is not clear except in specific
notations supported by well-known authorities, or by a long tradition shared
by many authors and kept in archives or important text corpora, such as
literature, legal texts, and publications that have fallen into the public
domain after their initial publisher disappeared and its proprietary assets
were dissolved. The original documents remain as reliable sources, sharable
by many, and they can serve as an established convention that many can now
reuse without explaining it too much.
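
As a rough sketch of supplying the semantics in the document layer, the
Python below wraps a string in HTML that records why it is set apart.
The intent names, the mapping to elements, and the helper are assumptions
made for this illustration, not an established convention.

```python
from html import escape

# Hypothetical mapping from an author's intent to HTML markup; the intent
# names and the choice of elements are assumptions for this sketch.
MARKUP = {
    "work-title": ("cite", None),
    "stress": ("em", None),
    "foreign": ("i", "foreign"),
}

def mark(text, intent, lang=None):
    """Wrap `text` in an element that records why it is set apart."""
    tag, css_class = MARKUP[intent]
    attrs = ""
    if css_class:
        attrs += f' class="{css_class}"'
    if lang:
        attrs += f' lang="{escape(lang, quote=True)}"'
    return f"<{tag}{attrs}>{escape(text)}</{tag}>"

# The film title carries its meaning explicitly; the person's name is
# simply left unmarked.
print(mark("Jackie Brown", "work-title"))
print("Jackie Brown")
print(mark("de facto", "foreign", lang="fr"))
```

A stylesheet may render all three intents as italic, or not, while a
Braille or audio renderer can still tell them apart.
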
We can repeat this argument for the other common styles: monospaced, bold,
double-struck, hollow, shadowed, 3D-like, underlining/striking/overlining,
generic subscripts and superscripts. (I don't like the partial encoding of
Latin letters in subscript/superscript, working only for basic modern
English: this is an abuse of what was defined mostly for just a few
well-known abbreviations or notations that have a long multilingual
tradition.) Authors have much more freedom of creation using separate
styles, encoded in an upper-layer protocol.
However, we can admit that for documents not intended to be rendered
visually, but used technically, we would need some contextual control
characters (just like those for BiDi when HTML/CSS is not usable). These
are needed only for compatibility with technical constraints, provided that
there is application support for them and that the application is not
vendor-specific but sponsored by a well-known standard (which should then be
made explicit in Unicode, probably by character properties, just like the
additional properties of CJK characters that specify their dictionary
sources). That referenced standard should be open, or at least readable by
all (even if it is not republishable), and the standards body should have an
open contact with the community, and regular meetings to resolve incoming
issues by defining policies, best practices, or the current "state of the
art" (if research is still continuing), as well as rules for making the
transition and maintaining a good level of compatibility if this standard
evolves or switches to another supported standard.
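
For illustration only, this is roughly how an application could consume
such scoping controls. No such characters exist in Unicode: the two
private-use code points below are stand-ins, and the function name is
hypothetical.

```python
# Hypothetical scoping controls: private-use code points stand in for
# characters that do not exist in Unicode.
START, END = "\uE000", "\uE001"

def split_runs(text):
    """Split `text` into (scoped, run) pairs at the stand-in controls."""
    runs, buf, scoped = [], [], False
    for ch in text:
        if ch in (START, END):
            # A control ends the current run and switches the scope state.
            if buf:
                runs.append((scoped, "".join(buf)))
                buf = []
            scoped = (ch == START)
        else:
            buf.append(ch)
    if buf:
        runs.append((scoped, "".join(buf)))
    return runs

sample = f"I watched {START}Jackie Brown{END} again."
print(split_runs(sample))

# Dropping the controls leaves ordinary, searchable text.
plain = "".join(run for _, run in split_runs(sample))
print("Jackie Brown" in plain)
```

Two controls per run leave the letters inside the scope contiguous, so an
application that does not know the controls can drop them and keep
ordinary text.
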





On Thu, 17 Jan 2019 at 04:51, James Kass via Unicode <unicode at unicode.org>
wrote:

>
> Victor Gaultney wrote,
>
>  > Treating italic like punctuation is a win for a lot of people:
>
> Italic Unicode encoding is a win for a lot of people regardless of
> approach.  Each of the listed wins remains essentially true whether
> treated as punctuation, encoded atomically, or selected with VS.
>
>  > My main point in suggesting that Unicode needs these characters is that
>  > italic has been used to indicate specific meaning - this text is somehow
>  > special - for over 400 years, and that content should be preserved in
> plain
>  > text.
>
> ( http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf )
>
> "Plain text must contain enough information to permit the text to be
> rendered legibly, and nothing more."
>
> The argument is that italic information can be stripped yet still be
> read.  A persuasive argument towards encoding would need to negate that;
> it would have to be shown that removing italic information results in a
> loss of meaning.
>
> The decision makers at Unicode are familiar with italic use conventions
> such as those shown in "The Chicago Manual of Style" (first published in
> 1906).  The question of plain-text italics has arisen before on this
> list and has been quickly dismissed.
>
> Unicode began with the idea of standardizing existing code pages for the
> exchange of computer text using a unique double-byte encoding rather
> than relying on code page switching.  Latin was "grandfathered" into the
> standard.  Nobody ever submitted a formal proposal for Basic Latin.
> There was no outreach to establish contact with the user community --
> the actual people who used the script as opposed to the "computer nerds"
> who grew up with ANSI limitations and subsequent ISO code pages.
> Because that's how Unicode rolled back then.  Unicode did what it was
> supposed to do WRT Basic Latin.
>
> When someone points out that italics are used for disambiguation as well
> as stress, the replies are consistent.
>
> "That's not what plain-text is for."  "That's not how plain-text
> works."  "That's just styling and so should be done in rich-text."
> "Since we do that in rich-text already, there's no reason to provide for
> it in plain-text."  "You can already hack it in plain-text by enclosing
> the string with slashes."  And so it goes.
>
> But if variant letter form information is stripped from a string like
> "Jackie Brown", the primary indication that the string represents either
> a person's name or a Tarantino flick title is also stripped.  "Thorstein
> Veblen" is either a dead economist or the name of a fictional yacht in
> the Travis McGee series.  And so forth.
>
> Computer text tradition aside, nobody seems to offer any legitimate
> reason why such information isn't worthy of being preservable in
> plain-text.  Perhaps there isn't one.
>
> I'm not qualified to assess the impact of italic Unicode inclusion on
> the rich-text world as mentioned by David Starner.  Maybe another list
> member will offer additional insight or a second opinion.
>
>