Encoding italic (was: A last missing link)

James Kass via Unicode unicode at unicode.org
Sat Jan 19 19:18:19 CST 2019


Victor Gaultney wrote,

 > If however, we say that this "does not adequately consider the harm done
 > to the text-processing model that underlies Unicode", then that exposes a
 > weakness in that model. That may be a weakness that we have to accept for
 > a variety of reasons (technical difficulty, burden on developers, UI impact,
 > cost, maturity).

Unicode's character encoding principles and underlying text-processing 
model remain robust.  They are the foundation of modern computer text 
processing.  The goal of 𝑛𝑒 𝑝𝑙𝑢𝑠 𝑢𝑙𝑡𝑟𝑎¹ needs to accommodate 
the best expectations of the end users and the fact that the consistent 
approach of the model eases the software people's burdens by ensuring 
that effective programming solutions to support one subset or range of 
characters can be applied to the other subsets of the Unicode 
repertoire, and that those solutions can be shared with other 
developers in a standard fashion.

Assigning properties to characters gives any conformant application 
clear instructions as to what exactly is expected as the app encounters 
each character in a string.  In simpler times, the only expectation was 
that the application would splat a glyph onto a screen (and/or sheet of 
paper) and store a binary string for later retrieval.  We've moved forward.
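
As a rough illustration of what those per-character instructions look 
like in practice, here is a minimal sketch using Python's standard 
unicodedata module.  The helper name describe() and the choice of 
sample characters are mine, purely for illustration; the properties 
shown are only a small subset of what the Unicode Character Database 
provides.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Return a few Unicode character properties for one character.

    describe() is a hypothetical helper, not part of any standard API.
    """
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),   # General_Category, e.g. "Lu"
        "bidi": unicodedata.bidirectional(ch),  # Bidi_Class, e.g. "L" or "R"
        "combining": unicodedata.combining(ch), # Canonical_Combining_Class
    }

if __name__ == "__main__":
    # A Latin letter, a combining accent, and a right-to-left letter.
    for ch in ("A", "\u0301", "\u05D0"):
        print(f"U+{ord(ch):04X}", describe(ch))
```

An application that consults these values knows, without any 
script-specific special casing, that U+0301 attaches to a preceding 
base character and that U+05D0 runs right to left.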

'Unicode encodes characters, not glyphs' is a core principle. There's a 
legitimate concern whenever anyone is perceived as heading into the 
general direction of turning the character encoding into a glyph 
registry, as it suggests a possible step backwards and might lead to a 
slippery slope.  For example, if italics are encoded, why not fraktur 
and Gaelic?²
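
The standard's own data already reflects this principle for the 
mathematical alphanumeric symbols, which carry a <font> compatibility 
decomposition to the ordinary letters.  The sketch below, again using 
Python's unicodedata module, shows one way to observe that.  The 
function names are hypothetical, and this is only an illustration of 
the existing character data, not a proposal for how italics ought to 
be handled.

```python
import unicodedata
from typing import Optional

def plain_equivalent(text: str) -> str:
    """Fold compatibility variants (including <font> variants) via NFKC."""
    return unicodedata.normalize("NFKC", text)

def font_variant_of(ch: str) -> Optional[str]:
    """Return the base character if ch has a <font> decomposition, else None.

    font_variant_of() is a hypothetical helper for illustration only.
    """
    decomp = unicodedata.decomposition(ch)  # e.g. "<font> 006E", or ""
    if decomp.startswith("<font>"):
        return chr(int(decomp.split()[1], 16))
    return None

if __name__ == "__main__":
    # MATHEMATICAL ITALIC SMALL N and MATHEMATICAL ITALIC SMALL E
    styled = "\U0001D45B\U0001D452"
    print(plain_equivalent(styled))
    print(font_variant_of("\U0001D45B"))
    print(font_variant_of("n"))
```

In other words, a search or collation routine that normalizes with 
NFKC treats the styled letters as the plain ones, which is consistent 
with regarding the styling as presentation rather than identity.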

The notion that any given system can't be improved is static.³ ("System" 
refers to Unicode's repertoire and coverage rather than its core 
principles.  Core principles are rock solid by nature.)

¹ /ne plus ultra/
² "Conversely, significant differences in writing style for the same 
script may be reflected in the bibliographical classification—for 
example, Fraktur or Gaelic styles for the Latin script. Such stylistic 
distinctions are ignored in the Unicode Standard, which treats them as 
presentation styles of the Latin script."  Ken Whistler, 
http://unicode.org/reports/tr24/
³ "Static" can be interpreted as either virtually catatonic or radio 
noise.  Either is applicable here.


