A last missing link for interoperable representation

Sat Jan 12 13:16:17 CST 2019

On 12/01/2019 00:17, James Kass via Unicode wrote:
[…]
> The fact that the math alphanumerics are incomplete may have been
> part of what prompted Marcel Schneider to start this thread.

No, really not at all. I didn’t even dream of having italics in Unicode
working out of the box. That is exactly the sort of demand that would
have completely discredited my advocacy of preformatted superscripts for
the Unicode-conformant, interoperable representation of a handful of
languages that are spoken by one third of mankind and written in the
Latin script, whereas no other script is concerned with that orthographic
feature. (There is no clear borderline between orthography and typography
here, but with ordinal indicators in particular, and abbreviation
indicators in general, we’re clearly on the orthographic side. SC2/WG3
would agree, since they deemed "ª" and "º" worth encoding in 8-bit
charsets.)
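As a concrete illustration of the preformatted superscripts in question,
here is a minimal Python sketch. The sample abbreviations are chosen only
for illustration, and NFKC is used as an assumed stand-in for
compatibility matching. It shows that the two Latin-1 ordinal indicators
and the modifier letters used in French abbreviations such as “1ᵉʳ” and
“Mᵐᵉ” all fold back to baseline letters, so text using them stays
searchable.

```python
import unicodedata

# Illustrative samples: the two Latin-1 ordinal indicators, plus modifier
# letters as used in French abbreviations ("1er", "Mme").
samples = [
    "2\u00aa",        # 2 + FEMININE ORDINAL INDICATOR
    "1\u00ba",        # 1 + MASCULINE ORDINAL INDICATOR
    "1\u1d49\u02b3",  # 1 + MODIFIER LETTER SMALL E + MODIFIER LETTER SMALL R
    "M\u1d50\u1d49",  # M + MODIFIER LETTER SMALL M + MODIFIER LETTER SMALL E
]

for s in samples:
    # NFKC strips the <super> compatibility formatting, keeping the letters.
    folded = unicodedata.normalize("NFKC", s)
    print(s, "->", folded)
    for ch in s[1:]:
        print("   U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```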

It started when I found in XKB’s keysymdef.h four dead keysyms added
for Karl Pentzlin’s German T3, among them dead_lowline, and remembered
that at some point in history, users had been deprived of the means of
typing a combining underscore. I didn’t think of the extra letterspacing
(“gesperrt”, i.e. spaced out, in German) that Mark E. Shoulson mentioned
upthread, (a) because it isn’t used for that purpose in the locale I’m
working for, and (b) because emulating it with interspersed NARROW
NO-BREAK SPACEs would make the text unsearchable.
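Point (b) can be checked with a minimal Python sketch, assuming a plain
substring search, optionally after NFKC normalization (real search
engines may fold differently). The helper name letterspace is
hypothetical. NFKC only turns U+202F into an ordinary space, so the
spaced-out word still doesn’t match.

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

def letterspace(word: str) -> str:
    """Hypothetical helper: emulate "gesperrt" by putting U+202F between letters."""
    return NNBSP.join(word)

text = "Das ist " + letterspace("wichtig") + "."

# A plain substring search no longer finds the word.
print("wichtig" in text)

# NFKC maps U+202F to U+0020 (decomposition <noBreak> 0020), so the
# letters stay separated and the search still fails.
folded = unicodedata.normalize("NFKC", text)
print(repr(folded))
print("wichtig" in folded)
```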

> 
> If stringing encoded italic Latin letters into words is an abuse of
> Unicode, then stringing punctuation characters to simulate a "smiley"
> (☺) is an abuse of ASCII - because that's not what those punctuation
> characters are *for*.  If my brain parses such italic strings into
> recognizable words, then I guess my brain is non-compliant.

I think that, just as Google Search has extensive equivalence classes
treating mathematical letters like plain ASCII, text-to-speech software
could use a little AI to recognize strings of those letters as ordinary
words with emphasis, as James Kass suggested. All the more so as we’re
actually able to add combining diacritics for correct spelling in some
diacriticized alphabets (including a few with non-decomposable
diacritics), albeit with less-than-optimal diacritic placement in many
cases in the current state of the art. Such software should also parse
ASCII art correspondingly, unlike what happened in another example shared
on Twitter downthread of the math letters tweet:

https://twitter.com/ourelectra/status/1083367552430989315
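For what it’s worth, compatibility normalization already gives a rough
approximation of such equivalence classes. This Python sketch assumes
NFKC as a stand-in (Google’s actual folding rules are not documented
here), and the helper to_math_italic is hypothetical. Note that italic
small h is U+210E PLANCK CONSTANT, since its slot in the Mathematical
Alphanumeric Symbols block is reserved. The last line shows a math
italic e with a combining acute accent folding and composing to a plain
precomposed “é”.

```python
import unicodedata

def to_math_italic(ascii_word: str) -> str:
    """Hypothetical helper: map A-Z / a-z to MATHEMATICAL ITALIC letters."""
    out = []
    for ch in ascii_word:
        if ch == "h":
            out.append("\u210e")  # PLANCK CONSTANT serves as italic small h
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        else:
            out.append(ch)  # spaces, digits, punctuation left unchanged
    return "".join(out)

def fold(text: str) -> str:
    """Approximate search-engine equivalence classes with NFKC."""
    return unicodedata.normalize("NFKC", text)

styled = to_math_italic("Unicode is hard")
print(styled)
print(fold(styled))

# Math italic e + COMBINING ACUTE ACCENT: NFKC folds the letter to "e",
# then canonical composition yields U+00E9.
print(fold(to_math_italic("e") + "\u0301"))
```

Of course, folding discards the emphasis, which is exactly the
information text-to-speech would need to keep.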

Thanks,

Marcel