Unicode encoding philosophy

William_J_G Overington wjgo_10009 at btinternet.com
Wed Oct 11 13:51:08 CDT 2023


Erik Carvalhal Miller wrote as follows.

> But, you may ask, what about the mathematical Latin and Greek 
> alphanumeric symbols, rich in implied typographical styles including 
> italic?


Thank you for replying.


Yet I did not ask that. As a mathematician, I understood the difference 
in approach. I never had any intention of using the encoding of those 
characters as a precedent. That idea was refuted as a possible 
precedent, even though I had not contemplated using it as one.

> This statefulness applies even to pre‐computer days of metal type:  A 
> compositor about to set a run of italic type would turn to a case of 
> italics from which to pick out the next several glyphs, then turn back 
> to a non‐italic case when that run was complete, rather than serially 
> start and finish using the italic case many times in a row.

If the type were being handset, that may well be true as a matter of the 
practical use of typecases. The compositor might need to move to a 
different table where the typecase or typecases for italic glyphs had 
been placed, and might need to use separate typecases for uppercase 
letters and punctuation and for lowercase letters and punctuation. I do 
not know at present how the process worked if the compositor were using 
machine composition on a Linotype machine or a Monotype machine. I 
learned to handset metal type back in the 1960s, as I was involved in 
private press printing as a family hobby.

Metal type in use was not stateful regarding italics. Each piece of 
metal type in a sequence of handset type was an independent unit, and no 
state was set anywhere requiring that the piece of type after an italic 
character also be italic; sometimes it would not be, such as after an 
italicized word had been completed. Indeed, except for a very few 
special founts such as Palace Script, the spaces used between words were 
the same for both the roman fount and the italic fount. For the 
avoidance of doubt, there were space sorts available in the italic case, 
for convenience when setting a sequence of words in italic sorts, yet 
they came from the same purchase of spacing material from the type 
foundry and were fount-independent for founts of the same point size, 
except for a few founts such as Palace Script that had special angled 
space sorts.

A modern desktop publishing program or text editor could allow a 
compositor to switch from roman to italic for a sequence of characters 
and yet offer, as an option, output to a file as plain text with a VS14 
character after each italicized character. So in practice, if 
implemented, there need be no tedium in applying a VS14 character after 
each text character. I say that accepting the VS14 proposal for encoding 
would be practical and would not make a string of Unicode characters 
stateful. In days gone by, a proposal for one character to switch 
italics on and another to switch italics off was rejected because it 
would have made Unicode stateful. So, as time went by and I learned, a 
method of achieving the result without making Unicode stateful was 
devised and tested, and it worked great; yet that was rejected too, 
because it was not stateful!
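A minimal sketch in Python of the export step described above, assuming the editor holds italic formatting as half-open index ranges (the function names and the italic_spans parameter are hypothetical, for illustration only). VS14 is U+FE0D VARIATION SELECTOR-14; no italic variation sequences are standardized, so this only shows how the proposed scheme would behave, not how any existing software works. The decoder also illustrates the statelessness point: each selector applies only to the character immediately before it, so a substring keeps its italic information without any earlier switch-on character.

```python
# A minimal sketch of the VS14-per-character scheme described above.
# Assumption: VS14 is U+FE0D VARIATION SELECTOR-14. No italic variation
# sequences are standardized, so current software would typically ignore
# these selectors when displaying the text.
VS14 = "\uFE0D"


def to_vs14_plain_text(text: str, italic_spans: list[tuple[int, int]]) -> str:
    """Export text as plain text, adding VS14 after each italicized character.

    `italic_spans` is a hypothetical stand-in for the editor's internal
    formatting: half-open (start, end) index ranges into `text`.
    """
    italic = [False] * len(text)
    for start, end in italic_spans:
        for i in range(start, end):
            italic[i] = True
    out = []
    for ch, is_italic in zip(text, italic):
        out.append(ch)
        if is_italic:
            out.append(VS14)  # the marker travels with this one character
    return "".join(out)


def from_vs14_plain_text(encoded: str) -> tuple[str, list[bool]]:
    """Strip VS14 markers, returning the base text and per-character italic flags."""
    chars: list[str] = []
    flags: list[bool] = []
    for ch in encoded:
        if ch == VS14:
            if flags:
                flags[-1] = True  # applies only to the preceding character
            continue  # a selector with no preceding character is dropped
        chars.append(ch)
        flags.append(False)
    return "".join(chars), flags


# "Principia" occupies indices 4..12 of the source text.
encoded = to_vs14_plain_text("see Principia here", [(4, 13)])

# Decoding a substring needs no earlier "italics on" character: each
# remaining selector still marks the character it follows.
tail_text, tail_flags = from_vs14_plain_text(encoded[12:])
print(tail_text)   # cipia here
print(tail_flags)  # [True, True, True, True, True, False, False, False, False, False]
```

By contrast, with a pair of switch-on and switch-off control characters, a substring taken after the switch-on character would carry no indication that its text was italic; that dependence on earlier text is what statefulness means here.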

Actually, my main reason for wanting to be able to encode italics in 
plain text was to be able to transcribe historical texts, including such 
things as the title pages of printed books, into plain text on a 
computer.

I consider that, alas, an opportunity for progress has been dismissed 
because of adherence to concepts from long ago that are not relevant in 
some modern usage situations. The capabilities of plain text could be 
improved if such encoding were allowed.

William Overington

Wednesday 11 October 2023

More information about the Unicode mailing list