Input methods at the age of Unicode

Marcel Schneider charupdate at
Thu Jul 16 10:49:45 CDT 2015

On 16 Jul 2015, at 13:12, William_J_G Overington  wrote:

> Hi

> I do not know if it is of interest, but some time ago I produced some pdf files that can each be used as a typecase so as to copy a character from the pdf, then paste into a Unicode-aware wordprocessor or desktop publishing program and then formatted to the desired font and font size.

This is a nice piece of work. If you use these characters very often, a solution based on a Compose tree may be worth considering too. It lets you type a sequence of characters available on the keyboard to insert precomposed characters, punctuation, and symbols. I'll insert some suggestions between the quoted items below, and I'm curious to know whether you like them.
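For readers who have not used one, a Compose tree can be sketched in C roughly as follows. This is a minimal illustration, not any particular implementation: the table holds a few sample sequences taken from the suggestions in this message, and the names compose_lookup and compose_entry are hypothetical. After each key typed following Compose, the keys so far either complete a sequence, are still the prefix of one, or match nothing.

```c
#include <string.h>

/* Outcome of looking up the keys typed so far (after Compose). */
typedef enum { COMPOSE_NONE, COMPOSE_PENDING, COMPOSE_MATCH } compose_state;

/* One sequence: the keys typed after Compose, and the code point produced.
   Hypothetical sample entries, not a standard table. */
typedef struct {
    const char *keys;
    unsigned long codepoint;
} compose_entry;

static const compose_entry compose_table[] = {
    { "_a", 0x0101 },  /* Compose, _, a -> LATIN SMALL LETTER A WITH MACRON */
    { ".z", 0x017C },  /* Compose, Full stop, z -> LATIN SMALL LETTER Z WITH DOT ABOVE */
    { "-h", 0x0127 },  /* Compose, -, h -> LATIN SMALL LETTER H WITH STROKE */
};

/* Classify the keys typed so far; on a full match, store the code point. */
static compose_state compose_lookup(const char *typed, unsigned long *out)
{
    size_t n = strlen(typed);
    int is_prefix = 0;

    for (size_t i = 0; i < sizeof compose_table / sizeof compose_table[0]; i++) {
        const char *keys = compose_table[i].keys;
        if (strcmp(keys, typed) == 0) {
            *out = compose_table[i].codepoint;
            return COMPOSE_MATCH;
        }
        if (strncmp(keys, typed, n) == 0)
            is_prefix = 1;  /* this sequence needs more keys to complete */
    }
    return is_prefix ? COMPOSE_PENDING : COMPOSE_NONE;
}
```

An input method would call compose_lookup after every key: keep waiting on COMPOSE_PENDING, insert the code point on COMPOSE_MATCH, and abandon the sequence on COMPOSE_NONE.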

> The following might be of particular interest.


To input a letter with macron, it is common to type 'Compose, _' followed by the letter. For the hacek, there is 'Compose, v' or 'Compose, <', but '<' is taken for "subscript", so I prefer 'v' and 'V'. You may also find 'Compose, c', after the ISO name of this diacritic (CARON), which was enforced at the ISO/Unicode merger (Unicode had called it HACEK, which is its true name). It is better to choose 'v', a mnemonic derived from the shape. For comma below, use 'Compose, <, Comma', and for turned comma above, 'Compose, >, #, Comma' (I'm not quite sure, because I have not yet implemented these). But in fact, AFAIK the turned comma above on the g is a preferred glyph variant of the cedilla (U+0123 LATIN SMALL LETTER G WITH CEDILLA).
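The macron and caron suggestions can be tabulated as below. This sketch assumes Latvian base letters (ā ē ī ū, č š ž and their capitals) and uses the hypothetical function name apply_diacritic; only '_' and 'v' are mapped, so '<' stays free for subscripts.

```c
#include <stddef.h>

/* Map a diacritic mnemonic typed after Compose ('_' = macron, 'v' = caron)
   and a base letter to the precomposed code point; return 0 if undefined. */
static unsigned long apply_diacritic(char mnemonic, char base)
{
    static const struct {
        char mnemonic;
        char base;
        unsigned long codepoint;
    } table[] = {
        { '_', 'a', 0x0101 }, { '_', 'A', 0x0100 },  /* a/A with macron */
        { '_', 'e', 0x0113 }, { '_', 'E', 0x0112 },  /* e/E with macron */
        { '_', 'i', 0x012B }, { '_', 'I', 0x012A },  /* i/I with macron */
        { '_', 'u', 0x016B }, { '_', 'U', 0x016A },  /* u/U with macron */
        { 'v', 'c', 0x010D }, { 'v', 'C', 0x010C },  /* c/C with caron */
        { 'v', 's', 0x0161 }, { 'v', 'S', 0x0160 },  /* s/S with caron */
        { 'v', 'z', 0x017E }, { 'v', 'Z', 0x017D },  /* z/Z with caron */
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].mnemonic == mnemonic && table[i].base == base)
            return table[i].codepoint;
    return 0;  /* no such combination in this sketch */
}
```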


These are easy: you need 'Compose, ^' and 'Compose, v'.


This may be obtained by typing 'Compose, h, o, t' or 'Compose, h, b'.


Letters with dot above are usually typed with 'Compose, Full stop', and the Latin letter h with stroke with 'Compose, -, h'.


You can type 'Compose, Grave' as a grave accent dead key, then continue with 'Apostrophe' or 'Quotation mark' to get the single or double opening quotation mark. Likewise, 'Compose, Apostrophe' acts as the acute dead key and gives the closing ones. This matches old ASCII practice, hence the mnemonics. For the low quotation marks, type 'Compose, <', and for the reversed ones, 'Compose, \'.
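Read this way, the quotation-mark mnemonics form a small two-key grid: the first key selects opening, closing, low, or reversed, and the second selects single or double. The sketch below assumes that the low and reversed forms are also completed with Apostrophe or Quotation mark, which the text implies but does not state; compose_quote is a hypothetical name.

```c
/* Return the quotation mark for 'Compose, first, second', or 0 if undefined.
   first:  '`' opening, '\'' closing, '<' low, '\\' reversed
   second: '\'' single, '"' double */
static unsigned long compose_quote(char first, char second)
{
    int dbl;

    if (second == '\'')
        dbl = 0;
    else if (second == '"')
        dbl = 1;
    else
        return 0;

    switch (first) {
    case '`':  return dbl ? 0x201C : 0x2018;  /* LEFT DOUBLE / LEFT SINGLE QUOTATION MARK */
    case '\'': return dbl ? 0x201D : 0x2019;  /* RIGHT DOUBLE / RIGHT SINGLE QUOTATION MARK */
    case '<':  return dbl ? 0x201E : 0x201A;  /* DOUBLE / SINGLE LOW-9 QUOTATION MARK */
    case '\\': return dbl ? 0x201F : 0x201B;  /* DOUBLE / SINGLE HIGH-REVERSED-9 QUOTATION MARK */
    default:   return 0;
    }
}
```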


There is a very efficient way to get *all* Unicode spaces (perhaps except the two doubles): 'Compose, Space' followed by a mnemonic letter, a digit (1, 2, 3, 4, 6), or even < or > for the unpaired directional marks (very useful to correct the display when RTL characters are used in an LTR context and vice versa).
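A sketch of the 'Compose, Space' subtree in C follows. The digits 3, 4 and 6 are matched to the spaces whose names contain them (THREE-PER-EM, FOUR-PER-EM, SIX-PER-EM); reading 1 as EM SPACE, 2 as EN SPACE, '>' as LEFT-TO-RIGHT MARK and '<' as RIGHT-TO-LEFT MARK is my assumption, and the mnemonic letters are left out because they are not given here. The function name compose_space is hypothetical.

```c
/* Return the character for 'Compose, Space, key', or 0 if undefined.
   Assumed mnemonics: see the comments on each case. */
static unsigned long compose_space(char key)
{
    switch (key) {
    case '1': return 0x2003;  /* EM SPACE (assumed: 1 em) */
    case '2': return 0x2002;  /* EN SPACE (assumed: 1/2 em) */
    case '3': return 0x2004;  /* THREE-PER-EM SPACE */
    case '4': return 0x2005;  /* FOUR-PER-EM SPACE */
    case '6': return 0x2006;  /* SIX-PER-EM SPACE */
    case '>': return 0x200E;  /* LEFT-TO-RIGHT MARK (assumed mnemonic) */
    case '<': return 0x200F;  /* RIGHT-TO-LEFT MARK (assumed mnemonic) */
    default:  return 0;       /* nothing defined for this key */
    }
}
```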


For letters with diaeresis, one can use the usual 'Compose, "' or the alternative 'Compose, :'. The latter helps disambiguate quotation marks: 'Compose, Apostrophe, Quotation mark' is already used for the closing double quote, so a "diaeresis and acute" sequence typed with Apostrophe and Quotation mark may interfere with it. For acute, grave, and circumflex, we use 'Compose, Apostrophe', 'Compose, Grave', and 'Compose, ^' respectively. (Alternatively, if the apostrophe risks interfering, one can use the vertical bar instead. That solution should have been implemented on the US International keyboard, to avoid mixing up the apostrophe, single quotes, and the acute dead key. Instead of the quotation mark for diaeresis, IMO one could have chosen the number sign or some other less frequently used character. I know that ASCII used ' and " after Backspace to add diacritics to letters, hence the choice of dead keys on the US International layout.)
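The interference described above is a prefix conflict: in a Compose tree, a complete sequence must not also be the beginning of a longer one, or the shorter one fires before the longer one can be typed. A minimal checker, with hypothetical names, could look like this. The test data assumes U+01D8 (u with diaeresis and acute) typed as acute, then diaeresis, then u; that ordering is an illustration only.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *keys;         /* keys typed after Compose */
    unsigned long codepoint;  /* character produced */
} seq;

/* Return 1 and set *a, *b if table[*a] is a proper prefix of table[*b]. */
static int find_prefix_conflict(const seq *table, size_t n, size_t *a, size_t *b)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(table[i].keys);
        for (size_t j = 0; j < n; j++) {
            /* a longer sequence starting with table[i].keys can never be reached */
            if (i != j && strlen(table[j].keys) > len &&
                strncmp(table[i].keys, table[j].keys, len) == 0) {
                *a = i;
                *b = j;
                return 1;
            }
        }
    }
    return 0;
}
```

Replacing the Quotation mark by ':' for the diaeresis (so the longer sequence becomes Apostrophe, ':', u) removes the conflict, which is the point of the 'Compose, :' alternative.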

> These and some others are linked from the following web page.
> That page is linked from another web page.

I'm confident that each of the other PDF typecases can be matched by Compose sequences too.
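To implement a two-character Compose sequence in a Windows keyboard layout's dead-key table, program the following:
DEADTRANS(first character, compose, first character, 0x0001),
DEADTRANS(second character, first character, target character, 0x0000)
The 0x0001 flag (DKF_DEAD in kbd.h) makes the result of the first entry a chained dead key, identified by the first character; the second entry then maps that dead key plus the second character to the target character.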
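To see how the two entries chain, here is a small simulation in C of the dead-key lookup. It does not use the real kbd.h macro: the struct only mirrors the four DEADTRANS arguments, COMPOSE_DK (U+2384 COMPOSITION SYMBOL) is an arbitrary stand-in for whatever character identifies the Compose dead key, feed_key is a hypothetical name, and undefined sequences are simply dropped, which is a simplification.

```c
#include <stddef.h>

#define DK_CHAIN   0x0001  /* same value as the flag above: result is a dead key */
#define COMPOSE_DK 0x2384  /* hypothetical identifier for the Compose dead key */

typedef struct {
    unsigned short ch;      /* key typed while a dead key is pending */
    unsigned short accent;  /* the pending dead key */
    unsigned short comp;    /* output character, or the next dead key */
    unsigned short flags;   /* DK_CHAIN or 0 */
} dead_trans;

static const dead_trans table[] = {
    { 'v', COMPOSE_DK, 'v',    DK_CHAIN },  /* Compose, v -> dead key 'v' */
    { 'c', 'v',        0x010D, 0x0000 },    /* dead key 'v', c -> U+010D */
};

/* Feed one key. Returns 1 and sets *out when a character is produced;
   *pending holds the current dead key (0 if none). */
static int feed_key(unsigned short key, unsigned short *pending, unsigned short *out)
{
    if (*pending == 0) {
        if (key == COMPOSE_DK) {
            *pending = COMPOSE_DK;  /* start a Compose sequence */
            return 0;
        }
        *out = key;  /* ordinary key, passed through */
        return 1;
    }
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].ch == key && table[i].accent == *pending) {
            if (table[i].flags & DK_CHAIN) {
                *pending = table[i].comp;  /* chain to the next dead key */
                return 0;
            }
            *pending = 0;
            *out = table[i].comp;  /* sequence complete */
            return 1;
        }
    }
    *pending = 0;  /* simplification: an undefined sequence is dropped */
    return 0;
}
```

Feeding Compose, v, c yields U+010D (č): the first entry turns Compose plus 'v' into the pending dead key 'v', and the second entry resolves 'v' plus 'c'.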



More information about the Unicode mailing list