Joined "ti" coded as "Ɵ" in PDF

Doug Ewell doug at ewellic.org
Thu Mar 17 13:11:44 CDT 2016


Don Osborn wrote:

> Odd result when copy/pasting text from a PDF: For some reason "ti" in
> the (English) text of the document at
> http://web.isanet.org/Web/Conferences/Atlanta%202016/Atlanta%202016%20-%20Full%20Program.pdf
> is coded as "Ɵ". Looking more closely at the original text, it does
> appear that the glyph is a "ti" ligature (which afaik is not coded as
> such in Unicode).

When I copy and paste the PDF text in question into BabelPad, I get:

> Interna[U+10019F]onal Order and the Distribu[U+10019F]on of Iden[U+10019F]ty in 1950 (By
> invita[U+10019F]on only)

The "ti" ligatures are implemented as U+10019F, a Plane 16 private-use
character.
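
For anyone who wants to verify this, here is a minimal sketch using Python's standard unicodedata module. The variable names are just illustrative; the only input taken from the analysis above is the code point 0x10019F.

```python
import unicodedata

cp = 0x10019F                          # code point observed in the pasted text
ch = chr(cp)

plane = cp >> 16                       # each plane spans 0x10000 code points
category = unicodedata.category(ch)    # "Co" means "Other, private use"

print(f"U+{cp:06X}: plane {plane}, general category {category}")
```

Under these assumptions this should report plane 16 and category Co, which matches Supplementary Private Use Area-B.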

Truncating this character to 16 bits, which is a Bad Thing™, yields
U+019F LATIN CAPITAL LETTER O WITH MIDDLE TILDE. So it looks like either
Don's clipboard or the editor he pasted it into is not fully
Unicode-compliant.
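
The arithmetic can be sketched in a few lines of Python. The helper truncate_to_16_bits is hypothetical and only imitates what a 16-bit-only code path might do; it is not taken from any real clipboard or editor implementation.

```python
import unicodedata


def truncate_to_16_bits(code_point: int) -> int:
    """Keep only the low 16 bits (hypothetical model of the bug)."""
    return code_point & 0xFFFF


original = 0x10019F
truncated = truncate_to_16_bits(original)
name = unicodedata.name(chr(truncated))

# For comparison: the UTF-16 surrogate pair that encodes the original character.
raw = chr(original).encode("utf-16-be")
utf16_units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

print(f"U+{original:06X} -> U+{truncated:04X} {name}")
print("UTF-16 code units:", " ".join(f"{u:04X}" for u in utf16_units))
```

Note that the surrogate pair is DBC0 DD9F, so neither UTF-16 code unit is 0x019F by itself. That suggests the truncation was applied to the full code point value rather than to a UTF-16 code unit, though this is only an inference from the numbers.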

Don's point about using alternative characters to implement ligatures,
thereby messing up web searches, remains valid.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸



