Re: Joined "ti" coded as "Ɵ" in PDF

Steve Swales steve at swales.us
Thu Mar 17 13:17:12 CDT 2016


Yes, it seems like your mileage varies with the PDF viewer/interpreter/converter. Copying the text from Preview on the Mac replaces each ti ligature with a space. Certainly not a Unicode problem, per se, but an interesting problem nevertheless.

-steve

> On Mar 17, 2016, at 11:11 AM, Doug Ewell <doug at ewellic.org> wrote:
> 
> Don Osborn wrote:
> 
>> Odd result when copy/pasting text from a PDF: For some reason "ti" in
>> the (English) text of the document at
>> http://web.isanet.org/Web/Conferences/Atlanta%202016/Atlanta%202016%20-%20Full%20Program.pdf
>> is coded as "Ɵ". Looking more closely at the original text, it does
>> appear that the glyph is a "ti" ligature (which afaik is not coded as
>> such in Unicode).
> 
> When I copy and paste the PDF text in question into BabelPad, I get:
> 
>> Interna􀆟onal Order and the Distribu􀆟on of Iden􀆟ty in 1950 (By
>> invita􀆟on only)
> 
> The "ti" ligatures are implemented as U+10019F, a Plane 16 private-use
> character.
> 
> Truncating this character to 16 bits, which is a Bad Thing™, yields
> U+019F LATIN CAPITAL LETTER O WITH MIDDLE TILDE. So it looks like either
> Don's clipboard or the editor he pasted it into is not fully
> Unicode-compliant.
> 
> Don's point about using alternative characters to implement ligatures,
> thereby messing up web searches, remains valid.
> 
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
> 
> 
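The truncation Doug describes above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not a claim about what Don's clipboard or editor actually does internally; the names PUA_TI and truncate_to_16_bits are made up for illustration.

```python
import unicodedata

# Code point Doug reports for the "ti" ligature (Plane 16 private use).
PUA_TI = 0x10019F


def truncate_to_16_bits(code_point: int) -> int:
    """Keep only the low 16 bits, as a non-compliant consumer might (assumption)."""
    return code_point & 0xFFFF


truncated = truncate_to_16_bits(PUA_TI)

# General category "Co" means private use: no standard meaning is assigned.
print(f"U+{PUA_TI:06X} category: {unicodedata.category(chr(PUA_TI))}")

# Dropping the plane bits lands on an unrelated, assigned BMP character.
print(f"U+{truncated:04X} {unicodedata.name(chr(truncated))}")

# For comparison, a correct UTF-16 encoding keeps both halves as a surrogate pair.
print(chr(PUA_TI).encode("utf-16-be").hex())
```

The second line printed should name U+019F LATIN CAPITAL LETTER O WITH MIDDLE TILDE, which matches the "Ɵ" Don saw. The last line shows the surrogate pair that has to survive intact for the private-use character to round-trip.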



