Re: Joined "ti" coded as "Ɵ" in PDF

Leonardo Boiko leoboiko at namakajiri.net
Thu Mar 17 13:06:22 CDT 2016


Yeah, I've stumbled upon this a lot in academic Japanese/Chinese
texts.  I try to copy some Chinese character, only to find out that
it's really a string of random ASCII characters.

Is there only one of those crap PDF pseudo-encodings? If so, I'll use
a converter next time...
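
In the meantime, for the one mapping reported in this thread ("ti"
coming out as U+019F), here is a rough post-processing sketch in
Python. The table and the function name are made up for illustration,
and it assumes the extracted text has no legitimate "Ɵ" in it; other
PDFs may well use other code points, so check by eye before trusting it.

```python
# Hypothetical repair table: characters that a broken PDF text layer
# emits in place of ligature glyphs. Only this one entry comes from
# the thread; anything else would have to be added per document.
LIGATURE_REPAIRS = {
    "\u019F": "ti",  # LATIN CAPITAL LETTER O WITH MIDDLE TILDE
}


def repair_ligatures(text, repairs=LIGATURE_REPAIRS):
    """Replace each wrongly mapped character with the letters it stands for."""
    # str.maketrans accepts a dict of single characters to replacement strings.
    return text.translate(str.maketrans(repairs))


if __name__ == "__main__":
    print(repair_ligatures("interna\u019Fonal"))  # international
```

This only patches copied text after the fact; it does nothing about
whatever mapping inside the PDF produces the wrong character.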

2016-03-17 14:57 GMT-03:00 "Jörg Knappen" <jknappen at web.de>:
> I inspected the PDF file, and its font encoding is termed "Identity-H". I
> couldn't find out much about this encoding, but it seems to be a private
> encoding of Adobe used especially for Asian fonts.
>
> --Jörg Knappen
>
> Sent: Thursday, 17 March 2016 at 17:43
> From: "Don Osborn" <dzo at bisharat.net>
> To: unicode at unicode.org
> Subject: Joined "ti" coded as "Ɵ" in PDF
> Odd result when copy/pasting text from a PDF: For some reason "ti" in
> the (English) text of the document at
> http://web.isanet.org/Web/Conferences/Atlanta%202016/Atlanta%202016%20-%20Full%20Program.pdf
> is coded as "Ɵ". Looking more closely at the original text, it does
> appear that the glyph is a "ti" ligature (which afaik is not coded as
> such in Unicode).
>
> Out of curiosity, did a web search on "internaƟonal" and got over 11k
> hits, apparently all PDFs.
>
> Anyone have any idea what's going on? Am assuming this is not a
> deliberate choice by diverse people creating PDFs and wanting "ti"
> ligatures for stylistic reasons. Note the document linked above is
> current, so this is not (just) an issue with older documents.
>
> Don Osborn


