Joined "ti" coded as "O" in PDF

David Perry hospes02 at
Sat May 7 12:00:31 CDT 2016

I agree that it's a real-world problem -- PDFs really should be 
searchable -- but I do not see that it's a Unicode issue.  Unicode 
defines the basic building blocks of LATIN SMALL LETTER T and LATIN 
SMALL LETTER I; that's its job. Unicode is not responsible for font 
construction or creating PDF software.  Furthermore, even if Unicode did 
want to do something about it, I can't imagine what that could be -- 
aside perhaps from using its bully pulpit to urge PDF creators and font 
creators to do their jobs better.

The fact that some PDF apps do not search and copy/paste text correctly 
when unencoded characters are given PUA values has been known for many 
years.  In the case of Calibri, I looked at the font (the version installed 
on my Win7 system) and found that the 'ti' ligature is named t_i, which 
follows good naming practices, and that it does not have a PUA assignment. 
Given this, any well-constructed PDF app should be able to decode the 
ligature correctly.
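
To illustrate why a name like t_i is enough, here is a rough Python sketch of the glyph-name decoding convention (underscore-separated components, "uniXXXX" and "uXXXX" forms, suffix after a period ignored). It is simplified and is not how any particular PDF app works: the function names are my own, only single ASCII letters are recognized as named components where a real implementation would consult the full Adobe Glyph List, and unknown components make the whole lookup fail.

```python
import re
from typing import Optional

# Basic Multilingual Plane Private Use Area.
PUA_START, PUA_END = 0xE000, 0xF8FF


def is_pua(ch: str) -> bool:
    """Return True if the single character ch lies in the BMP PUA."""
    return PUA_START <= ord(ch) <= PUA_END


def glyph_name_to_text(name: str) -> Optional[str]:
    """Map a glyph name such as 't_i' to the text it stands for.

    Hypothetical helper: returns None when a component is not understood.
    """
    # Anything after the first period is a variant suffix: "t_i.liga" -> "t_i".
    base = name.split(".", 1)[0]
    pieces = []
    for part in base.split("_"):
        uni = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", part)
        if uni:
            # "uni" followed by one or more 4-digit hex code points.
            digits = uni.group(1)
            pieces.append("".join(
                chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
            ))
        elif re.fullmatch(r"u[0-9A-F]{4,6}", part):
            # "u" followed by a single 4- to 6-digit hex code point.
            code = int(part[1:], 16)
            if code > 0x10FFFF:
                return None
            pieces.append(chr(code))
        elif len(part) == 1 and part.isascii() and part.isalpha():
            # Stand-in for a glyph list lookup: single letters name themselves.
            pieces.append(part)
        else:
            return None
    return "".join(pieces)


if __name__ == "__main__":
    text = glyph_name_to_text("t_i")
    print(text, any(is_pua(c) for c in text or ""))
```

With a well-formed name the ligature decodes to the two ordinary letters and nothing falls in the PUA, so search and copy/paste have what they need.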


On 5/6/2016 11:49 AM, Steve Swales wrote:
> This discussion seems to have fizzled out, but I’m concerned that
> there’s a real world problem here which is at least partially the
> concern of the consortium, so let me stir the pot and see if there’s
> still any meat left.
> On the current release of MacOS (including the developer beta, for
> your reference, Peter), if you use Calibri font, for example, in any
> app (e.g. notes), to write words with “ti” (like
> internationalization), then press “Print” and “Open PDF in Preview”,
> you get a PDF document with the joined “ti”.  Subsequently cutting and
> pasting produces mojibake, and searching the document for words
> with “ti” doesn’t work, as previously noted.
> I suppose we can look on this as purely a font handling/MacOS bug, but
> I’m wondering if we should be providing accommodations or conveniences
> in Unicode for it to work as desired.
> -steve
