Joined "ti" coded as "O" in PDF
Andrew Cunningham
lang.support at gmail.com
Sun May 8 03:13:48 CDT 2016
How the t_i glyph is handled will depend on the quality of the font. If it's
a standard ligature, there should be a glyph-to-code-point assignment in the
cmap table or in the ToUnicode mapping in the PDF file.
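
To make the ToUnicode point concrete, here is a minimal sketch of how a
tool could read such a mapping. The CMap fragment and the glyph code 0x01A3
are invented for illustration and are not taken from Calibri or from any
real PDF; only the bfchar syntax follows the PDF ToUnicode convention.

```python
import re

# Hypothetical ToUnicode CMap fragment. Glyph code 0x01A3 is an invented
# example value standing in for a "ti" ligature glyph.
SAMPLE_CMAP = """\
/CIDInit /ProcSet findresource begin
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0057> <0074>
<01A3> <00740069>
endbfchar
endcmap
"""

def parse_bfchar(cmap_text):
    """Return {glyph_code: unicode_string} for every bfchar entry."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # The destination is UTF-16BE, so one glyph can map to
            # several characters (here: "t" followed by "i").
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

to_unicode = parse_bfchar(SAMPLE_CMAP)
print(to_unicode[0x01A3])  # ti
```

If the PDF generator writes an entry like the second one, copy/paste and
search can recover "ti" from the single ligature glyph. If the entry is
missing or points at some other value, the text layer is wrong even though
the page looks fine.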
As David indicates, it isn't a Unicode issue.
It is an issue with the font used and/or the tools used.
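
On the font side, a glyph name like t_i is what lets a tool fall back to
the right text when no explicit mapping is present. The following is a
rough sketch of that decoding, assuming the Adobe Glyph List naming
conventions (components joined by underscores, uniXXXX and uXXXX forms).
AGL_SUBSET is a tiny hypothetical stand-in for the full glyph list, and
glyph_name_to_text is an illustrative helper, not any library's API.

```python
import re

# Tiny stand-in for the Adobe Glyph List; the real list has thousands of names.
AGL_SUBSET = {"t": "t", "i": "i", "f": "f", "l": "l"}

def glyph_name_to_text(name):
    """Decode a ligature-style glyph name such as 't_i' into its characters."""
    base = name.split(".", 1)[0]      # drop suffixes such as '.alt'
    out = []
    for part in base.split("_"):      # underscores join ligature components
        if part in AGL_SUBSET:
            out.append(AGL_SUBSET[part])
        elif re.fullmatch(r"uni(?:[0-9A-F]{4})+", part):
            hexes = part[3:]
            out.extend(chr(int(hexes[i:i + 4], 16))
                       for i in range(0, len(hexes), 4))
        elif re.fullmatch(r"u[0-9A-F]{4,6}", part):
            out.append(chr(int(part[1:], 16)))
        # Components that match none of the forms contribute nothing.
    return "".join(out)

print(glyph_name_to_text("t_i"))          # ti
print(glyph_name_to_text("uni00740069"))  # ti
```

A glyph with a well-formed name decodes cleanly this way. A glyph with an
arbitrary name or a PUA assignment gives the tool nothing to work with.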
PDFs have always been problematic. That isn't going to change anytime soon.
Particularly for archivable or accessible PDFs, the person generating the
PDFs should select the best tools for the job, test the PDF, and then fix
any problems.
Andrew
On Sunday, 8 May 2016, David Perry <hospes02 at scholarsfonts.net> wrote:
> I agree that it's a real-world problem -- PDFs really should be
searchable -- but I do not see that it's a Unicode issue. Unicode defines
the basic building blocks of LATIN SMALL LETTER T and LATIN SMALL LETTER I;
that's its job. Unicode is not responsible for font construction or
creating PDF software. Furthermore, even if Unicode did want to do
something about it, I can't imagine what that could be -- aside perhaps
from using its bully pulpit to urge PDF creators and font creators to do
their jobs better.
>
> The fact that some PDF apps do not search and copy/paste text correctly
when unencoded characters are given PUA values has been known for many
years. In the case of Calibri, I looked at the font (version installed on
my Win7 system) and found that the 'ti' ligature is named t_i, which
follows good naming practices, and it does not have a PUA assignment. Given
this, any well-constructed PDF app should be able to decode the ligature
correctly.
>
> David
>
> On 5/6/2016 11:49 AM, Steve Swales wrote:
>>
>> This discussion seems to have fizzled out, but I’m concerned that
>> there’s a real world problem here which is at least partially the
>> concern of the consortium, so let me stir the pot and see if there’s
>> still any meat left.
>>
>> On the current release of MacOS (including the developer beta, for
>> your reference, Peter), if you use Calibri font, for example, in any
>> app (e.g. notes), to write words with “ti” (like
>> internationalization), then press “Print” and “Open PDF in Preview”,
>> you get a PDF document with the joined “ti”. Subsequently cutting and
>> pasting produces mojibake, and searching the document for words
>> with “ti” doesn’t work, as previously noted.
>>
>> I suppose we can look on this as purely a font handling/MacOS bug, but
>> I’m wondering if we should be providing accommodations or conveniences
>> in Unicode for it to work as desired.
>>
>> -steve
>>
>
--
Andrew Cunningham
lang.support at gmail.com