Dedotted I and dotlessi
jameskasskrv at gmail.com
Mon Aug 17 17:31:53 CDT 2020
On 2020-08-17 6:53 PM, Khaled Hosny via Unicode wrote:
> In short, text extraction from PDF is a mess.
Search engines such as Google index the text of PDFs and offer PDF links
in their search results. I wonder how Google handles PDFs in Arabic (and
other complex scripts). Have they worked out some kind of method, or are
such PDFs considered non-indexable? Maybe they run OCR on the rendered display?