Dedotted I and dotlessi
Richard Wordingham
richard.wordingham at ntlworld.com
Mon Aug 17 17:15:50 CDT 2020
On Mon, 17 Aug 2020 20:53:01 +0200
Khaled Hosny via Unicode <unicode at unicode.org> wrote:
> Easier said than done. Even for tools that want to do this, the only
> reliable way is tagging with /ActualText, but this has to be done per
> grapheme cluster as PDF viewers can’t select or highlight parts of
> text tagged with /ActualText, so Arabic is excluded since PDF stores
> glyphs in visual order and you don’t want to tag full paragraphs.
That's a nasty bug. Has it been established that negative
(advance) widths are "inconsistent" with TrueType and CFF fonts? I would
have said that a PDF width of -573 was entirely consistent with a
TrueType width of 573.
> In
> case of reordering, you will also need to tag the whole reordered
> sequence as one unit since you can’t tell which glyph belongs to
> which character any more. People will also complain about increased
> file size, so you will have to do tagging selectively for cases that
> can’t be handled in a different way.
I don't know if it's due to another feature (or even merely a bug), but
I did notice that LibreOffice-exported PDFs swell enormously if one uses
PDF/A to make Indic text extractable. This was with a series of
documents that were at least 90% English (in the Latin script). Zipping
was ineffective.
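
To give a feel for the per-cluster cost of the /ActualText tagging
discussed above, here is a minimal sketch. It assumes the usual
marked-content form (/Span ... BDC ... EMC) with the replacement text
written as a UTF-16BE hex string with a byte order mark. The helper
name and the glyph IDs in the example are hypothetical, and this is not
a claim about what LibreOffice or any other exporter actually emits.

```python
def actual_text_span(text: str, show_ops: bytes) -> bytes:
    """Wrap glyph-showing operators in a marked-content span with /ActualText.

    text     -- the Unicode text the glyphs stand for (logical order)
    show_ops -- the content-stream operators that paint the glyphs
    """
    # Encode as UTF-16BE with a leading byte order mark (FE FF), written
    # as a PDF hex string.
    hex_text = ("\ufeff" + text).encode("utf-16-be").hex().upper()
    header = b"/Span <</ActualText <" + hex_text.encode("ascii") + b">>> BDC\n"
    return header + show_ops + b"\nEMC"


# Hypothetical example: DEVANAGARI LETTER KA + VOWEL SIGN I, where the
# vowel sign's glyph is painted first. The glyph IDs are made up.
plain = b"<004C00A1> Tj"
tagged = actual_text_span("\u0915\u093f", plain)

# Bytes added by tagging this one cluster.
overhead = len(tagged) - len(plain)
print(tagged.decode("ascii"))
print("extra bytes for one cluster:", overhead)
```

The wrapper is a fixed cost per tagged unit plus four hex digits per
UTF-16 code unit, so tagging every cluster multiplies the size of the
text operators, which fits the complaint about file size.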
Richard.