Current support for N'Ko

Andrew Cunningham lang.support at gmail.com
Mon Sep 29 15:13:54 CDT 2014


On 30/09/2014 4:11 AM, "David Starner" <prosfilaes at gmail.com> wrote:
>
> On Fri, Sep 26, 2014 at 4:10 PM, Andrew Cunningham
> <lang.support at gmail.com> wrote:
> > * NEVER try to copy and paste text from PDF. It is a preprint format and
> > should be treated as such.
>
>
> I'd try and cut and paste from print if I could. People are going to
> cut and paste from anything if it saves them a little time. If you
> disable cut and pasting from PDF, those who have easy access to OCR
> may just print to image and OCR it to cut and paste. To say don't do
> this is unproductive.
>

OK, what I should say is that in the best-case scenario for complex-script
text, you can copy and paste and then post-process the extracted text to
recover the actual text. Post-processing may involve reordering characters
or systematically converting glyph sequences.
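To make "post-processing" a little more concrete, here is a minimal Python sketch under stated assumptions; it is not a general-purpose tool. Devanagari is used for the reordering step because its pre-base vowel sign I (U+093F) is a well-known case of text coming out of a PDF in visual rather than logical order. The sketch ASSUMES the whole extracted string is in visual order (every U+093F placed before its consonant cluster), so it must not be run on correctly ordered text. The conversion table, the function names `reorder_pre_base_i`, `apply_conversions` and `postprocess`, and the table `CONVERSIONS` are hypothetical illustrations; a real table would have to be built per font and per PDF generator.

```python
import re
import unicodedata

# Devanagari consonants (U+0915..U+0939) and precomposed nukta consonants (U+0958..U+095F).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]"
NUKTA = "\u093C"
VIRAMA = "\u094D"
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I, displayed to the left of its cluster

# A consonant cluster: zero or more "consonant (+nukta) + virama" links, then a final consonant.
CLUSTER = f"(?:{CONSONANT}{NUKTA}?{VIRAMA})*{CONSONANT}{NUKTA}?"

# ASSUMPTION: input is entirely in visual order, so every U+093F sits *before*
# the cluster it belongs to. Running this on logically ordered text would break it.
PRE_BASE_I = re.compile(VOWEL_SIGN_I + "(" + CLUSTER + ")")


def reorder_pre_base_i(text: str) -> str:
    """Move each visually ordered vowel sign I to after its consonant cluster."""
    return PRE_BASE_I.sub(lambda m: m.group(1) + VOWEL_SIGN_I, text)


# HYPOTHETICAL per-font table: glyph sequence as extracted -> intended Unicode text.
# The single entry is a real compatibility mapping (Arabic lam-alef ligature);
# a real table would come from inspecting one font + PDF generator combination.
CONVERSIONS = {
    "\uFEFB": "\u0644\u0627",  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM -> LAM + ALEF
}


def apply_conversions(text: str, table: dict[str, str]) -> str:
    """Replace extracted glyph sequences in a single pass, longest match first."""
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


def postprocess(extracted: str) -> str:
    """Hypothetical pipeline: glyph conversions, then reordering, then NFC."""
    text = apply_conversions(extracted, CONVERSIONS)
    text = reorder_pre_base_i(text)
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    raw = "\u093F\u0915\u094D\u0924"  # visual order: vowel sign I before the cluster
    fixed = postprocess(raw)
    print([f"U+{ord(c):04X}" for c in fixed])  # ['U+0915', 'U+094D', 'U+0924', 'U+093F']
```

A single regex pass is used for the conversion table so that the output of one replacement is never re-matched by another entry. Even so, this only covers the "best case": when the PDF generator has lost or scrambled the underlying character information, no amount of post-processing will recover it.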

In the worst-case scenario, what you extract from the PDF file is utter
garbage from which the text cannot be reconstructed.

Searching and indexing are even more problematic.

Honestly, for the languages I work with, it would in many cases be quicker
and more accurate to use OCR (even at 80% accuracy) than to cut and paste
from PDF.

As I said in my previous email, results and effectiveness will differ
depending on the fonts and the PDF generator used.

PDF was designed for preprint, not archival purposes.
