Private Use Area in Use (from Tag characters and in-line graphics (from Tag characters))

Philippe Verdy verdy_p at wanadoo.fr
Wed Jun 3 08:20:14 CDT 2015


Note that copy-pasting from a PDF into another document is very tricky. The
PDF format requires embedded fonts to use precise glyph naming
conventions to map glyphs back to characters; otherwise the Unicode
character sequences associated with a glyph (or with multiple glyphs if they
are ligatured, in complex layouts, carry uncommon decorations, are rendered
on a non-uniform background, or are filled with a pattern, such as
labels over a photograph or a cartographic map) will not be recognized. This
remark about PDFs also applies to PostScript documents.
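
One way to check whether private-use characters survived such a copy-paste
is to scan the pasted text for code points whose general category is "Co"
(private use). This is only a minimal sketch; the function name
private_use_chars is made up for illustration.

```python
import unicodedata

def private_use_chars(text):
    """Return (index, 'U+XXXX') pairs for every private-use character in text.

    Private-use code points (U+E000..U+F8FF and the two supplementary
    private-use planes) have the Unicode general category "Co".
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]

# Hypothetical pasted string containing one BMP private-use character.
print(private_use_chars("a\ue000b"))
```

If the list comes back empty for text that was typeset with a PUA font, the
characters were lost or substituted somewhere between the PDF and the
clipboard.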

In that case some PDF readers attempt to perform OCR (plus dictionary
lookups to fix misreadings) for common glyph forms, but this will almost
always fail if the glyphs are too specific, such as when they include
swashes, ligatures, unknown scripts, or scripts with complex layouts (such as
the script William invented for noting sentences with specific
"characters" that have new glyphs, a specific syntax, and specific layout
rules). In other cases the PDF reader will just put a bitmap of the selection
on the clipboard, and it is up to other software to interpret the bitmap
with OCR.

The glyph naming conventions are documented in the PDF specifications, but
many PDF creators do not follow these rules, and copying text from these PDFs
fails.
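
As a rough illustration of what these naming conventions look like, here is
a sketch of the "uniXXXX" / "uXXXXXX" rules from the Adobe Glyph List
specification. The SAMPLE_AGL table is a tiny stand-in I made up for the
real glyph list, and the function name is hypothetical; a real PDF reader
does considerably more than this.

```python
import re

# Tiny made-up stand-in for the Adobe Glyph List; a real tool would load the full list.
SAMPLE_AGL = {"A": "A", "space": " ", "f": "f", "i": "i"}

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")   # "uni" + groups of 4 uppercase hex digits
_U = re.compile(r"u([0-9A-F]{4,6})")          # "u" + 4 to 6 uppercase hex digits

def _is_scalar(cp):
    """True for valid Unicode scalar values (surrogates excluded)."""
    return cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

def glyph_name_to_text(name):
    """Map a glyph name to the character string it stands for ("" if unknown)."""
    base = name.split(".", 1)[0]          # drop suffixes such as ".alt"
    out = []
    for part in base.split("_"):          # "f_i" names a ligature of f and i
        m = _UNI.fullmatch(part)
        if m:
            digits = m.group(1)
            cps = [int(digits[k:k + 4], 16) for k in range(0, len(digits), 4)]
            if all(_is_scalar(cp) for cp in cps):
                out.append("".join(chr(cp) for cp in cps))
            continue
        m = _U.fullmatch(part)
        if m:
            cp = int(m.group(1), 16)
            if _is_scalar(cp):
                out.append(chr(cp))
            continue
        out.append(SAMPLE_AGL.get(part, ""))  # unknown names map to nothing
    return "".join(out)

print(repr(glyph_name_to_text("uniE000")))  # a glyph bound to a PUA code point
print(repr(glyph_name_to_text("g123")))     # arbitrary name: cannot be mapped
```

A font whose PUA glyphs are named like "uniE000" can be mapped back to
characters this way; a font with arbitrary names like "g123" cannot, which
is where the copy-paste failure comes from.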



2015-06-03 15:03 GMT+02:00 Philippe Verdy <verdy_p at wanadoo.fr>:

> This possibly fails because William forgot to embed his font in
> the document itself (or because Serif PagePlus fails to do so when it
> creates the PDF document, or refuses to embed glyphs that are bound to
> Unicode PUAs when it creates the embedded font). However, there is no such
> problem when creating PDFs with MS Office, via the Adobe Acrobat "printer"
> driver, or via other printer drivers that generate PDF files (including
> Google Cloud Print).
>
> So this could be a misuse of Serif PagePlus when creating the PDF (I don't
> know this software; maybe an option is set that tells it not to
> embed fonts from a list of fonts that the recipient is supposed to have
> installed locally, saving storage space for the document by avoiding
> such embedding). Another reason may be that the font is marked as "not
> embeddable" within its exposed properties.
>
> Another reason may be that John tries to open the document with software
> that does not handle embedded fonts, or that ignores them and uses only the
> fonts John has selected in his preferences. In that case the result
> depends only on the fonts installed on his local system (which do not
> include the fonts created by William), or his software is set up to use
> exclusively a specific local "Unicode" font for all PUAs.
>
> (Old versions of Internet Explorer behaved in this bad way, due to
> limitations of their text renderers. However, this should not
> happen with PDFs, provided you use a correct plugin version for
> displaying PDFs in the browser. If this fails in the browser, download the
> document and view it with Adobe Reader instead of via the plugin: there
> are many PDF plugins on the market that do not support essential features
> and are built just to display PDFs containing scanned bitmaps, with very
> poor support for text or vector graphics, or that are tuned specifically to
> adapt the document to another device or paper format.)
>
> Without knowing which software is used (and which PDF in the list does
> not load correctly), it is difficult to tell, but I have had no problems
> with the few docs I have seen that William created. So:
>
> NO F = NO FAIL for me.
>
> 2015-06-03 13:38 GMT+02:00 John <idou747 at gmail.com>:
>
>> Yep, I clicked on your document and saw an empty square where your
>> character should be.
>>
>> F = FAIL.
>>
>>>> Chris
>>
>>
>> On Wed, Jun 3, 2015 at 6:30 PM, William_J_G Overington <
>> wjgo_10009 at btinternet.com> wrote:
>>
>>> Private Use Area in Use (from Tag characters and in-line graphics (from
>>> Tag characters))
>>>
>>>
>>> >> That's not agreed upon. I'd say that the general agreement is that
>>> the private ranges are of limited usefulness for some very limited use
>>> cases (such as designing encodings for new scripts).
>>>
>>>
>>> > They are of limited usefulness precisely because it is pathologically
>>> hard to make use of them in their current state of technological evolution.
>>> If they were easy to make use of, people would be using them all the time.
>>> I’d bet good money that if you surveyed a lot of applications where custom
>>> characters are being used, they are not using private use ranges. Now why
>>> would that be?
>>>
>>>
>>> Actually, I have used Private Use Area characters a lot, and, once I had
>>> got used to them, I found them incredibly straightforward to use.
>>>
>>>
>>> I have made fonts that include Private Use Area encodings using the
>>> High-Logic FontCreator program and then used those fonts in Serif PagePlus,
>>> both to produce PDF documents and PNG graphics, as needed for my particular
>>> project at the time.
>>>
>>>
>>> For example,
>>>
>>>
>>> http://forum.high-logic.com/viewtopic.php?f=10&t=2957
>>>
>>>
>>> http://forum.high-logic.com/viewtopic.php?f=10&t=2672
>>>
>>>
>>> William Overington
>>>
>>>
>>>
>>>
>>> 3 June 2015
>>>
>>>
>>
>