Code charts and code points

Jukka K. Korpela jkorpela at cs.tut.fi
Fri Oct 24 06:51:10 CDT 2014


2014-10-24 11:17, "Martin J. Dürst" wrote:

> The code charts are published as PDFs. In general, text in PDFs can be
> copypasted elsewhere. Is there something in place that makes sure that
> "wrong" Unicode encodings for glyphs published in code charts don't leak
> elsewhere?

It seems that there isn’t. Whether this is serious is a different issue.

I tested with the arbitrarily chosen Ornamental Dingbats block, with the 
chart
http://www.unicode.org/charts/PDF/Unicode-7.0/U70-1F780.pdf
Opening it in Adobe Reader XI on Win 7, I was able to select the 
characters with the mouse and copy and paste them to a text editor, 
BabelPad. It shows most of them as just boxes, identified with the 
correct Unicode numbers; this is the expected behavior when the editor 
has no suitable font at its disposal. But instead of U+1F67C VERY HEAVY 
SOLIDUS and U+1F67D VERY HEAVY REVERSE SOLIDUS, it shows “/” and “\”, 
identified as U+002F SOLIDUS and U+005C REVERSE SOLIDUS.

So apparently the font designer had placed these glyphs at the code 
points of SOLIDUS and REVERSE SOLIDUS, which is understandable. But this 
means that when the characters in the code charts are copied and pasted, 
or otherwise accessed at the character level, they come out as the wrong 
characters.

I think it is imaginable that someone wants to copy a block of 
characters from the code charts, as a handy way of getting them for 
inspection, e.g. for testing how some particular software renders them 
using some particular font(s). I would expect some confusion then if 
some of what you got were entirely wrong characters (code points).
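
For anyone who wants to check a pasted chart block without relying on a 
particular editor, here is a minimal sketch in Python. It lists the code 
point and name of each pasted character and flags those outside the 
expected block. The function names (describe, leaked) and the sample 
string are made up for illustration; the range is that of the Ornamental 
Dingbats block, and the names come from whatever Unicode version the 
Python installation ships with.

```python
import unicodedata

# Ornamental Dingbats block, U+1F650..U+1F67F (adjust for other charts).
BLOCK_START, BLOCK_END = 0x1F650, 0x1F67F


def describe(text):
    """Return (code point label, name, in_block) per non-whitespace character."""
    rows = []
    for ch in text:
        if ch.isspace():
            continue
        cp = ord(ch)
        # Fall back to a placeholder if this Python's Unicode database
        # is older than the character being inspected.
        name = unicodedata.name(ch, "<no name in this Unicode database>")
        rows.append((f"U+{cp:04X}", name, BLOCK_START <= cp <= BLOCK_END))
    return rows


def leaked(text):
    """Return only the rows for characters outside the expected block."""
    return [row for row in describe(text) if not row[2]]


if __name__ == "__main__":
    # Hypothetical paste result: two genuine block characters with the
    # ASCII solidus and reverse solidus in between.
    pasted = "\U0001F67A / \\ \U0001F67E"
    for label, name, in_block in describe(pasted):
        print(label, name, "" if in_block else "<-- outside the block")
```

This only catches substitutions that land outside the block; a glyph 
mapped to a wrong code point inside the same block would need a 
comparison against the chart itself.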

Yucca




