Code charts and code points
Jukka K. Korpela
jkorpela at cs.tut.fi
Fri Oct 24 06:51:10 CDT 2014
2014-10-24 11:17, "Martin J. Dürst" wrote:
> The code charts are published as PDFs. In general, text in PDFs can be
> copypasted elsewhere. Is there something in place that makes sure that
> "wrong" Unicode encodings for glyphs published in code charts don't leak
It seems that there isn’t. Whether this is serious is a different issue.
I tested with the arbitrarily chosen Ornamental Dingbats block, with the
Opening it in Adobe Reader XI on Win 7, I was able to select the
characters with the mouse and copy and paste them to a text editor,
BabelPad. It shows most of them as just boxes, identified with the
correct Unicode numbers; this is the expected behavior when the editor
has no suitable font at its disposal. But instead of U+1F67C VERY HEAVY
SOLIDUS and U+1F67D VERY HEAVY REVERSE SOLIDUS, it shows “/” and “\”,
identified as U+002F SOLIDUS and U+005C REVERSE SOLIDUS.
So apparently the font designer had mapped these glyphs to the code
points of SOLIDUS and REVERSE SOLIDUS, which is understandable. But this
means that when the characters in the code charts are copied and pasted,
or otherwise accessed at the character level, they come out as the wrong
characters.
I think it is quite conceivable that someone would want to copy a block
of characters from the code charts, as a handy way of getting them for
inspection, e.g. for testing how some particular software renders them
using some particular font(s). I would expect some confusion if some of
the characters obtained this way were simply wrong (wrong code points).
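Anyone copying characters out of a chart this way could at least check the pasted text for stray code points. Below is a minimal sketch, assuming Python 3 and the Ornamental Dingbats block range U+1F650..U+1F67F. The function name find_stray_code_points is hypothetical, chosen only for this illustration. It reports every non-whitespace character outside the expected block, which would catch the SOLIDUS and REVERSE SOLIDUS substitutions described above.

```python
import unicodedata

# Range of the Ornamental Dingbats block (U+1F650..U+1F67F).
BLOCK_START = 0x1F650
BLOCK_END = 0x1F67F

def find_stray_code_points(text, start=BLOCK_START, end=BLOCK_END):
    """Return (index, "U+XXXX", name) for each non-whitespace character
    in text whose code point lies outside start..end (inclusive)."""
    strays = []
    for i, ch in enumerate(text):
        if ch.isspace():
            continue  # spaces and line breaks are expected in pasted text
        cp = ord(ch)
        if not start <= cp <= end:
            strays.append((i, f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return strays

# Hypothetical paste: two characters from the block, followed by the two
# ASCII characters that came out instead of U+1F67C and U+1F67D.
pasted = "\U0001F650\U0001F651/\\"
for index, label, name in find_stray_code_points(pasted):
    print(index, label, name)
```

For this sample input the sketch flags index 2 (U+002F SOLIDUS) and index 3 (U+005C REVERSE SOLIDUS); it only checks block membership, so a wrong character that happens to fall inside the same block would go unnoticed.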