Dealing with Unencodeable Characters
kenwhistler at att.net
Thu Oct 6 13:30:52 CDT 2016
On 10/6/2016 7:54 AM, Charlotte Buff wrote:
> If theoretically I wanted to convert an old Shift JIS document
> containing emoji to Unicode, how should I ideally handle Shibuya 109?
And the general answer to that is: convert to U+FFFD, unless you are
doing something specific and know what you are doing, in which case
you can use the PUA, insert an image, or do whatever else you need to do.
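
For illustration only, here is a minimal Python sketch of those two routes.
It assumes Python's built-in "shift_jis" codec rejects the carrier
extension bytes; the bytes 0xF9 0x7E are a placeholder, not the actual
Shibuya 109 code, and the "sjis_to_pua" handler with its U+E000 + byte
mapping is a hypothetical private scheme, not any agreed convention.

```python
import codecs

# Hypothetical sample: HIRAGANA A, then two placeholder bytes standing in
# for an unmapped carrier extension, then HIRAGANA I.
SAMPLE = b"\x82\xa0" + b"\xf9\x7e" + b"\x82\xa2"


def to_unicode_generic(data: bytes) -> str:
    """Generic route: every undecodable sequence becomes U+FFFD."""
    return data.decode("shift_jis", errors="replace")


def _pua_handler(exc):
    """Hypothetical private route: map each rejected byte to U+E000 + byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume decoding.
    return "".join(chr(0xE000 + b) for b in bad), exc.end


# Register the handler under an illustrative name (an assumption of this sketch).
codecs.register_error("sjis_to_pua", _pua_handler)


def to_unicode_private(data: bytes) -> str:
    """Private-use route: only meaningful to parties who share the mapping."""
    return data.decode("shift_jis", errors="sjis_to_pua")
```

Text produced by the second function is not interchangeable: a recipient
who does not know the private mapping sees only meaningless PUA code points,
which is why U+FFFD is the safer default.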
This is not a character *standardization* issue that requires the UTC to
come up with a generic interchange solution for every pre-Unicode
character encoding of everything that ever was, whether it be some
oddball Shift JIS extensions that were omitted in the consensus on
encoding the Japanese Carrier Emoji:
or other odds and ends from bizarre, dead-end, disused character
encodings from a previous generation.
By the way, the biggest ongoing problem we deal with here is the
continuing urge to proliferate font-encoded hacks for particular
languages and writing systems. The text interchange problems that such
schemes pose on an ongoing basis far, far outweigh issues like what to do
with a Shibuya 109 emoji, imo.