Dealing with Unencodeable Characters

Thu Oct 6 13:30:52 CDT 2016

On 10/6/2016 7:54 AM, Charlotte Buff wrote:
> If theoretically I wanted to convert an old Shift JIS document 
> containing emoji to Unicode, how should I ideally handle Shibuya 109?

And the general answer to that is to convert to U+FFFD (REPLACEMENT 
CHARACTER), unless you are doing something specific and know what you 
are doing, in which case you can use a Private Use Area (PUA) code 
point, insert an image, or do whatever else you need to do.
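
As a rough illustration of that default, here is a minimal Python 
sketch. It is not a real carrier-emoji converter: the function name 
decode_legacy_sjis, the private_map parameter, and the byte pair 
F8 9F used below are all made up for the example, and the lead-byte 
ranges are a simplifying assumption about Shift JIS-style data. Any 
unit the standard shift_jis codec cannot decode becomes U+FFFD unless 
the caller supplies a private mapping (for instance to a PUA code 
point).

```python
from typing import Mapping, Optional

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_legacy_sjis(
    data: bytes,
    private_map: Optional[Mapping[bytes, str]] = None,
) -> str:
    """Decode Shift JIS-like bytes, replacing anything unencodable.

    private_map is a hypothetical, caller-supplied table from raw byte
    sequences to replacement text (e.g. a PUA code point). Without it,
    every undecodable unit turns into U+FFFD.
    """
    private_map = private_map or {}
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # Assumption: bytes 0x81-0x9F and 0xE0-0xFC start a two-byte unit,
        # which also covers the vendor extension rows.
        is_lead = 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC
        chunk = data[i:i + 2] if is_lead else data[i:i + 1]
        i += len(chunk)

        if chunk in private_map:
            # The "you know what you are doing" path.
            out.append(private_map[chunk])
            continue
        try:
            out.append(chunk.decode("shift_jis"))
        except UnicodeDecodeError:
            # The general answer: no Unicode equivalent, so use U+FFFD.
            out.append(REPLACEMENT)
    return "".join(out)


# F8 9F is only a placeholder for "some vendor-specific code".
default_text = decode_legacy_sjis(b"A\xf8\x9fB")
pua_text = decode_legacy_sjis(b"A\xf8\x9fB", {b"\xf8\x9f": "\ue000"})
```

With no mapping the placeholder pair becomes a single U+FFFD; with the 
mapping it becomes U+E000, which only means something to software that 
shares the same private agreement.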

This is not a character *standardization* issue that requires the UTC to 
come up with a generic interchange solution for every pre-Unicode 
character encoding of everything that ever was, whether it be some 
oddball Shift JIS extensions that were omitted in the consensus on 
encoding the Japanese Carrier Emoji:

http://www.unicode.org/reports/tr51/tr51-7.html#Japanese_Carrier

or other odds and ends from bizarre, dead-end, disused character 
encodings from a previous generation.

By the way, the biggest ongoing problem we deal with here is the 
continuing urge to proliferate font-encoded hacks for particular 
languages and writing systems. The text interchange problems that such 
schemes keep causing far outweigh issues like what to do with a 
Shibuya 109 emoji, imo.

--Ken