Tag characters and in-line graphics (from Tag characters)

Chris idou747 at gmail.com
Thu Jun 4 02:57:33 CDT 2015

> On 4 Jun 2015, at 10:59 am, David Starner <prosfilaes at gmail.com> wrote:
> On Wed, Jun 3, 2015 at 5:46 PM Chris <idou747 at gmail.com> wrote:
>> I personally think emoji should have one, single definitive representation for this exact reason.
> Then you want an image. I don't see what's hard about that.

I already explained why an image and/or HTML5 is not a character. I’ll repeat it. And the world of characters is not limited to emoji.

1. HTML5 doesn’t separate one particular representation (font, size, etc) from the actual meaning of the character. So you can’t paste it somewhere and expect to increase its point size or change its font.
2. It’s highly inefficient in space to drop multi-kilobyte strings into a document to represent one character.
3. The entire design of HTML has nothing to do with characters. So there is no way to process a string of characters interspersed with HTML elements and know which of those elements are a “character”. This makes programmatic manipulation impossible, and means most computer applications simply will not allow HTML in scenarios where they expect a list of “characters”.
4. There is no way to compare 2 HTML elements and know they are talking about the same character. I could put some HTML representation of a character in my document, you could put a different one in, and there would be absolutely no way to know that they are the same character. Even if we are in the same community and agree on the existence of this character.
5. Similarly, there is no way to search or index HTML elements. If an HTML document contained an image of a particular custom character, there would be no way to ask Google or whatever to find all the documents with that character. Different documents would represent it differently. HTML is a rendering technology. It makes things LOOK a particular way, without actually ENCODING anything about it. The only part of HTML that is actually searchable in a deterministic fashion is the part that is encoded - the Unicode part.
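
Points 4 and 5 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample documents, the image file names, and the helper `contains_char` are hypothetical, and U+2744 SNOWFLAKE simply stands in for "some agreed character". It shows that an encoded character can be found by comparing code points, while two different image-based renderings of the same idea share nothing a program can match on.

```python
import unicodedata

# Two documents that use the encoded character U+2744 SNOWFLAKE.
doc_a = "Cold today \u2744"
doc_b = "\u2744 in the forecast"

# Two documents that embed a picture of a snowflake instead.
# The file names and markup are invented for this example.
html_a = 'Cold today <img src="snow-16px.png" alt="">'
html_b = '<img src="flake_large.gif" width="64"> in the forecast'

SNOWFLAKE = "\u2744"


def contains_char(text: str, ch: str) -> bool:
    """Deterministic search: compare code points after NFC normalization."""
    return unicodedata.normalize("NFC", ch) in unicodedata.normalize("NFC", text)


# The encoded character is found in both documents that use it.
encoded_hits = [contains_char(d, SNOWFLAKE) for d in (doc_a, doc_b)]

# The image-based documents give a character search nothing to match,
# and the two pieces of markup do not even match each other.
image_hits = [contains_char(d, SNOWFLAKE) for d in (html_a, html_b)]

print(unicodedata.name(SNOWFLAKE), encoded_hits, image_hits)
```

A search engine could of course try image recognition on the two pictures, but that would be a guess about appearance, not a comparison of encoded identity.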

>> The community interested in Tony the Tiger can make decisions like that.
> That is a hell of a handwave. In practice, you've got a complex decision that's always going to be a bit controversial, and one that most communities won't bother trying to make.

Apparently the world makes decisions all the time without meeting in committee. Strange but true. It’s called making a decision. Facebook have created a lot of emoji characters without consulting any committee and it seems to work fine, albeit restricted to the Facebook universe because of a lack of a standard.

>> You can’t know because they’re images.
> You can't know because the only obvious equivalence relation is exact image identity.

Because… there is no standard!! If Facebook wants to define 2 emoji images, maybe one bigger than the other and yet basically the same, to mean the same thing, then that would be their choice. Since I expect they have a lot of smart people working there, I expect it would work rather well. Just like Microsoft issues Courier fonts in different point sizes and we all feel they have made that work fairly well.

You seem to be arguing the nonsense position that if someone, for example, made a snowflake glyph slightly different to the official Unicode one, it would be wrong. That of course is nonsense. People can make sensible decisions about this without the Unicode committee.

>> You can’t iterate over compressed bits. You can’t process them.
> Why not? In any language I know of that has iterators, there would be no problem writing one that iterates over compressed input. If you need to mutate them, that is hard in compressed formats, but a new CPU can store War and Peace in the on-CPU cache.

You can’t do it because no standard library, programming language, or operating system is set up to iterate over characters of compressed data. So if you want to shift compressed bits around in your app, it will take an awful lot of work, and the bits won’t be recognised by anyone else.
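
To make the disagreement concrete, here is a rough Python sketch of what iterating over the characters of compressed text involves. It assumes zlib-compressed UTF-8 as a stand-in for "compressed bits", and the function name `iter_chars` and the sample string are hypothetical. An iterator can be written, but it has to be layered by hand on top of a decompressor and a decoder, and nothing that expects an ordinary string will accept the compressed bytes directly.

```python
import codecs
import zlib
from typing import Iterable, Iterator


def iter_chars(compressed_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield code points one at a time from zlib-compressed UTF-8 chunks."""
    inflater = zlib.decompressobj()
    # The incremental decoder copes with multi-byte sequences that are
    # split across chunk boundaries.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in compressed_chunks:
        for ch in decoder.decode(inflater.decompress(chunk)):
            yield ch
    # Flush whatever the two layers are still buffering.
    for ch in decoder.decode(inflater.flush(), final=True):
        yield ch


text = "snow \u2744 and tiger \U0001F42F"
blob = zlib.compress(text.encode("utf-8"))

# Feed the compressed data in small pieces, as if read from a stream.
chunks = [blob[i:i + 5] for i in range(0, len(blob), 5)]
recovered = "".join(iter_chars(chunks))
print(recovered == text)
```

Both sides of the exchange show up here: the iterator is short, yet the sender and every receiver must separately agree on the compression format and the encoding underneath it before the result means anything.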

Now if someone wants to define the next version of Unicode to be a compressed format, and every platform supports that with standard libraries, computer languages etc., then fine, that could work.

Yet again I point out, lots of things MIGHT be possible in the real world IF that is how a standard is formulated. But all the chatter about this or that technology is pie in the sky without that standard.
