Tag characters and in-line graphics (from Tag characters)

Chris idou747 at gmail.com
Tue Jun 2 17:55:27 CDT 2015

I was asking why the glyphs for the right arrow ➡ are inconsistent across many sources, through a couple of iterations of Unicode. Perhaps I might observe that one of the reasons is that there is no technical link between the code and the glyph. I can’t realistically write a display engine that goes to unicode.org <http://unicode.org/> or wherever and dynamically finds the right standard glyph for unknown codes. This also shows up as the empty squares □ I see for characters my platform doesn’t know about. This isn’t the case with XML, where I can send someone a random XML document and there is a standard way to go out on the internet and check whether that XML is conformant. Why shouldn’t there be a standard way to go out on the net and find the canonical glyph for a code? If there were, then non-standard glyphs would fall out of that technology naturally.
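
To make the idea concrete, here is a minimal sketch of the lookup order such a display engine might follow: local font first, then some network registry, then the empty square. No such registry exists today; `resolve_glyph`, `fetch_remote` and the glyph strings are hypothetical names invented for this illustration, and the "remote" side is just an in-memory dictionary standing in for a network service.

```python
from typing import Callable, Dict, Optional

PLACEHOLDER = "\u25A1"  # the empty square shown for unknown characters


def resolve_glyph(
    code_point: int,
    local_font: Dict[int, str],
    fetch_remote: Callable[[int], Optional[str]],
) -> str:
    """Return a glyph for code_point, trying local data before the network.

    Glyphs are plain strings here (hypothetical stand-ins for outline data).
    """
    glyph = local_font.get(code_point)
    if glyph is not None:
        return glyph

    # Hypothetical step: ask a canonical registry for the reference glyph.
    glyph = fetch_remote(code_point)
    if glyph is not None:
        local_font[code_point] = glyph  # cache so later lookups stay local
        return glyph

    # Nothing known anywhere: fall back to the placeholder square.
    return PLACEHOLDER


# Stand-in data: one locally known glyph, one known only to the "registry".
local = {0x41: "glyph-A"}
registry = {0x27A1: "glyph-right-arrow"}

print(resolve_glyph(0x27A1, local, registry.get))
print(resolve_glyph(0x1F999, local, registry.get))
```

The point of the sketch is only the fallback order; how glyphs would actually be named, fetched and trusted is exactly the part that has no standard.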

So people are talking about all these technologies that are out there (HTML5, cmaps, fonts and so forth), but there is no standard way to construct a list of “characters”, some of which might be non-standard, embed it ANYWHERE one might reasonably expect characters, have it processed in the normal way as characters, and have it sent anywhere and understood.

As you point out, “The UCS will not encode characters without a demonstrated usage.” But there are use cases for characters that don’t meet the UCS’s criteria for a worldwide standard but are necessary in more specific settings, such as specialised regional, business, or domain-specific situations.

My question is: given that Unicode can’t realistically (and doesn’t aim to) encode every possible symbol in the world, why shouldn’t there be an EXTENSIBLE method for encoding, so that people don’t have to totally rearchitect their computing universe because they want ONE non-standard character in their documents?

Right now, what happens if you have a domain or locale requirement for a special character? Most likely you go without it, because even though you could get it to render in some situations (like hand-coding some IMGs into your web site), you just know you won’t realistically be able to input it into emails, Word documents, spreadsheets, and whatever other random applications you use on a daily basis.

What I’m asking is: is it really beyond the Unicode Consortium’s scope, and/or would it really be a redundant technology, to define, for example, a UTF-64 coding format in which 32 bits allow 4 billion businesses and individuals to define their own character sets (each of up to 4 billion characters), and then have standard places on the internet (similar to DNS lookup servers) that can provide anyone with glyphs and fonts for them?
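
As a rough sketch of what such a 64-bit unit could look like, assuming the upper 32 bits name the owner of a character set and the lower 32 bits are the code within it: "UTF-64" is not an existing format, and the function names, the big-endian byte order, and the convention that namespace 0 means ordinary Unicode are all assumptions made up for this example.

```python
import struct
from typing import List, Tuple

MASK32 = 0xFFFFFFFF
UNICODE_NAMESPACE = 0  # assumption: namespace 0 holds ordinary Unicode code points


def pack_unit(namespace: int, code: int) -> int:
    """Combine a 32-bit namespace and a 32-bit code into one 64-bit unit."""
    if not 0 <= namespace <= MASK32:
        raise ValueError("namespace must fit in 32 bits")
    if not 0 <= code <= MASK32:
        raise ValueError("code must fit in 32 bits")
    return (namespace << 32) | code


def unpack_unit(unit: int) -> Tuple[int, int]:
    """Split a 64-bit unit back into (namespace, code)."""
    return unit >> 32, unit & MASK32


def encode(units: List[Tuple[int, int]]) -> bytes:
    """Serialise (namespace, code) pairs as big-endian 64-bit integers."""
    return b"".join(struct.pack(">Q", pack_unit(ns, c)) for ns, c in units)


def decode(data: bytes) -> List[Tuple[int, int]]:
    """Parse bytes produced by encode() back into (namespace, code) pairs."""
    if len(data) % 8 != 0:
        raise ValueError("data length must be a multiple of 8 bytes")
    return [unpack_unit(u) for (u,) in struct.iter_unpack(">Q", data)]


# One standard character (the right arrow) and one private character
# from a hypothetical namespace 12345.
text = [(UNICODE_NAMESPACE, 0x27A1), (12345, 1)]
raw = encode(text)
print(len(raw), decode(raw) == text)
```

A lookup service of the DNS-like kind described above would then be keyed on the namespace half of each unit; that part is not sketched here.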

Right now, yes, there are cmaps, but there is no standard way to combine characters from different encodings, and no standard way to find the cmap for an unknown encoding. There is HTML5, but that doesn’t produce something that is recognisable as a list of characters and can be processed as such. (If there is an IMG in text, is it a “character” or an illustration in the text? How can you refer to a particular set of characters without having your own web server? How do you render that text bigger, with the standard reference glyph, without manually searching the internet to find it? There is a host of problems here.)

All these problems look unsolved to me, and they also look like encoding technology problems to me. What other consortium out there is working on character encoding problems?

> On 2 Jun 2015, at 7:40 pm, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> Once again, no! Unicode is a standard for encoding characters, not for encoding some syntactic element of a glyph definition!
> Your project is out of scope. You still want to reinvent the wheel.
> For creating syntax, define it within a language, which does not need new characters (you're not creating an APL grammar using specific symbols for some operators, more or less based on Greek letters and geometric shapes; those are just like mathematical symbols). Programming languages and data languages (JavaScript, XML, JSON, HTML...) and their syntax are themselves encoded in plain-text documents using standard characters and don't need new characters. APL is an exception only because computers or keyboards were produced to facilitate the input (those who don't have such keyboards use specific editors or the APL runtime environment, which offers an input method for entering programs in this APL input mode).
> And again you want the chicken before the egg: have you ever read the encoding policy? The UCS will not encode characters without a demonstrated usage. Nothing in what you propose is really used; it is proposed only by you and used only by you for your private use (or with a few of your unknown friends, but this is invisible and unverifiable). Nothing has been published.
> Currency symbols are an exception to the demonstrated-use requirement, only because once they are created they are very rapidly needed by a lot of people (in fact most people in a region as large as a country, and in many other countries that will reference or use them). But even in this case, what is encoded is the character itself, not the glyph or new characters used to define the glyph!
> Can you stop proposing off-topic subjects like this on this list? You are not speaking about Unicode or characters. Another list would be more appropriate. You help no one here, because all you want is to radically change the goals of TUS.
> 2015-06-02 11:01 GMT+02:00 William_J_G Overington <wjgo_10009 at btinternet.com>:
> Perhaps the solution to at least some of the various issues that have been discussed in this thread is to define a tag letter z as a code within the local glyph memory requests, as follows.
