Tag characters and in-line graphics (from Tag characters)

Tue Jun 2 20:09:09 CDT 2015

On 2015/06/03 07:55, Chris wrote:

> As you point out, "The UCS will not encode characters without a demonstrated usage.”. But there are use cases for characters that don’t meet UCS’s criteria for a world wide standard, but are necessary for more specific use cases, like specialised regional, business, or domain specific situations.

Unicode contains *a lot* of characters for specialized regional, 
business, or domain specific situations.

> My question is, given that unicode can’t realistically (and doesn’t aim to) encode every possible symbol in the world, why shouldn’t there be an EXTENSIBLE method for encoding, so that people don’t have to totally rearchitect their computing universe because they want ONE non-standard character in their documents?

As has been explained, there are technologies that allow you to do (more
or less) that. Information technology, like many other technologies,
works best when it addresses common cases shared by many people. Let's
look at some examples:

Character encodings work best when they are used widely and uniformly. I 
don't know anybody who actually uses all the characters in Unicode 
(except the guys that work on the standard itself). So for each 
individual, a smaller set would be okay. And there were (and are) 
smaller sets, not for individuals, but for countries, regions, scripts, 
and so on. Originally (when memory was very limited), these legacy 
encodings were more efficient overall, but that's no longer the case. So 
everything is moving towards Unicode.
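
To make the point about smaller sets concrete, here is a minimal sketch
in Python. The sample strings and the choice of legacy encodings are my
own assumptions for illustration, not anything from the thread. It only
checks which sample texts each encoding can represent at all:

```python
def encodable(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample texts, one per script.
samples = {"French": "café", "Greek": "καφές", "Japanese": "コーヒー"}

# Three legacy encodings plus one Unicode encoding form.
encodings = ["latin-1", "iso8859_7", "shift_jis", "utf-8"]

# For each encoding, list the samples it can represent.
coverage = {
    enc: [name for name, text in samples.items() if encodable(text, enc)]
    for enc in encodings
}

for enc, names in coverage.items():
    print(enc, names)
```

Each legacy encoding handles the script it was designed for, but only
the Unicode encoding handles all of the samples, which is why a document
mixing them needs Unicode.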

Most website creators don't use all the features in HTML5. So having
different subsets for different use cases may seem convenient. But
overall, it's much more efficient to have one Hypertext Markup Language,
so that's where everybody is converging.

From your viewpoint, it looks like having something in between
character encodings and HTML is what you want. It would only contain the 
features you need, and nothing more, and would work in all the places 
you wanted it to work. Asmus's "inline" text may be something similar.

The problem is that such an intermediate technology only makes sense if 
it covers the needs of lots and lots of people. It would add a third 
technology level (between plain text and marked-up text), which would 
divert energy from the current two levels and make things more complicated.

Up to now, such a third level hasn't emerged, among other reasons because
both existing technologies were good at absorbing the most important use
cases from the middle. Unicode continues to encode whatever symbols
gain reasonable popularity, so every time somebody has a "real good use
case" for the middle layer with a symbol that isn't yet in Unicode, that
use case gets taken away. HTML (or Web technology in general) has also
worked to improve the situation, with technologies such as SVG and Web
Fonts.

No technology is perfect, and so there are still some gaps between
character encoding and markup, some of which may eventually be filled,
but I don't think a third layer in the middle will emerge soon.

Regards,   Martin.