Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

Sławomir Osipiuk via Unicode unicode at
Wed Feb 12 11:44:56 CST 2020

On Wed, Feb 12, 2020 at 11:28 AM wjgo_10009 at via Unicode <unicode at> wrote:
> I am reminded of the teletext system (with brand names such as Ceefax and Oracle) in the United Kingdom, which was a broadcasting technology introduced in the 1970s and which became very much a part of British culture during the 1980s and 1990s. A digital signal of a special purpose 7-bit character set was broadcast in the vertical blanking interval of a 625-line analogue television signal.
> It seems to me that there could be, in the future, a type of thing that sends out a continuous signal over a wire of, say, a temperature reading at its location, all formatted in several languages. So, no passwords, no input from an end user, just a continuous feeding into The Internet of Things its output, with the numerical value in the messages changed as the temperature changes. This would allow the digits to be expressed in the digits used in the particular script of the particular language used in an individual message.

Teletext had a data rate of 7 kilobits/s (less than 1 kilobyte/s), was cleverly grafted onto a system never designed for it, and the terminals to display it couldn't handle modern markup. Language tags, or something very like them, would make sense for very low-rate transmissions like Teletext (or the similar Line 21 closed captions in NTSC). It's too late for them, though.

The proposal is for the "Internet of Things". In 2020, 1 kbps transmissions are laughably slow, unless you're talking to the Voyager space probes. Receiving equipment, even at the lowest end, has more than enough processing power to interpret a proper markup language. If for some reason you really do want to minimize data rate, you're better off with data compression rather than saving bytes by using Unicode language tags instead of XML. The receiving equipment can handle a decompression step at basically no cost (that wasn't true in the 1970s), and markup languages compress very well.
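To make the compression point concrete, here is a rough Python sketch. The readings, the <feed>/<reading> element names, and the repeat count are all invented for illustration; only zlib and xml:lang are real. The feed repeats the same readings verbatim, which flatters the ratio, so treat the numbers as an indication rather than a benchmark.

```python
import zlib

# Hypothetical readings: (language code, text) pairs invented for illustration.
READINGS = [
    ("en", "Temperature: 21 degrees"),
    ("fr", "Température : 21 degrés"),
    ("de", "Temperatur: 21 Grad"),
]


def build_xml(readings, repeats=50):
    """Build a stream of XML-tagged readings, repeated to mimic a continuous feed."""
    items = "".join(
        f'<reading xml:lang="{lang}">{text}</reading>' for lang, text in readings
    )
    return ("<feed>" + items * repeats + "</feed>").encode("utf-8")


raw = build_xml(READINGS)
packed = zlib.compress(raw, 9)  # level 9 = best compression

print(len(raw), "bytes of markup")
print(len(packed), "bytes after zlib compression")

# The receiver recovers the markup exactly with a single call.
assert zlib.decompress(packed) == raw
```

A real feed would have changing values, so it would compress less well than this, but the repeated element and attribute names are exactly the kind of redundancy a general-purpose compressor removes.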

The particular circumstances that would encourage Unicode tag characters don't exist today: a razor-thin data rate and minuscule receiver processing power. With the resources we have now, anything done by tag characters can be done BETTER with proper encapsulating protocols and markup.
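For anyone who hasn't looked at how the tag characters work: a language tag is U+E0001 followed by tag characters from U+E0020..U+E007E, which mirror printable ASCII at an offset of 0xE0000. A minimal Python sketch follows; the function names are my own, not from any library. It also compares the UTF-8 size against a bare xml:lang attribute (without the element that would carry it).

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (deprecated)
TAG_OFFSET = 0xE0000         # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def encode_language_tag(code: str) -> str:
    """Spell a language code such as 'fr' or 'en-US' in tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(ord(c) + TAG_OFFSET) for c in code)


def decode_language_tag(tagged: str) -> str:
    """Recover the ASCII language code from a tag-character sequence."""
    if not tagged.startswith(LANGUAGE_TAG):
        raise ValueError("missing U+E0001 LANGUAGE TAG")
    body = tagged[1:]
    if not all(0xE0020 <= ord(c) <= 0xE007E for c in body):
        raise ValueError("unexpected character after LANGUAGE TAG")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in body)


tag = encode_language_tag("fr")
markup = 'xml:lang="fr"'

# Every tag character lies outside the BMP, so each takes 4 bytes in UTF-8.
print(len(tag.encode("utf-8")), "bytes as tag characters")
print(len(markup.encode("utf-8")), "bytes as an xml:lang attribute")
print(decode_language_tag(tag))
```

Because each tag character costs four bytes in UTF-8, the saving over the attribute alone is small for a two-letter code.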

With all that said, there is no Unicode Police that will come banging on your door if you make a system that uses the tag characters. If you, or anyone, think it's the best solution for a particular project, then do it. Deprecation just means, "There are better ways of doing this. Seriously, please look around." And I think that message is still valid.

(This reply may read as overly critical, but I'm very much enjoying this discussion.)

Sławomir Osipiuk
