Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

Sławomir Osipiuk via Unicode unicode at unicode.org
Mon Feb 10 17:14:03 CST 2020


The examples given don't convince me that "higher-level protocols" would not be sufficient.

There are very few messages being sent in the "Internet of Things" that are truly plain-text. Even those that use a text base (as opposed to binary data) are still in some kind of structured computer language, be it HTML, XML, JSON, etc. The intended natural language can be specified using that structure.
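To make the contrast concrete, here is a minimal sketch of the two approaches. The JSON field names ("lang", "text") and the helper name are my own assumptions for illustration, not taken from any particular protocol; the only fixed fact used is that a Unicode tag character is the ASCII code point offset by U+E0000, introduced by U+E0001 LANGUAGE TAG.

```python
import json

# Out-of-band: the structure itself carries the language.
# "lang" and "text" are hypothetical field names; "pl" is a BCP 47 tag.
message = {"lang": "pl", "text": "Nieprawidłowe hasło."}
encoded = json.dumps(message, ensure_ascii=False)
decoded = json.loads(encoded)

# In-band: prefix the plain text with U+E0001 LANGUAGE TAG followed by
# the tag characters (ASCII code point + 0xE0000) spelling the language.
def with_unicode_language_tag(lang, text):
    prefix = "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in lang)
    return prefix + text

tagged = with_unicode_language_tag("pl", message["text"])
```

In the first form any JSON consumer can read decoded["lang"] directly; in the second, the receiver has to know to look for (and strip) the invisible prefix.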

Sending multiple copies of the same message in different languages is really only applicable to broadcast/multicast scenarios, where a transmission goes out live to multiple recipients who have different language needs. I can't immediately think of any examples where this is done with plain text only, though I'd be glad to learn about them, if they exist.

For any peer-to-peer or client-server interaction, as in your password example, it makes more sense to have the recipient request a specific language (e.g. using HTTP's "Accept-Language" header) and the sender to send its message in that language automatically.
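As a rough sketch of that negotiation on the sender's side: the message table and function names below are hypothetical, and the parsing is deliberately simplified (it ignores the "*" wildcard and does only a primary-subtag fallback), so treat it as an illustration rather than a complete implementation of HTTP language matching.

```python
# Hypothetical table of the same message in the languages the sender supports.
MESSAGES = {
    "en": "Incorrect password.",
    "pl": "Nieprawidłowe hasło.",
}

def parse_accept_language(header):
    """Return the language ranges from the header, highest q-value first."""
    ranges = []
    for part in header.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # a range without an explicit q-value has weight 1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((tag, q))
    # sorted() is stable, so equal weights keep their header order.
    ordered = sorted(ranges, key=lambda r: -r[1])
    return [tag for tag, q in ordered if q > 0]

def choose_message(header, default="en"):
    """Pick the message for the best-matching requested language."""
    for tag in parse_accept_language(header):
        if tag in MESSAGES:
            return MESSAGES[tag]
        primary = tag.split("-")[0]  # e.g. "pl-pl" falls back to "pl"
        if primary in MESSAGES:
            return MESSAGES[primary]
    return MESSAGES[default]
```

With this, a request carrying "Accept-Language: pl-PL, en;q=0.8" would get the Polish text, and one asking only for an unsupported language would get the default.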

As for "concatenation of such plain text sequences" where each sequence is in a different language, I must again ask: Is there a system that actually does this, that does not have a higher-level protocol that can carry metadata about the natural language of the text sequences?

Basically, I doubt Unicode language tags would be useful here because there simply is no Internet-based system that transmits human-readable text, in multiple natural languages, in such a rudimentary way, with no encapsulating protocol or metadata. And I doubt there will be; it seems like such a strange design choice in this day and age. Though I'd be glad to be corrected if someone has an example.

Sławomir Osipiuk




