Unicode block for programming related symbols and codepoints?

Frédéric Grosshans frederic.grosshans at gmail.com
Mon Feb 9 08:08:39 CST 2015

On 09/02/2015 13:55, Alfred Zett wrote:
>> Additionally, people tend to forget that simply because Unicode is 
>> doing emoji out of compatibility (or other) requirements, it does not 
>> mean that "now anything goes". I refer folks to TR51[1] (specifically 
>> sections 1.3, 8, and Annex C).
>> [1]: http://www.unicode.org/reports/tr51
> You know, the fact that this consortium ever took emoji into 
> consideration immediately justifies to include everything everyone 
> ever wanted. There is no such thing as important data including emoji. :)
The inclusion of emoji was the subject of considerable debate here, with
people strongly against and strongly for. The key point is that they were
already used as digital characters by Japanese telcos and their millions
of customers. They were de facto encoded as characters in Japanese text
messages. By the time they were encoded, the spread of smartphones had
made them appear in other places (emails, web forums, etc.).

> Jean-Francois Colson:
>> I need a few tens of characters for a conlang I’m developping. ☺ 
> Except two or three control characters don't make a con language.
> Also, if you don't like con languages in Unicode, what's this: 
> http://unicode.org/charts/PDF/U1F700.pdf
I doubt that “not liking con languages” is a faithful description of 
Jean-François ;-)

On a more serious note, this block is actually a set of notations that
were “scientific” in their time and were used by Isaac Newton. They were
encoded in Unicode following an academic project to digitize his
manuscripts. So here you have characters used three centuries ago by no
less than Isaac Newton, most of them having a much longer history, and
useful to historians of science. See
http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details.
This does not compare with a few characters invented for a conlang
invented by an amateur and used by no one but himself. I think that is
the point Jean-François wanted to make.

A closer counter-example to Jean-François's “wish” would be Shavian
(U+10450..U+1047F), but this alphabet has seen some use, and I guess that
its encoding would have been much harder without its association with
someone as famous as George Bernard Shaw or without the existing
publication of a full text in Shavian.
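
For anyone who wants to look at the two blocks mentioned above, here is a
minimal Python sketch using only the standard unicodedata module. The
helper name block_sample is hypothetical, and the end of the alchemical
range (U+1F77F) is taken from the block boundaries, not from this thread.
Which characters get a name depends on the Unicode version bundled with
your Python.

```python
import unicodedata

def block_sample(first, last, count=3):
    """Return (code point, name) pairs for the first `count` named characters in a range."""
    sample = []
    for cp in range(first, last + 1):
        # name() returns the default (None here) for unassigned code points
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            sample.append((f"U+{cp:04X}", name))
        if len(sample) == count:
            break
    return sample

shavian = block_sample(0x10450, 0x1047F)      # Shavian block
alchemical = block_sample(0x1F700, 0x1F77F)   # Alchemical Symbols block

for code_point, name in shavian + alchemical:
    print(code_point, name)
```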

>> The problem is that Unicode only encodes characters which are 
>> effectively used today or which have been used in the past. It 
>> doesn’t encode characters which could perhaps be used in a 
>> hypothetical new programing language in the future. 
> So you want the font encoding scheme to be a limitating factor for new 
> things?

It is more or less the rule, except that it is not a font encoding but a
standard encoding. Once something is encoded, it can never be
unencoded. And the Unicode standard is built to stay relevant as long as
possible (decades or centuries). So you are asking for your character to
be encoded in billions of devices for decades. It is more than a mere font
encoding. There are a few exceptions, but only when widespread use is
really expected, as for monetary symbols (it was the case for the Euro).

What you are asking for is a character for an untested idea. You are
convinced it is useful, but cannot prove anyone beyond yourself will use
it, hence Jean-François’s parallel with conlangs. In order to have a
chance of success, design a language using existing characters (e.g.
some APL + → for TAB) and/or private use codepoints. Once your language
starts gathering steam, come back and argue that using an arrow or a tab
is awkward, and that U+XXXX SHINY TAB FOR PROGRAMMERS would be an
improvement for a significant community. I know it is a lot of work, but
that is probably what it takes.
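
As a rough illustration of the two prototyping routes suggested above, the
Python sketch below (standard unicodedata module only) contrasts an
existing character with a private use codepoint. Picking U+E000 as the
prototype tab is purely an assumption for the example, and the names
prototype_tab and is_private_use are hypothetical.

```python
import unicodedata

# Basic Multilingual Plane Private Use Area: U+E000..U+F8FF
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF

def is_private_use(ch):
    """True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

prototype_tab = chr(PUA_FIRST)  # hypothetical choice for a prototype "tab" character
arrow = "\u2192"                # existing character: RIGHTWARDS ARROW

print(is_private_use(prototype_tab))
print(unicodedata.name(arrow))
# Private use codepoints have no standard name, so a default is supplied.
print(unicodedata.name(prototype_tab, "<no standard name>"))
```

A private use codepoint only means something by agreement between the
people exchanging the text, which is why it suits a prototype but does not
replace a real encoding.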

> Pierpaolo Bernardi:
>> How would your proposed character be displayed as plain text?
> There is no such thing as plain text.
When you say that, you don’t accept the premise of Unicode encoding. 
Unicode’s goal is to encode all plain text characters, but only plain 
text characters.
> Even line breaks and tabs are a matter of interpretation. It's just 
> that they usually have typographic semantics, even in programming 
> editors, with all the side effects.
> In very simple (and with that I mean shitty or not even remotely 
> programming oriented) editors, it may show like a control character, 
> like ␄.
> Browsers and any editor passing the "based on scintilla" complexity 
> mark of course should display something that makes more sense, like an 
> arrow or ⍈ plus surrounding space.

I think everyone here knows what you are saying, and that the notion of
plain text is a bit fuzzy. But if you cannot argue that your character
has a meaning in plain text, for some value of “plain text”, then you
cannot hope for an encoding in Unicode.
