Unicode block for programming related symbols and codepoints?
Frédéric Grosshans
frederic.grosshans at gmail.com
Mon Feb 9 08:08:39 CST 2015
On 09/02/2015 13:55, Alfred Zett wrote:
>
>> Additionally, people tend to forget that simply because Unicode is
>> doing emoji out of compatibility (or other) requirements, it does not
>> mean that "now anything goes". I refer folks to TR51[1] (specifically
>> sections 1.3, 8, and Annex C).
>>
>> [1]: http://www.unicode.org/reports/tr51
>>
> You know, the fact that this consortium ever took emoji into
> consideration immediately justifies to include everything everyone
> ever wanted. There is no such thing as important data including emoji. :)
The inclusion of emoji was the subject of considerable debate here, with
people strongly against and strongly for. The key point is that they were
already used as digital characters by Japanese telcos and their millions
of customers. They were de facto encoded as characters in Japanese text
messages. By the time they were encoded, the spread of smartphones had
made them appear in other places (emails, web forums, etc.).
>
>
> Jean-Francois Colson:
>> I need a few tens of characters for a conlang I’m developping. ☺
> Except two or three control characters don't make a con language.
> Also, if you don't like con languages in Unicode, what's this:
> http://unicode.org/charts/PDF/U1F700.pdf
I doubt that “not liking con languages” is a faithful description of
Jean-François ;-)
On a more serious note, this block is actually a set of notations that
were “scientific” in their time and were used by Isaac Newton. They were
encoded in Unicode following an academic project to digitize his
manuscripts. So here, you have characters used three centuries ago by no
less than Isaac Newton, most of them having a much longer history, and
useful for science historians. See
http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details.
This does not compare with a few characters invented for a conlang
invented by an amateur and used by no one but himself. I think that is
the point Jean-François wanted to make.
A closer counter-example to Jean-François's “wish” would be Shavian
(U+10450..U+1047F), but this alphabet has seen some use, and I guess that
its encoding would have been much harder without its association with
someone as famous as George Bernard Shaw or without the existing
publication of a full text in Shavian.
>
>> The problem is that Unicode only encodes characters which are
>> effectively used today or which have been used in the past. It
>> doesn’t encode characters which could perhaps be used in a
>> hypothetical new programing language in the future.
> So you want the font encoding scheme to be a limitating factor for new
> things?
That is more or less the rule, except that it is not a font encoding, but
a standard encoding. Once something is encoded, it can never be
unencoded. And the Unicode standard is built to stay relevant as long as
possible (decades or centuries). So you are asking for your character to
be encoded in billions of devices for decades. It is more than a mere font
encoding. There are a few exceptions, but only when widespread use is
really expected, like for monetary symbols (it was the case for the Euro).
What you are asking for is a character for an untested idea. You are
convinced it is useful, but cannot prove anyone beyond yourself will use
it, hence Jean-François’s parallel with conlangs. In order to have a
chance of success, design a language using existing characters (e.g.
some APL + → for TAB) and/or private use codepoints. Once your language
starts gathering steam, come back and argue that using an arrow or a tab
is awkward, and that U+XXXX SHINY TAB FOR PROGRAMMERS would be an
improvement for a significant community. I know it is a lot of work, but
that is probably what it takes.
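
To make the "existing characters and/or private use codepoints" route
concrete, here is a minimal Python sketch. The choice of U+E000 as the
stand-in for the tab-like character is purely hypothetical (any private
use code point would do), and the names SHINY_TAB, is_private_use and
describe are mine, not part of any standard or proposal. It only uses the
standard unicodedata module to show that → is an ordinary named character
while a private use code point carries no name and has category Co.

```python
import unicodedata

# Hypothetical choice: U+E000 is the first code point of the BMP
# Private Use Area. Its meaning is a private agreement, not a standard.
SHINY_TAB = "\uE000"

# An existing, already-encoded alternative: U+2192 RIGHTWARDS ARROW.
ARROW = "\u2192"


def is_private_use(ch: str) -> bool:
    """Return True if ch is a private use code point (category Co)."""
    return unicodedata.category(ch) == "Co"


def describe(ch: str) -> str:
    """Format a code point with its Unicode name and general category."""
    # Private use code points have no name, so supply a default
    # instead of letting unicodedata.name() raise ValueError.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name} (category {unicodedata.category(ch)})"


if __name__ == "__main__":
    for ch in (ARROW, SHINY_TAB):
        print(describe(ch), "private use:", is_private_use(ch))
```

A language prototype could use either character in its source files
today; which one reads better in practice is exactly the kind of evidence
a later proposal would need.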
>
> Pierpaolo Bernardi:
>> How would your proposed character be displayed as plain text?
> There is no such thing as plain text.
When you say that, you don’t accept the premise of Unicode encoding.
Unicode’s goal is to encode all plain text characters, but only plain
text characters.
> Even line breaks and tabs are a matter of interpretation. It's just
> that they usually have typographic semantics, even in programming
> editors, with all the side effects.
>
> In very simple (and with that I mean shitty or not even remotely
> programming oriented) editors, it may show like a control character,
> like ␄.
>
> Browsers and any editor passing the "based on scintilla" complexity
> mark of course should display something that makes more sense, like an
> arrow or ⍈ plus surrounding space.
I think everyone here knows what you are saying, and that the notion of
plain text is a bit fuzzy. But if you cannot argue that your character
has a meaning in plain text, for some value of “plain text”, then you
cannot hope for an encoding in Unicode.