Unicode block for programming related symbols and codepoints?

Alfred Zett alfred_z at web.de
Mon Feb 9 06:55:02 CST 2015


OK, I will now try to answer all of you in one mail, otherwise it gets 
hard to keep track...

Shervin Afshar:
> All of the requirements mentioned here can be (and are) implemented in 
> higher levels of software (like IDEs). IMO, there isn't any need for 
> adding new characters to Unicode to address these issues.
But then it would be incompatible from IDE to IDE, just as Python code 
becomes incompatible when 2 spaces, 4 spaces and tabs are mixed.
It's the data that is important, not the software.
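A minimal Python sketch of the Python point. CPython compiles a block indented with 2 spaces or with 4 spaces without complaint. It rejects a block that mixes a tab and spaces in a way whose meaning would depend on the tab width, raising TabError. The helper name indentation_error is illustrative only.

```python
def indentation_error(src: str):
    """Compile src and return the class of indentation error raised, or None."""
    try:
        compile(src, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc)
    return None

# The same two-statement block, stored with three different indentation habits.
two_spaces = "if True:\n  x = 1\n  y = 2\n"
four_spaces = "if True:\n    x = 1\n    y = 2\n"
tabs_then_spaces = "if True:\n\tx = 1\n        y = 2\n"  # tab on one line, 8 spaces on the next

for name, src in [("two_spaces", two_spaces),
                  ("four_spaces", four_spaces),
                  ("tabs_then_spaces", tabs_then_spaces)]:
    print(name, indentation_error(src))
```

The first two compile; the third is refused even though it looks identical in an editor that shows tabs as 8 columns.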
>
> Additionally, people tend to forget that simply because Unicode is 
> doing emoji out of compatibility (or other) requirements, it does not 
> mean that "now anything goes". I refer folks to TR51[1] (specifically 
> sections 1.3, 8, and Annex C).
>
> [1]: http://www.unicode.org/reports/tr51
>
You know, the fact that this consortium ever took emoji into 
consideration immediately justifies including everything anyone ever 
wanted. There is no such thing as important data that includes emoji. :)

Jean-Francois Colson:
> I need a few tens of characters for a conlang I’m developing. ☺ 
Except that two or three control characters don't make a conlang.
Also, if you don't like conlangs in Unicode, what's this: 
http://unicode.org/charts/PDF/U1F700.pdf

> The problem is that Unicode only encodes characters which are 
> effectively used today or which have been used in the past. It doesn’t 
> encode characters which could perhaps be used in a hypothetical new 
> programming language in the future. 
So you want the character encoding standard to be a limiting factor for 
new things?

Pierpaolo Bernardi:
> How would your proposed character be displayed as plain text?
There is no such thing as plain text.
Even line breaks and tabs are a matter of interpretation. It's just that 
they usually have typographic semantics, even in programming editors, 
with all the side effects.
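A minimal Python sketch of that point. The same tab character is displayed at different widths depending on the tab size the viewer assumes. The same bytes split into different numbers of lines depending on which line-break characters the reader honours. The helper names are illustrative only.

```python
def render_with_tab_width(line: str, tab_width: int) -> str:
    """Expand tabs the way a viewer configured with tab_width would display them."""
    return line.expandtabs(tab_width)

def split_lf_only(text: str) -> list[str]:
    """Treat only LF (U+000A) as a line break."""
    return text.split("\n")

def split_universal(text: str) -> list[str]:
    """Treat CR, LF, CRLF and the other separators str.splitlines knows as line breaks."""
    return text.splitlines()

source = "\tx = 1"
for width in (2, 4, 8):
    print(width, repr(render_with_tab_width(source, width)))

mixed = "a\r\nb\rc"
print(split_lf_only(mixed))    # ['a\r', 'b\rc']
print(split_universal(mixed))  # ['a', 'b', 'c']
```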

In very simple (and by that I mean shitty, or not even remotely 
programming-oriented) editors, it may be displayed like a control 
character, e.g. as ␄.

Browsers and any editor past the "based on Scintilla" complexity mark 
should of course display something that makes more sense, like an arrow 
or ⍈ plus surrounding space.
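A minimal Python sketch of the two display strategies. The simple one uses the real Control Pictures block: C0 control U+00xx maps to U+24xx, so EOT (U+0004) becomes ␄ (U+2404), and DEL maps to U+2421. The smarter one draws the proposed indentation character as ⍈ (U+2348) plus a space. Since no such character exists, the code assumes a hypothetical placeholder, U+E000 from the Private Use Area.

```python
INDENT = "\uE000"  # HYPOTHETICAL indentation character: Private Use Area placeholder only

def control_picture(ch: str) -> str:
    """Map a C0 control or DEL to its glyph in the Control Pictures block."""
    cp = ord(ch)
    if cp < 0x20:
        return chr(0x2400 + cp)  # e.g. U+0004 EOT -> U+2404, U+0009 TAB -> U+2409
    if cp == 0x7F:
        return "\u2421"          # SYMBOL FOR DELETE
    return ch

def render_simple(line: str) -> str:
    """A minimal editor: show every control code as its picture.

    The placeholder INDENT is not a control code, so it passes through unchanged
    (a font without a glyph for it would likely show a blank box).
    """
    return "".join(control_picture(c) for c in line)

def render_smart(line: str) -> str:
    """A code-oriented editor: draw the hypothetical INDENT as an arrow plus a space."""
    return line.replace(INDENT, "\u2348 ")  # U+2348 APL FUNCTIONAL SYMBOL QUAD RIGHTWARDS ARROW
```

For example, render_simple("end\x04") yields "end␄", and render_smart(INDENT + "return x") yields "⍈ return x".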

> Unicode is a standard for plain text.  If you require a special IDE
> for your programming language then why use plain text at all?
Because custom-encoded binary databases or blob files are the death of 
interoperability.

Konstantin Ritt:
> Easier than latin1, a layout one could find on [almost] every 
> keyboard? Good luck.
Also:

Jean-Francois Colson:
> Hard to input? Not harder than the new symbols you’d like to propose. 
> That’s only a matter of keyboard layout and input method. 

Indent by pressing Tab and insert the string-literal character by 
pressing ". Nothing changes for the user; the IDE/editor does the 
conversion on the fly. The only difference is that you get clean 
semantics, interoperability and customizability.

Beat that, APL, where you would need more than 10 key bindings or an 
annoying on-screen keyboard.

> I’ve never used APL so I don’t remember the meanings of its symbols, 
> but couldn’t ⍘ U+2358 APL FUNCTIONAL SYMBOL QUOTE UNDERBAR or ⍞ U+235E 
> APL FUNCTIONAL SYMBOL QUOTE QUAD work as “string literal quotes” in a 
> new programming language? 
That's a good idea.
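A minimal Python sketch of what a lexer could do with that idea. It assumes ⍞ (U+235E) delimits string literals on both sides; that choice and the name extract_literals are illustrative, not part of any existing language. Because ⍞ is assumed never to appear inside a literal, an ASCII " inside one needs no escaping.

```python
import unicodedata

QUOTE = "\u235E"  # APL FUNCTIONAL SYMBOL QUOTE QUAD, ASSUMED here to delimit string literals

def extract_literals(src: str) -> list[str]:
    """Return the contents of every QUOTE...QUOTE literal in src, in order."""
    literals = []
    pos = 0
    while True:
        start = src.find(QUOTE, pos)
        if start == -1:
            return literals
        end = src.find(QUOTE, start + 1)
        if end == -1:
            raise SyntaxError(f"unterminated literal starting at index {start}")
        literals.append(src[start + 1:end])  # text between the delimiters, quotes included
        pos = end + 1

print(unicodedata.name(QUOTE))
```

For example, extract_literals('say(⍞He said "hi"⍞)') returns ['He said "hi"'] with no backslashes involved.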

That still leaves the indentation character, which is harder, because 
one would want a control character with specific semantics.
E.g., in programming editors it would make sense to allow it only 
directly after line breaks and to convert any other occurrences into tabs.
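A minimal Python sketch of that rule. The indentation character is kept only in the unbroken run at the start of each line, and every other occurrence becomes a tab. The code point U+E000 is a HYPOTHETICAL Private Use Area placeholder, since no such character exists, and normalize_indentation is an illustrative name. The start of the text is treated as if it followed a line break.

```python
INDENT = "\uE000"  # HYPOTHETICAL indentation character: Private Use Area placeholder only

def normalize_indentation(text: str) -> str:
    """Keep INDENT only at the start of each line; turn any other occurrence into a tab."""
    out_lines = []
    for line in text.split("\n"):
        stripped = line.lstrip(INDENT)
        depth = len(line) - len(stripped)  # number of INDENTs directly after the line break
        out_lines.append(INDENT * depth + stripped.replace(INDENT, "\t"))
    return "\n".join(out_lines)
```

For example, two leading INDENTs survive, while an INDENT between "x" and "y" on the same line is turned into "\t".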

> If the IDE inputs your new character when you press tab, then your new 
> character is a tab… 
Not if it does that only at the beginning of a line and inserts a 
regular tab everywhere else.
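A minimal Python sketch of that key handling, covering both keys mentioned earlier. It assumes a HYPOTHETICAL indentation character at the Private Use Area placeholder U+E000 and ⍞ (U+235E) as the string-literal delimiter. on_key and its parameters are illustrative names, not any real editor's API.

```python
INDENT = "\uE000"  # HYPOTHETICAL indentation character: Private Use Area placeholder only
QUOTE = "\u235E"   # ASSUMED string-literal delimiter (APL FUNCTIONAL SYMBOL QUOTE QUAD)

def on_key(key: str, line_before_cursor: str) -> str:
    """Return the character an editor would insert for a key press."""
    if key == "\t":
        # Tab yields INDENT only while the cursor is still inside the leading indentation.
        at_line_start = all(c == INDENT for c in line_before_cursor)
        return INDENT if at_line_start else "\t"
    if key == '"':
        return QUOTE
    return key
```

Pressing Tab on an empty line, or after existing indentation, yields INDENT; pressing it after "x = 1" yields an ordinary tab.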

Best regards

A. Z.


