Unicode block for programming related symbols and codepoints?

Alfred Zett alfred_z at web.de
Sun Feb 8 16:07:52 CST 2015


Hi Jean-Francois Colson,

I hope this doesn't mess up the mailing list.

>>
>> - Indentation codepoint, with no fixed defined graphical 
>> representation. For indentation based programming languages.
>
> That wouldn’t be compliant with existing languages and future 
> languages might use any existing character.

This was for new languages. Creators of future languages mostly orient 
themselves on whatever is available and makes sense, so I might as well 
make this proposal, so they don't have to choose the half-assed 
workarounds they use now.

Also, as long as projects like https://github.com/sferik/active_emoji 
exist, a dedicated codepoint still makes more sense.

>> Because:
>> -- specific clients may want to show it differently (for example as 
>> arrows, lines etc., using another color):
>
> Can’t good editors display tabs in a different color when required ?
Not as reliable and customizable as a special codepoint. For example, this:

>
>> --- browsers could let the web page creator decide the visual 
>> representation (character and size) via CSS

That can't be done with tabs, and on-the-fly conversion on copy and paste 
with JavaScript is horrid and broken for security reasons.
But it's an issue even in good editors: you need a lexing plugin that 
may or may not work, and the size and other properties are still fixed. 
After all, tabs have whitespace semantics and may appear anywhere in 
the text.

>> --- the same with editors, independent of the actual font
>> --- in case of visual impairment, the user could even change the 
>> acoustic representation if the editor allows it
>> -- unlike a space symbol, it wouldn't need more than one character 
>> per indentation
>> -- unlike tabs or space, it wouldn't be whitespace
>> -- unlike normal arrow characters, one could customize the length in 
>> an editor and wouldn't have to insert extra spaces for a better 
>> visual imagery
>>
>> - A codepoint for string literal quotes, that would spare one the 
>> escaping.
>
> I rarely escape quotes.
> In a text, I use ’ (U+2019) as an apostrophe and «»“”‘’ as quotes, so 
> I don’t need to escape them.
> When I use PHP to generate some HTML code, I try to alternate single 
> and double quotes as much as possible. That way I rarely need to 
> escape them.
OK, but that's just your scenario, with a language designed in the past 
and probably an editor from the past that allows non-Unicode encodings. 
In a better world, manually inserting code points would be a last resort.

Imagine someone wants to make his text look as if it were written on a 
typewriter, or something similar.

>
>> - A statement separator symbol.
>
> To replace the semicolon in C and the languages based on its syntax?
Again, this is for future use. To be honest, it might sound questionable, 
but it could blur the line between visual line breaks and visible 
characters like semicolons.
Comments terminated by a line break are, in effect, comments terminated 
by a separator.
Of course, this is the least necessary of the three proposed characters, 
but I thought that, for the sake of completeness, it shouldn't be missing.

Come to think of it, two sets of opening and closing block symbols 
couldn't hurt either, and neither could a continue-after-line-break symbol.

>
>> - Other ideas?
>
> Aren’t you trying to reinvent APL?
>
No. APL puts a lot of characters into your code that look alien and 
annoying to anyone except mathematicians, that are hard to input, and 
that are hard to interpret from the context.

My proposal, on the other hand, if implemented right, introduces some 
really intuitive-looking, easy-to-input characters: a bold arrow on the 
left needs no further explanation, and the IDE of the future can easily 
insert them in the right position when you press Tab.

