Unicode block for programming related symbols and codepoints?

Shervin Afshar shervinafshar at gmail.com
Mon Feb 9 13:23:15 CST 2015


> But then it would be incompatible from IDE to IDE, like Python is
> incompatible using 2 spaces, 4 spaces and tabs.
> It's the data that is important, not the software.

Speaking specifically of Python, we should not try to solve in Unicode what
PEP 8[1] is intended for. Pythonistas and their IDEs are encouraged to use
linters to address syntactical discrepancies. This applies, more or less,
to other programming languages as well.

[1]: https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces
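
As a rough illustration of the kind of check a linter can run on existing
plain text, here is a minimal sketch. It is not how any particular linter
is implemented, and the function name indentation_styles is made up for
this example. It only groups lines by the whitespace they are indented
with:

```python
import re

def indentation_styles(source: str) -> dict:
    """Group 1-based line numbers by the kind of leading whitespace used.

    Hypothetical helper for illustration only; real linters do much more.
    """
    styles = {"tabs": [], "spaces": [], "mixed": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = re.match(r"[ \t]*", line).group()
        if not indent or not line.strip():
            continue  # skip unindented and blank lines
        if set(indent) == {"\t"}:
            styles["tabs"].append(lineno)
        elif set(indent) == {" "}:
            styles["spaces"].append(lineno)
        else:
            styles["mixed"].append(lineno)  # both tabs and spaces
    return styles

# Line 2 is indented with spaces, line 3 with a tab, line 4 with both.
sample = "def f():\n    a = 1\n\tb = 2\n  \tc = 3\n"
report = indentation_styles(sample)
```

A tool built along these lines can flag or rewrite inconsistent files
without any new characters being needed in the text itself.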

> You know, the fact that this consortium ever took emoji into
> consideration immediately justifies to include everything everyone ever
> wanted. There is no such thing as important data including emoji. :)

If you read the background information on Unicode emoji (in TR51 or
elsewhere), you will see how the common and widespread use of PUA code
points by Japanese providers introduced interoperability issues with the
rest of the world.

And no, addressing that major compatibility/interoperability issue (and
any future issues arising from addressing it) does not justify the
inclusion of "everything everyone ever wanted".


↪ Shervin

On Mon, Feb 9, 2015 at 4:55 AM, Alfred Zett <alfred_z at web.de> wrote:

> OK, I will now try to answer all of you in one mail, otherwise it gets
> hard to keep track of...
>
> Shervin Afshar:
>
>> All of the requirements mentioned here can be (and are) implemented in
>> higher levels of software (like IDEs). IMO, there isn't any need for adding
>> new characters to Unicode to address these issues.
>>
> But then it would be incompatible from IDE to IDE, like Python is
> incompatible using 2 spaces, 4 spaces and tabs.
> It's the data that is important, not the software.
>
>>
>> Additionally, people tend to forget that simply because Unicode is doing
>> emoji out of compatibility (or other) requirements, it does not mean that
>> "now anything goes". I refer folks to TR51[1] (specifically sections 1.3,
>> 8, and Annex C).
>>
>> [1]: http://www.unicode.org/reports/tr51
>>
> You know, the fact that this consortium ever took emoji into
> consideration immediately justifies to include everything everyone ever
> wanted. There is no such thing as important data including emoji. :)
>
> Jean-Francois Colson:
>
>> I need a few tens of characters for a conlang I’m developing. ☺
>>
> Except two or three control characters don't make a con language.
> Also, if you don't like con languages in Unicode, what's this:
> http://unicode.org/charts/PDF/U1F700.pdf
>
>> The problem is that Unicode only encodes characters which are actually
>> used today or which have been used in the past. It doesn’t encode
>> characters which could perhaps be used in a hypothetical new programming
>> language in the future.
>>
> So you want the font encoding scheme to be a limiting factor for new
> things?
>
> Pierpaolo Bernardi:
>
>> How would your proposed character be displayed as plain text?
>>
> There is no such thing as plain text.
> Even line breaks and tabs are a matter of interpretation. It's just that
> they usually have typographic semantics, even in programming editors, with
> all the side effects.
>
> In very simple (and with that I mean shitty or not even remotely
> programming oriented) editors, it may show like a control character, like ␄.
>
> Browsers and any editor passing the "based on scintilla" complexity mark
> of course should display something that makes more sense, like an arrow or
> ⍈ plus surrounding space.
>
>> Unicode is a standard for plain text.  If you require a special IDE
>> for your programming language then why use plain text at all?
>>
> Because binary custom encoded databases or blob files are the death of
> interoperability.
>
> Konstantin Ritt:
>
>> Easier than latin1, a layout one could find on [almost] every keyboard?
>> Good luck.
>>
> Also:
>
> Jean-Francois Colson:
>
>> Hard to input? Not harder than the new symbols you’d like to propose.
>> That’s only a matter of keyboard layout and input method.
>>
>
> Indent by pressing tab and insert the literal thing by pressing ". Nothing
> changes, the IDE/editor does the work on the fly.
> Just that you have clean semantics, interoperability and customizability.
>
> Beat that, APL. Where you would need >10 key bindings or an annoying
> software keyboard.
>
>> I’ve never used APL so I don’t remember the meanings of its symbols, but
>> couldn’t ⍘ U+2358 APL FUNCTIONAL SYMBOL QUOTE UNDERBAR or ⍞ U+235E APL
>> FUNCTIONAL SYMBOL QUOTE QUAD work as “string literal quotes” in a new
>> programming language?
>>
> That's a good idea.
>
> That still leaves the indentation character, which is harder than that,
> because one would want a control character with certain semantics.
> E.g., for programming editors it would make sense to only allow it after
> line breaks and convert other occurrences into tabs.
>
>> If the IDE inputs your new character when you press tab, then your new
>> character is a tab…
>>
> Not if it detects the beginning of a line.
>
> Best regards
>
>
> A. Z.