Unicode block for programming related symbols and codepoints?

Philippe Verdy verdy_p at wanadoo.fr
Sat Feb 14 07:53:16 CST 2015

But the TAB is still the whitespace character you describe, and it is already
accepted by the programming languages that use it.
Defining a new codepoint would require the lexical analyzers of these
languages to be modified (i.e. you would be modifying those languages).
Clearly, given that the lexical items of the programming languages for the
functions you describe form a very closed set, you cannot substitute
them. Everything you describe is a matter of UI design for code editors,
which will still scan the edited sources looking for TABs, and not for your
custom character, in order to display them in a custom way, according to the
preferences of the programmer.
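A minimal sketch of that point, using a toy lexer that is not any real language's. TAB is already one of the whitespace characters the lexer skips. A hypothetical new "programming" codepoint (U+E000 from the Private Use Area stands in for it here) is rejected until the lexer itself is changed. Showing TABs differently is purely an editor presentation step that leaves the source string untouched. The names `tokenize` and `render_for_display`, the arrow glyph and the tab width of 4 are illustrative assumptions.

```python
# Toy lexer: the whitespace set below is an assumption, not taken from any real language spec.
WHITESPACE = {" ", "\t", "\n", "\r"}
PUNCTUATION = set("(){}[];,=+-*/<>")


def tokenize(source: str) -> list[str]:
    """Split source into words and single-char punctuation, skipping whitespace (TAB included)."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in WHITESPACE:
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(source[i:j])
            i = j
        elif ch in PUNCTUATION:
            tokens.append(ch)
            i += 1
        else:
            # A new codepoint (e.g. the hypothetical U+E000) lands here until the lexer is modified.
            raise SyntaxError(f"unexpected character U+{ord(ch):04X} at offset {i}")
    return tokens


def render_for_display(source: str, tab_glyph: str = "\u2192", tab_width: int = 4) -> str:
    """Editor-side presentation only: draw each TAB as a glyph padded to the next tab stop.

    The input string is not modified; only a new display string is built.
    """
    out_lines = []
    for line in source.split("\n"):
        col = 0
        parts = []
        for ch in line:
            if ch == "\t":
                width = tab_width - (col % tab_width)
                parts.append(tab_glyph + " " * (width - 1))
                col += width
            else:
                parts.append(ch)
                col += 1
        out_lines.append("".join(parts))
    return "\n".join(out_lines)
```

For example, `render_for_display("if\t(x)")` yields `"if→ (x)"` for display, while the lexer still sees the original TAB.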

We are in fact not talking about character identities. The only
significant identity here is the identity of the original characters in the
source text, and the code editor will not alter it even if it displays them
differently. Editors only "display" those characters; they don't replace them,
unless the programmer actually makes a change to the source code, such as
reindenting or compressing whitespace, or running a source code
beautifier/reformatter. A reformatter is safe to use in those editors ONLY if
the editor recognizes not only the source characters, but also the
syntax of the source language: it must be able to read and scan
the source, and it must also know which programming language you are using.
Generally it infers this from the file extension of the source file, but if you
have not yet given a filename to your source by saving it (or by adding a node
to your source tree in your IDE), you can still select the programming
language in the menu of the editor.
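A small sketch of the language-selection logic described above. The extension table is a hypothetical, tiny example (real editors ship much larger, configurable tables), and letting an explicit menu choice take precedence over the extension is an assumption about typical editor behavior.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping for illustration only.
EXTENSION_TO_LANGUAGE = {
    ".c": "C",
    ".h": "C",
    ".py": "Python",
    ".js": "JavaScript",
    ".html": "HTML",
    ".sgml": "SGML",
}


def detect_language(filename: Optional[str], user_choice: Optional[str] = None) -> Optional[str]:
    """Return the language to use for syntax-aware features, or None if unknown.

    Assumption: a language picked in the editor menu overrides the file extension.
    """
    if user_choice is not None:
        return user_choice
    if filename is None:
        # Unsaved buffer with no menu selection: no syntax-aware features.
        return None
    return EXTENSION_TO_LANGUAGE.get(Path(filename).suffix.lower())
```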

The same editor can then present the source program in whatever
presentation matches the expectations and needs of the programmers
using it: it will typically provide syntax coloring, and it will group/ungroup
blocks of source lines by detecting the syntax used to delimit blocks
(punctuation, begin/end keywords, indentation, statement separators or
operators, operator precedence...).
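As a rough illustration of block grouping, here is a naive sketch that finds foldable line ranges from single-character begin/end delimiters (braces by default). It is an assumption-laden simplification: it ignores delimiters inside strings and comments and does not handle keyword or indentation-based blocks, which a real editor would get from its language-aware parser.

```python
def fold_ranges(lines: list[str], open_ch: str = "{", close_ch: str = "}") -> list[tuple[int, int]]:
    """Return sorted (start_line, end_line) pairs (0-based) for multi-line delimited blocks."""
    stack: list[int] = []  # line numbers of currently open delimiters
    ranges: list[tuple[int, int]] = []
    for lineno, line in enumerate(lines):
        for ch in line:
            if ch == open_ch:
                stack.append(lineno)
            elif ch == close_ch and stack:
                start = stack.pop()
                if lineno > start:  # a block on a single line is not worth folding
                    ranges.append((start, lineno))
    return sorted(ranges)
```

For a C function whose body contains an `if` block, this yields one range for the whole body and a nested range for the `if`.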

The presentation will never depend on your new "character". A new
symbolic character is neither the only nor the best way to present the
structure of a program, because the needs of programmers are at a higher
level than isolated characters: they are based on the upper-level parsed
syntax of blocks, statements and operations. The program can
then be presented in a tree view listing nodes with sorted lists of
properties, where property values can themselves be other trees. A tree is
also not the only option: you could just as well have rectangular blocks that
you can expand/collapse, appearing as multiline blocks of rich text containing
other blocks. Additionally, there could be several superposed structures
that are not hierarchically nested (e.g. one for a line-based
preprocessor, and another for the code as it would be understood by the next
layer, after the preprocessing layer).

And even in programming languages, there exist structures that do not
obey a hierarchical structure (e.g. SGML and HTML, where elements can freely
close the scope of /many/ previously opened /blocks/, and not
just the one at the top of the stack). When you close an element that is
not at the top of the stack, the existing top of the stack /may/ remain at the
top of the stack, or it may be closed implicitly, according to complex matching
rules (which depend on properties of all elements in the stack between the
element you are explicitly closing and the element at the top of the stack).
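A deliberately simplified sketch of such matching rules, loosely modelled on elements with omissible end tags. The set `OPTIONAL_END_TAGS` and the rule "ignore the end tag if a required-end element is in between" are hypothetical simplifications, not the actual SGML or HTML algorithms, which are considerably more complex. It shows both outcomes mentioned above: elements above the one being closed are either closed implicitly, or the end tag is ignored and the top of the stack remains.

```python
# Hypothetical: elements whose end tag may be omitted (loosely inspired by HTML's <p> and <li>).
OPTIONAL_END_TAGS = {"p", "li"}


def close_element(stack: list[str], name: str) -> list[str]:
    """Handle an end tag for `name`; return the elements closed, innermost first.

    Elements above `name` are closed implicitly only if all of them have optional end tags;
    otherwise the end tag is ignored and the stack (including its top) is left unchanged.
    """
    if name not in stack:
        return []  # stray end tag: ignored in this sketch
    idx = len(stack) - 1 - stack[::-1].index(name)  # innermost open element with this name
    above = stack[idx + 1:]
    if any(el not in OPTIONAL_END_TAGS for el in above):
        return []  # cannot implicitly close a required-end element: top of stack remains
    closed = list(reversed(stack[idx:]))
    del stack[idx:]
    return closed
```

With the stack `["html", "body", "ul", "li", "p"]`, closing `ul` implicitly closes `p` and `li` as well; with `["body", "div", "b"]`, closing `div` is ignored because `b` is not in the optional set.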

2015-02-08 23:02 GMT+01:00 Jean-François Colson <jf at colson.eu>:

> On 08/02/15 22:32, Pierpaolo Bernardi wrote:
> > On Sun, Feb 8, 2015 at 9:15 PM, Alfred Zett <alfred_z at web.de> wrote:
> > […]
> >
> > -- unlike tabs or space, it wouldn't be whitespace
> > […]
> >
> > a Tab is exactly what you described.
> Not exactly: a tab IS whitespace.
> It may sometimes be displayed in a different color or with a special
> symbol on request if the editor allows it, but in most cases it is
> whitespace.
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode