J. Leslie Turriff jlturriff at centurylink.net
Thu Jun 5 13:22:09 CDT 2014

On Thursday 05 June 2014 12:24:12 Jeff Senn wrote:
> On Jun 5, 2014, at 12:41 PM, Hans Aberg <haberg-1 at telia.com> wrote:
> > On 5 Jun 2014, at 17:46, Jeff Senn <senn at maya.com> wrote:
> >> That is: are identifiers merely sequences of characters or intended to
> >> be comparable as “Unicode strings” (under some sort of compatibility
> >> rule)?
> >
> > In computer languages, identifiers are normally compared only for
> > equality, as it reduces lookup time complexity.
> Well in this case we are talking about parsing a source file and generating
> internal symbols, so the complexity of the comparison operation is a red
> herring.
> The real question is how does the source identifier get mapped into a
> (compiled) symbol.  (e.g. in C++ this is not an obvious operation)
> If your implication is that there should be no canonicalization (the string
> from the source is used as a sequence of characters only directly mapped to
> a symbol), then I predict sticky problems in the future.  The most obvious
> of which is that in some cases I will be able to change the semantics of
> the compiled program by (accidentally) canonicalizing the source text (an
> operation, I will point out, that is invisible to the user in many (most?)
> Unicode aware editors).
	So if programmer A uses editor X to write code, and programmer B uses editor 
Y to modify the code, suddenly the compiler might start generating multiple 
symbols for some identifiers, causing compiles to fail for no obvious reason.
	It seems to me that "the complexity of the comparison operation is a red
herring" is perhaps a naive view; identifiers that silently split into
different symbols would produce a really high astonishment factor.
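
	To make the scenario concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular compiler works: the two "symbol table" functions are hypothetical, and the identifier is one I made up. It assumes editor X saves the precomposed form of a character while editor Y saves the decomposed form.

```python
import unicodedata

# Two spellings of the same visible identifier:
composed = "caf\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def intern_raw(names):
    """Hypothetical symbol table keyed on the raw code point sequence."""
    table = {}
    for name in names:
        table.setdefault(name, len(table))
    return table


def intern_nfc(names):
    """Hypothetical symbol table that normalizes each name to NFC first."""
    table = {}
    for name in names:
        table.setdefault(unicodedata.normalize("NFC", name), len(table))
    return table


raw_table = intern_raw([composed, decomposed])
nfc_table = intern_nfc([composed, decomposed])

print(composed == decomposed)  # False: the code point sequences differ
print(len(raw_table))          # 2: one identifier on screen, two symbols
print(len(nfc_table))          # 1: normalization collapses them again
```

	Without normalization the two spellings become separate symbols even though they render identically, which is exactly the kind of failure that would have no obvious cause.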


"Disobedience is the true foundation of liberty. The obedient must be 
slaves." --Henry David Thoreau
