Names for control characters

Whistler, Ken ken.whistler at sap.com
Wed Mar 12 16:26:28 CDT 2014


Per continued:

> I know it's not a name. My question was *why* control characters don't
> *have* names like
> 
>   CONTROL CHARACTER NULL
>   CONTROL CHARACTER START OF HEADING
>   CONTROL CHARACTER START OF TEXT
>   etc.
> 
> It would be so obvious to have it like that, so I assume there is some
> specific reason not to, but I still can't figure it out. For me there is
> not less reason for these characters to have names than any others, so
> for me it's like Linear B characters didn't have names, and I got the
> answer "no problem, they have aliases, so that's OK!" This is just
> strange to me. If names aren't needed, why do almost all characters have
> them?

Ah, so this is a "Why is the sky blue?" kind of question. ;-)
And perhaps the correct response is then a Just So story...

Once upon a time, there was an ISO framework for character
encoding. Officially his name was ISO 2022 Information technology --
Character code structure and extension techniques. But we'll
think of him as the troll that lives under the bridge and just call
him "2022" for short.

Now 2022 had his favorite collection of code points that he
kept in buckets under the bridge. But he was very, very particular
about how he organized his collection. All the code points 00 to 1F
had to go in the bucket labeled "C0", and all the code points 20
to 7E had to go in the bucket labeled "G0" (or "GL" --
sometimes the troll would get confused). He had other, even
bigger code points, too, but we can save those for another story.
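
For anyone who would rather see the troll's sorting rule spelled out, here is a minimal Python sketch of it. It assumes only the two ranges given above (00 to 1F and 20 to 7E, in hexadecimal). The function name `bucket` and the constants are made up for illustration and are not taken from ISO 2022 itself.

```python
# Bucket boundaries as the story gives them (hexadecimal code points).
# These names are illustrative only, not part of any standard.
C0_RANGE = range(0x00, 0x20)   # 00 to 1F: the "control functions"
G0_RANGE = range(0x20, 0x7F)   # 20 to 7E: the code points that could get names


def bucket(code_point):
    """Return the label of the troll's bucket for a code point, or None
    for the bigger code points that the story saves for another day."""
    if code_point in C0_RANGE:
        return "C0"
    if code_point in G0_RANGE:
        return "G0"
    return None


# Try a few code points on either side of each boundary.
for cp in (0x00, 0x1F, 0x20, 0x2D, 0x7E, 0x7F):
    print(f"{cp:02X} -> {bucket(cp)}")
```

Anything the sketch cannot place comes back as None, which is about as much as the troll would tell you, too.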

2022 said all the code points in the "G0" bucket could get names.
In fact they could get lots of names, if they wanted. So 2022 also
started collecting sets of characters, where all those names were
written down. Sometimes he would "escape" to one set and admire
all those pretty names, and then he would "escape" to another set
and admire other pretty names. 2022 was a great admirer of
escaping, by the way, as well as pretty names.

But the code points in the "C0" bucket were different. 2022
insisted that those code points weren't like the ones in the
"G0" bucket, and they couldn't have names at all.
Indeed, these were very odd code points -- 2022 called
them "control functions". Sometimes when the troll took one
out of the "C0" bucket and examined it, it did one thing, but
the next time it might do something completely different.
Only 2022's friend, the troll named 6429 living under the next
bridge to the north, really understood what they might be doing from
one week to the next.

One day an aspiring young wizard named Unicode was crossing
the bridge. As an aspiring young wizard, he was rather observant.
And he noticed that there was a troll living under the bridge and
that that troll had stolen all the code points and was hoarding
them in strangely labeled buckets under the bridge. Being a wizard
and all, he knew that it was his duty to slay the troll and free all the
code points. So he set about writing down the appropriate spell in
his brand new spellbook.

Now Unicode was a very egalitarian wizard -- it just seemed right
to him that all code points should be able to have names, and it
would be better if each one had just one, unique name. That way,
none of them would get jealous of all the names some other code
point had acquired, and besides, each code point would know its name
and could come when you called it. So in the first version of
Unicode's spellbook, he wrote the spell down just that way. He
called his spell "Unicode 1.0", because, well, it was his spell,
after all, and the very first complete spell that he would be trying
to use. 00 could be called "NULL" and 01 could be called "START OF
HEADING", just like 20 could be called "SPACE" and 2D could be
called "HYPHEN-MINUS".

You may be wondering why Unicode would use such odd names
for all the code points, but then there is no accounting for the whims 
of wizards, I guess.

Well, once Unicode had finished writing down the "Unicode 1.0" spell,
he started casting it on the troll:

Shazaaaam! Ffffppfft!

To Unicode's surprise, the spell only partly worked, but then fizzled.
The troll had been badly hurt, but he was still limping around under
the bridge, and he still clung tightly to his buckets of code points.

Unicode looked around to see what the problem could be, and
noticed that there was a warlock at the other end of the bridge.
It was an infamous warlock who had taken to calling himself "10646",
and from all appearances he was *also* trying to cast a spell to
kill the troll and free all the code points. Apparently, casting the
two spells at the same time had resulted in interference in the ley lines.
That was why neither spell had fully worked, and why the troll
2022 was still limping around with his code point buckets.

The wizard Unicode headed across the bridge to speak to the
warlock 10646:

"Look, we both want to slay that troll and free his code points.
Why don't we team up and cast synchronized spells?"

But 10646 was a suspicious warlock. He wasn't sure that *all*
of the code points could be freed safely. Who knows what mischief
they might get up to if left on their own.

"Those code points that the troll keeps in the C0 bucket are very
dangerous," said 10646. "We can't let them just be like
all the others and get ordinary names. After all, they seem to do
different things in alternate weeks, and if we give them regular
names, they might come when we call them, even if they are
doing the wrong things that week."

The wizard Unicode heaved a sigh. That seemed so silly to him.
But after all, it was important to kill the troll and save all the code
points. So he pulled out his quill and scratched lines through all
the names for the code points from the C0 bucket in his spellbook,
and decided he would call the revised spell "Unicode 1.1". It was
only a little different from his first spell -- but it is important to
keep track of these things. Spells can be dangerous things, after all.

"How does this look to you, Master Warlock?" he asked.

And 10646 nodded his cautious approval at the revision.

So then the wizard Unicode and the warlock 10646 started casting
their spells together.

Shazaamaazama! Pockety spoketi! Keeeraack!

The troll 2022 was dead! His buckets fell out of his grasp, and all
the code points were freed! But the ones that rolled out of the C0
bucket didn't have names, because Unicode had scratched out
all of their names in the Unicode 1.1 spell he cast, just so the warlock
10646 wouldn't interfere by casting a counterspell for them.

And that is why control characters don't have names.
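
If you want to see the result of the spell for yourself, the following is a small sketch using Python's standard `unicodedata` module, assuming a reasonably recent Python 3. The helper `name_or_none` is a hypothetical convenience wrapper, not part of the module. It asks for the formal name of two code points from the C0 bucket and of the two G0 examples mentioned earlier, 20 and 2D.

```python
import unicodedata


def name_or_none(ch):
    """Return the character's formal Unicode name, or None if it has none.

    unicodedata.name() raises ValueError for a nameless character unless
    a default is supplied; here the default is None.
    """
    return unicodedata.name(ch, None)


# 00 and 01 come from the C0 bucket; 20 and 2D come from the G0 bucket.
for ch in ("\x00", "\x01", "\x20", "\x2D"):
    print(f"{ord(ch):02X}  category={unicodedata.category(ch)}  "
          f"name={name_or_none(ch)}")
```

The two control codes report the general category Cc and no name at all, while 20 and 2D answer to SPACE and HYPHEN-MINUS.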










