Aliases for control characters; BELL in particular

Jens Maurer Jens.Maurer at gmx.net
Sat Nov 6 08:00:29 CDT 2021


Hi,

I'm involved in extending the C++ programming language so that
character names can be used to represent a Unicode character in
source code, in addition to code point hex numbers.

There are a number of obstacles here; I'll start with a rather
specific concern.

I'm looking at Unicode 14.0.0.

In section 24.1 it says

Normative Aliases
[...]

Normative aliases which provide information about corrections to defective character
names or which provide alternate names in wide use for a Unicode format character are
printed in the character names list, preceded by a special symbol [...]. Normative aliases
serving other purposes, if listed, are shown by convention in all caps, following an “=”.
Normative aliases of type “figment” for control codes are not listed. Normative aliases
which represent commonly used abbreviations for control codes or format characters are
shown in all caps, enclosed in parentheses. In contrast, informative aliases are shown in
lowercase. For the definitive list of normative aliases, also including their type and suitable
for machine parsing, see NameAliases.txt in the UCD.


https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt

says, in particular,

# Note that no formal name alias for the ISO 6429 "BELL" is
# provided for U+0007, because of the existing name collision
# with U+1F514 BELL.

0007;ALERT;control
0007;BEL;abbreviation
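NameAliases.txt is described as suitable for machine parsing, and each data
line has the form <code point>;<alias>;<type>. As an illustration, here is a
minimal C++17 sketch of parsing one such line. The struct and function names
are hypothetical and are not taken from any existing implementation. The
sketch assumes data lines carry exactly three fields and no trailing comment.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// One data line of NameAliases.txt: "<code point>;<alias>;<type>",
// where <type> is correction, control, alternate, figment, or abbreviation.
struct NameAlias {
    char32_t code_point;
    std::string alias;
    std::string type;
};

// Parses a line such as "0007;ALERT;control".
// Returns std::nullopt for comment lines, blank lines, and malformed input.
std::optional<NameAlias> parse_name_alias(std::string_view line) {
    // Tolerate Windows line endings.
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    if (line.empty() || line.front() == '#') return std::nullopt;

    const auto first = line.find(';');
    if (first == std::string_view::npos) return std::nullopt;
    const auto second = line.find(';', first + 1);
    if (second == std::string_view::npos) return std::nullopt;

    // Field 1: hexadecimal code point, e.g. "0007".
    const std::string_view hex = line.substr(0, first);
    std::uint32_t cp = 0;
    const auto [ptr, ec] =
        std::from_chars(hex.data(), hex.data() + hex.size(), cp, 16);
    if (ec != std::errc() || ptr != hex.data() + hex.size() || cp > 0x10FFFF)
        return std::nullopt;

    // Fields 2 and 3: alias text and alias type.
    const std::string_view alias = line.substr(first + 1, second - first - 1);
    const std::string_view type = line.substr(second + 1);
    if (alias.empty() || type.empty()) return std::nullopt;

    return NameAlias{static_cast<char32_t>(cp), std::string(alias),
                     std::string(type)};
}
```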


Yet, https://www.unicode.org/Public/14.0.0/charts/CodeCharts.pdf says

0007 <control>
= BELL

and about a thousand pages later

1F514 BELL
→ 0FC4 tibetan symbol dril bu
→ 2407 symbol for bell
→ 1F56D ringing bell


So, given the explanation in section 24.1, CodeCharts.pdf defines a normative
alias "BELL" for U+0007 (it is all caps and follows "="), despite the statement
in NameAliases.txt that no such alias is provided.
It seems that CodeCharts.pdf ought to say "0007 <control> = ALERT" to avoid
the name collision described in the comment in NameAliases.txt.
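To make the collision concrete for a C++ named-character escape: character
names and name aliases share one namespace, so a lookup table from name to
code point must map each string to exactly one character. The following is a
minimal sketch under that assumption. The class is hypothetical, and it uses
exact, case-sensitive matching for simplicity; a real implementation would
likely apply Unicode's loose matching rule (UAX44-LM2) instead.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup table for named character escapes: character names and
// name aliases share one namespace, so each string maps to one code point.
// Matching here is exact and case-sensitive (a simplification).
class NameTable {
public:
    // Binds `name` to `cp`. Returns false, leaving the table unchanged, if
    // `name` is already bound to a different code point (a collision).
    bool add(const std::string& name, char32_t cp) {
        const auto [it, inserted] = names_.try_emplace(name, cp);
        return inserted || it->second == cp;
    }

    // Returns the code point bound to `name`, if any.
    std::optional<char32_t> lookup(const std::string& name) const {
        const auto it = names_.find(name);
        if (it == names_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, char32_t> names_;
};
```

With U+1F514 BELL already in the table, binding BELL to U+0007 is rejected,
whereas ALERT and BEL can be added for U+0007 without conflict.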

(It would be good if NameAliases.txt did not use the phrase "formal name alias",
but instead one of the category phrases from section 24.1.)


A slightly related question concerns these aliases from NameAliases.txt:

000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control

This seems to indicate that all three aliases have the same status.

Yet, CodeCharts.pdf says

000A <control>
= LINE FEED (LF)
= new line (NL)
= end of line (EOL)

which, according to the explanation in section 24.1, means that only LINE FEED
is a normative alias, but "new line" and "end of line" are merely informative
aliases. The data in NameAliases.txt does not support this interpretation.
Is it the intention that all three aliases for U+000A are normative aliases?
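The reading of section 24.1 applied above can be written down as a small
classifier for "=" lines in the names list: all caps means normative,
lowercase means informative. This is only a sketch of that one convention,
assuming the chart text has already been extracted from the PDF. It
deliberately ignores normative aliases marked with the special symbol
(corrections and format-character aliases), and the enum and function names
are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

enum class ChartAlias { Normative, Informative, None };

// Applies the section 24.1 convention to one "=" line of the names list:
// an all-caps alias is normative, a lowercase alias is informative.
// A trailing parenthesized abbreviation such as "(LF)" is ignored.
ChartAlias classify_chart_alias(std::string_view line) {
    std::size_t i = 0;
    while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i]))) ++i;
    if (i == line.size() || line[i] != '=') return ChartAlias::None;

    std::string_view text = line.substr(i + 1);
    if (const auto paren = text.find('('); paren != std::string_view::npos)
        text = text.substr(0, paren);

    bool has_letter = false;
    bool has_lower = false;
    for (const unsigned char c : text) {
        if (std::isalpha(c)) {
            has_letter = true;
            if (std::islower(c)) has_lower = true;
        }
    }
    if (!has_letter) return ChartAlias::None;
    return has_lower ? ChartAlias::Informative : ChartAlias::Normative;
}
```

Under this reading, "= LINE FEED (LF)" is normative while "= new line (NL)"
and "= end of line (EOL)" are informative, even though NameAliases.txt lists
all three with type "control".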

Thanks for your help!

Jens
