APL Under-bar Characters

Sun Aug 16 19:15:26 CDT 2015

Alex,

On 8/16/2015 12:41 PM, alexweiner at alexweiner.com wrote:
>
> As far as I know, APL definitely predates the Unicode consortium. Do 
> you think that The Consortium possibly overlooked the pre-existing 
> under-bar character set?
>
>

The answer to that is no.

Initially, Unicode 1.0 attempted to punt on the entire APL complex
functional symbol problem by encoding U+2300 APL COMPOSE OPERATOR.

The concept was essentially that any of the combined symbols (the old
rack of stuff that people complained about entering with
symbol/backspace/symbol keying) could simply be represented as sequences
of existing symbols. Think of U+2300 as an early attempt to introduce an
APL "script"-specific conjunct-forming virama, à la the much later,
artificially introduced script-specific joiners. Cf. U+2D7F TIFINAGH
CONSONANT JOINER.

But U+2300 APL COMPOSE OPERATOR was an innovation that failed.
It was fiercely opposed *by the APL community*, who wanted it out of
10646 and replaced with an explicit list of pre-formed complex
functional symbols. Presumably that was for the same reason we are
discussing here: each symbol had to work as a single "character", and
in an APL context that meant fixed width and the same data size as all
the other characters.

The removal of Unicode 1.0 U+2300 APL COMPOSE OPERATOR is documented
in Unicode 1.1 as of 1993:

http://www.unicode.org/versions/Unicode1.1.0/

(see page 3)

The addition of APL functional symbols is documented in Section 5.4.8,
pp. 39-41.

The exact repertoire that ended up encoded in the standard was the
result of meetings between some Unicode representatives and some folks
from the APL community. The names escape me at the moment, although it
might be possible to recover some information eventually. (Documentation
regarding Unicode events in late 1991 is sparse these days.) At any
rate, the agreed-upon additional repertoire is probably the one included
in:

X3L2/92-035, Unicode Request for Additional Characters in ISO/IEC 10646-1.2.
The rest of the consequences and processing can be dug out of the
ballot history record for the 1992 voting on 10646.

At any rate, apropos of *this* discussion, we agreed that the repertoire
would cover all the complex functional symbols, but *not* the letters
with underscores. And it is not that they were simply overlooked.

How do I know? Well, first, there were APL specialists involved in coming
up with (and promoting) the repertoire that was carried into the 10646
balloting at the time. It isn't as if a bunch of ignorant Unicoders just
grabbed one APL book off the shelf and coded up the table, not noticing
that some stuff was missing.

Second, the text that is currently in the core specification about this
issue, to wit:

" ... All other APL extensions can be encoded by composition of other
Unicode characters. For example, the APL symbol a underbar can be
represented by U+0061 LATIN SMALL LETTER A + U+0332 COMBINING LOW LINE."
(Unicode 7.0, Section 22.7, p. 772)

is *ancient* text. It was first printed on p. 6-83 of Unicode 2.0 in
1996, with exactly the same wording. And the only reason it took until
1996 to appear, instead of 1993, was that the editing of Unicode 2.0 and
its code charts was such a massive task at the time.

So the clear intent in *1993* was to represent any APL letter with
underbar as a combining character sequence, as noted. The only problem I
see there is that the text in the core spec mistakenly used U+0061 (the
lowercase "a") instead of U+0041 (the uppercase "A") for the
exemplification.
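For illustration, here is a minimal Python sketch of the combining-sequence representation described above, assuming only the standard `unicodedata` module. The helper `apl_underbar` is hypothetical, not part of any Unicode or APL tooling. It builds U+0041 + U+0332 and shows that NFC normalization leaves the result as two code points, since there is no precomposed "A underbar" to compose to. That two-code-point result is exactly what breaks the one-symbol, one-fixed-width-character property the APL community insisted on for its functional symbols.

```python
import unicodedata

# U+0332 COMBINING LOW LINE, the combining mark named in the core spec.
COMBINING_LOW_LINE = "\u0332"


def apl_underbar(letter: str) -> str:
    """Hypothetical helper: encode an APL underbar letter as a
    combining character sequence (base letter + U+0332)."""
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError("expected one uppercase Latin letter A-Z")
    return letter + COMBINING_LOW_LINE


a_underbar = apl_underbar("A")
# Code points in the sequence: U+0041 followed by U+0332.
print([f"U+{ord(c):04X}" for c in a_underbar])  # ['U+0041', 'U+0332']
print(unicodedata.name(a_underbar[1]))  # COMBINING LOW LINE
# No precomposed form exists, so NFC does not fuse the pair.
print(len(unicodedata.normalize("NFC", a_underbar)))  # 2
```

The sketch uses uppercase U+0041 rather than the U+0061 in the spec's example, following the point that the APL underbar letters were uppercase.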

Third, I can attest that at least some of us at the time (as early as
1989) had printed copies of IBM EBCDIC code page 293 for APL, which had
the EBCDIC uppercase Latin letters with underscores (italicized, by the
way), together with the regular EBCDIC upper- and lowercase letters.
[Dates from 1984.]
*And* IBM EBCDIC code page 310 for APL, which dropped all the
regular upper- and lowercase letters but added more symbols.
*And* IBM PC code page 907 (with the underscored uppercase Latin
letters) and PC code page 909 (CP437 hacked up for APL, without the
underscored uppercase Latin letters), which was quickly superseded by
PC code page 910, which also did not use the uppercase Latin letters
with underscores.

So yeah, we knew about these. Encoding them as combining character
sequences instead of as atomic characters was a deliberate decision
taken in 1992. And that decision made it through both UTC and
international balloting for publication in 1993.

--Ken
