APL Under-bar Characters

Ken Whistler kenwhistler at att.net
Tue Aug 18 12:13:15 CDT 2015

On 8/18/2015 9:45 AM, Doug Ewell wrote:
> Ken Whistler <kenwhistler at att dot net> wrote:
> Then we're back to the central point that Alex Weiner originally 
> expressed, in arguing for the encoding of precomposed letters with 
> underbar:
>> The string length functionality would view an 'A' code point combined
>> with an '_' code point as an item that has two elements, while
>> something that looks like 'A̲' should be atomic, and return a length
>> of one.


And instead of pushing for the impossible, the correct solution here
involves dividing and conquering:

1. If the issue is just the *presentation* of legacy APL materials showing
the traditional IBM uppercase italic letters with underscores, then
fix some fonts, use the combining character sequences (or styling,
makes no matter), and edit away with existing characters, and with
no implications for APL implementations.

2. If the issue is *augmentation* of APL implementations to have an
additional A-Z set of character symbols, beyond the upper- and lowercase
ones apparently supported by most APL fonts and implementations,
then pick one of the existing, encoded, mathematical alphabets
and have done with it. There are 13 to choose from! The sans-serif
italic set might make a nice choice. And for the cherry on top, in
the APL fonts, draw a non-connecting underline beneath your
26 new letters to please traditionalists.

The reason to do #2 is that the implementations of APL, because of
the very nature of the language, need their "characters" to have
a fixed size, so that each element of a data array of "characters"
is exactly one "character".
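To make the fixed-size point concrete, here is a minimal Python sketch,
assuming an implementation that stores one code point per array element.
The helper name element_count is hypothetical, not taken from any APL
system. It compares a combining sequence with one of the already encoded
mathematical letters:

```python
import unicodedata

# 'A' followed by U+0332 COMBINING LOW LINE: one visible glyph, two code points.
combining = "A\u0332"

# U+1D608 MATHEMATICAL SANS-SERIF ITALIC CAPITAL A: one glyph, one code point.
math_a = "\U0001D608"


def element_count(text: str) -> int:
    """Hypothetical helper: number of array elements if each element
    holds exactly one code point (Python's len() counts code points)."""
    return len(text)


for label, text in (("combining sequence", combining), ("math letter", math_a)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, element_count(text), names)
```

The combining sequence occupies two elements, while the mathematical
letter occupies one, which is the property #2 is after.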

The oopsie for #2, of course, is that if your APL implementation is
actually using 16-bit code *units* for your characters, it is still
stuck in a UCS-2 world, and can't handle UTF-16, because that
once again breaks the ironclad rule that 1 "character" equals
one data element in the array.
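A small Python sketch of that breakage, assuming an array whose elements
are 16-bit code units. The function name utf16_units is made up for
illustration:

```python
def utf16_units(text: str) -> list[int]:
    """Hypothetical helper: split text into 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# A BMP letter fits in a single 16-bit unit.
plain_a = utf16_units("A")

# U+1D608 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
math_a = utf16_units("\U0001D608")

print([hex(u) for u in plain_a])
print([hex(u) for u in math_a])
```

The supplementary letter takes two 16-bit elements (a high and a low
surrogate), so an array of 16-bit "characters" no longer has one element
per character.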

The fix for the oopsie is to upgrade the APL implementations to UTF-32.
At that point, the supplementary character problem goes away,
and APL could freely augment its sets of A-Z symbols with the
mathematical alphanumeric symbols without further ado.
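As a rough sketch of what that buys you, assuming 32-bit elements and
the sans-serif italic capitals (U+1D608..U+1D621) as the extra alphabet.
The names to_extra_alphabet and utf32_units are hypothetical:

```python
SANS_ITALIC_CAPITAL_A = 0x1D608  # MATHEMATICAL SANS-SERIF ITALIC CAPITAL A


def to_extra_alphabet(letter: str) -> str:
    """Hypothetical mapping from ASCII 'A'..'Z' to the sans-serif italic capitals."""
    if not ("A" <= letter <= "Z"):
        raise ValueError("expected a single ASCII capital letter")
    return chr(SANS_ITALIC_CAPITAL_A + ord(letter) - ord("A"))


def utf32_units(text: str) -> list[int]:
    """Hypothetical helper: split text into 32-bit UTF-32 code units."""
    data = text.encode("utf-32-le")
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


# Mix ordinary and augmented letters in one "character array".
mixed = "AZ" + to_extra_alphabet("A") + to_extra_alphabet("Z")
units = utf32_units(mixed)
print(len(mixed), [hex(u) for u in units])
```

With 32-bit elements, every letter in the mixed string, BMP or
supplementary, occupies exactly one array element.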

What people should *not* be doing is insisting on being stuck
in 1970, as if everybody were still doing APL with IBM Selectric typewriter
terminals hooked up to IBM System/360 mainframes using an EBCDIC
APL character set, and that everything in the APL program text
has to look precisely the way it did in 1970.

