APL Under-bar Characters

Sun Aug 16 20:27:17 CDT 2015

See http://unicode.org/policies/stability_policy.html, in particular the
Normalization Policy. The APL A with underscore is encoded the way we've
been saying, and Unicode has promised its users that there's no other way
of writing it.

The current precedent is that when users ask for things like this, they
are told they can't have them; for example, the Lithuanians were told
that the way to encode LATIN CAPITAL LETTER A WITH OGONEK AND ACUTE is
U+0104 U+0301, not any other way. Such sequences can be listed in
http://www.unicode.org/Public/UCD/latest/ucd/NamedSequences.txt so that
there can be a unique name to refer to them, but there will not be any new
codepoint.
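
As a rough illustration of what that stability promise looks like in
practice, here is a minimal Python sketch using only the standard
unicodedata module. The variable and function names are my own and are not
taken from the policy page; the point is only that NFC has no single code
point to compose either sequence into.

```python
import unicodedata

# APL-style underscored capital A: base letter + U+0332 COMBINING LOW LINE.
a_underbar = "A\u0332"
# The Lithuanian example: U+0104 (A WITH OGONEK) + U+0301 COMBINING ACUTE ACCENT.
a_ogonek_acute = "\u0104\u0301"

def nfc_code_points(text):
    """Return the NFC form of text as a list of 'U+XXXX' strings."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]

underbar_nfc = nfc_code_points(a_underbar)
ogonek_nfc = nfc_code_points(a_ogonek_acute)

print(underbar_nfc)  # ['U+0041', 'U+0332']
print(ogonek_nfc)    # ['U+0104', 'U+0301']
```

Both sequences come back from NFC unchanged, as two code points each.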

On Sun, Aug 16, 2015 at 6:16 PM <alexweiner at alexweiner.com> wrote:

> David,
>
> I don't understand what you mean by saying that the standard is set. By
> Ken's account, The Consortium decided to create a policy specifically
> regarding this, by vote of APL (and I assume interested Unicode) users
> worldwide. The Standard itself is in version eight. Why does a vote seem so
> ridiculous, especially in the case of an addition, rather than a
> subtraction?
>
> What is the current precedent for this sort of thing?
>
> -Alex
>
> -------- Original Message --------
> Subject: Re: APL Under-bar Characters
>
> From: David Starner <prosfilaes at gmail.com>
> Date: Sun, August 16, 2015 5:59 pm
> To: alexweiner at alexweiner.com, Ken Whistler <kenwhistler at att.net>
> Cc: unicode at unicode.org
>
> The standard is set here. The Unicode Consortium has declared that it
> won't encode precomposed characters that can be created from characters in
> the standard, because that would be destabilizing and potentially introduce
> security holes in programs depending on Unicode. If you want, we can have a
> vote on whether or not APL should use characters with underlines, since I
> was unfairly locked out of that vote by not being born yet.
>
> On Sun, Aug 16, 2015 at 5:52 PM <alexweiner at alexweiner.com> wrote:
>
>> Ken,
>> You pose a very strong and well-worded response. The historical element
>> really helps to illuminate what I thought was lost knowledge: "Why are
>> there no under-bars". To this I can only ask one thing:
>>
>> Can we put this to a vote again? To put things in perspective, I was three
>> years old at the time of the ballot in 1993 and had much larger issues to
>> deal with (comprehending speech, learning to walk, etc.), and was unable to
>> participate in this internationally binding vote.
>>
>> Perhaps feelings about the under-bar characters have changed since then.
>> I know that the APL landscape is *very* different than it was in 1993.
>>
>> I have a copy of one of those IBM books that has the italicized
>> upper-case under-bars. If my proposal for a new vote is well received,
>> maybe we should include those as well, for completeness' sake.
>>
>> -Alex
>>
>>
>> -------- Original Message --------
>> Subject: Re: APL Under-bar Characters
>>
>> From: Ken Whistler <kenwhistler at att.net>
>> Date: Sun, August 16, 2015 5:15 pm
>> To: alexweiner at alexweiner.com
>> Cc: unicode at unicode.org
>>
>> Alex,
>>
>> On 8/16/2015 12:41 PM, alexweiner at alexweiner.com wrote:
>>
>>
>> As far as I know, APL definitely predates the Unicode consortium. Do you
>> think that The Consortium possibly overlooked the pre-existing under-bar
>> character set?
>>
>>
>>
>> The answer to that is no.
>>
>> Initially, Unicode 1.0 attempted to punt the entire APL complex functional
>> symbol problem by encoding U+2300 APL COMPOSE OPERATOR.
>>
>> The concept was essentially that any of the combined symbols -- the old
>> rack of stuff that people complained about entering with
>> symbol/backspace/symbol keying -- could simply be represented as sequences
>> of existing symbols. Think of 2300 as an early attempt to introduce an APL
>> "script"-specific conjunct-forming virama, a la much-later artificially
>> introduced script-specific joiners. Cf. U+2D7F TIFINAGH CONSONANT JOINER.
>>
>> But U+2300 APL COMPOSE OPERATOR was an innovation that failed.
>> It was fiercely opposed *by the APL community*, who wanted it
>> out of 10646 and replaced with an explicit list of pre-formed complex
>> functional symbols. Presumably for the same reason we are talking
>> about here now: essentially that each symbol had to work as a "character",
>> and in an APL context that meant fixed width and the same data size as
>> all the other characters.
>>
>> The removal of Unicode 1.0 U+2300 APL COMPOSE OPERATOR is documented
>> in Unicode 1.1 as of 1993:
>>
>> http://www.unicode.org/versions/Unicode1.1.0/
>>
>> (see page 3)
>>
>> The addition of APL functional symbols is documented in Section 5.4.8,
>> pp. 39-41.
>>
>> The exact repertoire that ended up encoded in the standard was the result
>> of meetings between some Unicode representatives and some folks from the
>> APL community. The names escape me at the moment, although it might be
>> possible to recover some information eventually. (Documentation regarding
>> Unicode events in late 1991 is sparse these days.) At any rate the agreed
>> upon additional repertoire is probably that included in:
>>
>> X3L2/92-035, Unicode Request for Additional Characters in ISO/IEC
>> 10646-1.2.
>>
>> And the rest of the consequences and processing can be dug out of the
>> ballot history record for the voting on 10646 in 1992.
>>
>> At any rate, a propos *this* discussion, we agreed that the repertoire
>> would cover all the complex functional symbols, but *not* the letters
>> with underscores. And it is not that they were simply overlooked.
>>
>> How do I know? Well, first, there were APL specialists involved in coming
>> up with (and promoting) the repertoire that was carried into the 10646
>> balloting at the time. It isn't as if a bunch of ignorant Unicoders just
>> grabbed one APL book off the shelf and coded up the table, not noticing
>> that some stuff was missing.
>>
>> Second, the text that is currently in the core specification about this
>> issue, to wit:
>>
>> " ... All other APL extensions can be encoded by composition of other
>> Unicode characters. For example, the APL symbol a underbar can be
>> represented by U+0061 LATIN SMALL LETTER A + U+0332 COMBINING LOW LINE."
>> (Unicode 7.0, Section 22.7, p. 772)
>>
>> is *ancient* text. It was first printed on p. 6-83 of Unicode 2.0 in 1996,
>> with exactly the same wording. And the only reason it took until 1996 to
>> appear, instead of 1993, was that the editing of Unicode 2.0 and its code
>> charts was such a massive task at the time.
>>
>> So the clear intent in *1993* was to represent any APL letter with
>> underbar as a combining character sequence -- as noted. The only problem
>> I see there is that the text in the core spec mistakenly used U+0061 (the
>> lowercase "a") instead of U+0041 (the uppercase "A") for the
>> exemplification.
>>
>> Third, I can attest that at least some of us at the time -- as early as
>> 1989 -- had printed copies of IBM EBCDIC code page 293 for APL, which had
>> the EBCDIC uppercase Latin letters with underscores (italicized, by the
>> way), together with the regular EBCDIC upper and lowercase letters. [Dates
>> from 1984.]
>> *And* IBM EBCDIC code page 310 for APL, which dropped all the
>> regular upper- and lowercase letters but added more symbols.
>> *And* IBM PC code page 907 (with the underscored uppercase Latin
>> letters) and PC code page 909 (CP437 hacked up for APL, without the
>> underscored uppercase Latin letters), which was quickly superseded by
>> PC code page 910, which also did not use the uppercase Latin letters
>> with underscores.
>>
>> So yeah, we knew about these. Encoding them as combining character
>> sequences instead of as atomic characters was a deliberate decision
>> taken in 1992. And that decision made it through both UTC and
>> international balloting for publication in 1993.
>>
>> --Ken
>>
>>
>>
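
To make Ken's points about the combining sequence and "data size" concrete,
here is a small Python sketch, again using only the standard unicodedata
module. It uses U+0041 (the uppercase "A" that Ken says the core spec
example should have used) and compares the sequence with one atomic APL
functional symbol. The choice of U+2339 for the comparison and all variable
names are my own assumptions for illustration.

```python
import unicodedata

# Uppercase A with underbar as a combining character sequence.
a_underbar = "\u0041\u0332"
# One of the atomic APL functional symbols, picked arbitrarily for comparison.
apl_symbol = "\u2339"

def describe(text):
    """Return the size of text in code points, UTF-8 bytes and UTF-16 code units."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }

underbar_info = describe(a_underbar)
symbol_info = describe(apl_symbol)

# Character names as recorded in the Unicode Character Database.
underbar_names = [unicodedata.name(ch) for ch in a_underbar]

# A nonzero canonical combining class marks U+0332 as a combining mark.
low_line_is_combining = unicodedata.combining("\u0332") != 0

print(underbar_names)
print(underbar_info)
print(symbol_info)
print(low_line_is_combining)
```

The underscored letter occupies two code points where the pre-formed
functional symbol occupies one, which is the "same data size as all the
other characters" concern Ken describes the APL community as having had.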