APL Under-bar Characters

Richard Wordingham richard.wordingham at ntlworld.com
Sun Aug 16 13:27:13 CDT 2015


On Sun, 16 Aug 2015 18:53:52 +0200
Khaled Hosny <khaledhosny at eglug.org> wrote:

> On Sun, Aug 16, 2015 at 09:31:25AM -0700, alexweiner at alexweiner.com
> wrote:

> > Now, the ä character has a precomposed form in Unicode, and if you
> > couple that with the NFC normalisation form, you'd get the above
> > expression to return 1.

> > So I'm not sure why the allowance was made for ä as well as other
> > certain characters, but not for other things (under-bar
> > characters) that face similar representation issues.

> It was encoded for compatibility of pre-existing character sets AFAIK.

Note that compatibility means allowing habits of treating the
precomposed characters as single characters to continue.  These habits
allowed a simple transition, but now cause confusion.  Most rules work
better in NFD than in NFC.  For string lengths in NFC (taking a + b to
be the concatenation renormalised to NFC), you immediately lose the
rule len(a + b) = len(a) + len(b); you don't even have
len(a + b) <= len(a) + len(b).  However, do note that for the
corresponding 'string' algebra, the mathematical concept of a string no
longer works - and this applies to both NFC and NFD.  Instead, you have
to allow for pairs of characters commuting, and so you get the concept
of a 'trace'.
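
As a rough illustration of the length rules and of commuting marks, here
is a minimal Python sketch using the standard unicodedata module.  It
assumes that a + b means "concatenate, then renormalise"; the helper
names (nfc, nfd, nfc_concat) and the sample strings are chosen purely
for illustration.

```python
import unicodedata


def nfc(s):
    """Return s in Normalization Form C."""
    return unicodedata.normalize("NFC", s)


def nfd(s):
    """Return s in Normalization Form D."""
    return unicodedata.normalize("NFD", s)


def nfc_concat(a, b):
    """Concatenate two NFC strings and renormalise the result to NFC."""
    return nfc(a + b)


# Shrinking: 'a' + COMBINING DIAERESIS composes to U+00E4,
# so len(a + b) < len(a) + len(b).
a, b = "a", "\u0308"
shrunk = nfc_concat(a, b)

# Growing: U+01D6 (u with diaeresis and macron) followed by
# COMBINING DOT BELOW.  The dot below sorts before the other marks and
# composes with the base to U+1EE5, which has no precomposed form with
# a diaeresis, so the diaeresis and macron are left as separate marks.
c, d = "\u01d6", "\u0323"
grown = nfc_concat(c, d)

# In NFD the same pair stays additive: marks are only reordered.
nfd_additive = len(nfd(nfd(c) + nfd(d))) == len(nfd(c)) + len(nfd(d))

# Commuting: marks with different combining classes (dot below = 220,
# diaeresis = 230) may be typed in either order and are equivalent.
commute = nfd("a\u0323\u0308") == nfd("a\u0308\u0323")

# Marks with the same combining class (diaeresis and macron, both 230)
# do not commute; their order is significant.
no_commute = nfd("a\u0308\u0304") != nfd("a\u0304\u0308")

print(len(a) + len(b), len(shrunk))  # 2 1
print(len(c) + len(d), len(grown))   # 2 3
print(nfd_additive, commute, no_commute)
```

The partial commutation in the last two checks is what makes the
'trace' picture fit: some neighbouring pairs may be swapped freely,
others may not.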

If all combinations of base character and non-spacing marks were
encoded, there'd be infinitely many.  Polytonic Greek has 36
*precomposed* combinations of base character and 3 combining marks, and
some languages frequently use base characters with 4 combining marks;
unexceptional words with 5 combining marks are less frequent.
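
The figure of 36 can be checked with a short Python sketch.  It assumes
that all the relevant precomposed letters live in the Greek Extended
block (U+1F00..U+1FFF) and counts a "combining mark" as any code point
after the first in the NFD form with a non-zero combining class; the
function and variable names are illustrative only.

```python
import unicodedata


def mark_count(ch):
    """Count the combining marks in the canonical decomposition of ch."""
    decomposed = unicodedata.normalize("NFD", ch)
    return sum(1 for c in decomposed[1:] if unicodedata.combining(c) != 0)


# Assumption: only the Greek Extended block needs to be scanned.
greek_extended = (chr(cp) for cp in range(0x1F00, 0x2000))

# Precomposed characters that decompose to a base plus exactly 3 marks,
# e.g. alpha + psili + varia + ypogegrammeni (U+1F82).
three_marks = [ch for ch in greek_extended if mark_count(ch) == 3]

print(len(three_marks))
```

With the Unicode data shipped in current Python versions this should
print 36: three vowels, two cases, two breathings and three accents,
each combined with the iota subscript.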

Richard.


