Repertoire

Jonathan Coxhead jonathan.coxhead at gmail.com
Fri Jan 21 20:29:12 CST 2022


Hi Unicodets

   I just updated a web page I once created (now at
<http://twojays.me/unichar-14.0.0.html>), which lists the whole Unicode
repertoire, including all the decompositions, aliases, cross references and
comments. I find this summary page to be the only character reference I
need in my day-to-day life. It's as short as I could make it, though if it
were printed out, it would be about 1000 pages long(!).

   But in updating it, I came upon a problem:

   Some combining characters are clearly "printing characters", for example,

  ́ COMBINING ACUTE ACCENT

which can be shown graphically, as above, by displaying it on a space. Some
are control characters, and have no possible visual display, such as

 \u034F COMBINING GRAPHEME JOINER

which can only be shown as a code: it has no printable nature at all.

   Now, in the case of non-combining characters, this distinction is made
very clearly, as it has been all the way back to the days of the C
isprint() and iscntrl() macros. But for combining characters, the
distinction between printable and control seems not to be made. The only
way I could see to do it was to special-case the character names

    VARIATION SELECTOR-[0-9]+
    MONGOLIAN FREE VARIATION SELECTOR (ONE|TWO|THREE|FOUR)
    COMBINING GRAPHEME JOINER
    TIFINAGH CONSONANT JOINER
    BRAHMI NUMBER JOINER

which isn't very satisfactory.
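
   For illustration, the name-based special-casing above might be sketched
roughly as follows in Python. This is only a sketch, assuming the standard
unicodedata module; the names INVISIBLE_NAME, is_invisible_combining and
display_form are hypothetical, and the results depend on the Unicode version
bundled with the interpreter.

```python
import re
import unicodedata

# Names special-cased as "control-like" combining characters (from the list above).
INVISIBLE_NAME = re.compile(
    r"VARIATION SELECTOR-[0-9]+"
    r"|MONGOLIAN FREE VARIATION SELECTOR (ONE|TWO|THREE|FOUR)"
    r"|COMBINING GRAPHEME JOINER"
    r"|TIFINAGH CONSONANT JOINER"
    r"|BRAHMI NUMBER JOINER"
)


def is_mark(ch: str) -> bool:
    """True if the character's General_Category is Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")


def is_invisible_combining(ch: str) -> bool:
    """True for a mark whose name matches one of the special-cased patterns."""
    name = unicodedata.name(ch, "")  # "" if this Unicode version lacks a name
    return is_mark(ch) and INVISIBLE_NAME.fullmatch(name) is not None


def display_form(ch: str) -> str:
    """Show printable marks on a space, control-like marks as a code."""
    if is_invisible_combining(ch):
        return "\\u%04X" % ord(ch)
    if is_mark(ch):
        return " " + ch
    return ch


if __name__ == "__main__":
    for ch in ("\u0301", "\u034F", "\uFE0F", "A"):
        print(display_form(ch), unicodedata.name(ch, "<unnamed>"))
```

   The regular expression is exactly the hand-maintained list, so it has the
same weakness: any future character of this kind has to be added by hand.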

   Am I missing something? And if not, should there be something in
UnicodeData.txt that gives me this information?

   I was also wondering idly if anyone has any practical uses for the
legacy computing characters, specifically the ones with "BLOCK DIAGONAL" in
the name. They look tantalisingly as though they must be good for
something, but I don't know what it could be—

   Cheers

~Jonathan Coxhead

“*ballads not bombs, songs not surveillance*” —Thom Hartmann