jonathan.coxhead at gmail.com
Fri Jan 21 20:29:12 CST 2022
I just updated a web page I once created (now at
<http://twojays.me/unichar-14.0.0.html>), which lists the whole Unicode
repertoire, including all the decompositions, aliases, cross references and
comments. I find this summary page to be the only character reference I
need in my day-to-day life. It's as short as I could make it, though if it
were printed out, it would be about 1000 pages long(!).
But in updating it, I came upon a problem:
Some combining characters are clearly "printing characters", for example,
́ COMBINING ACUTE ACCENT
which can be shown graphically, as above, by displaying it on a space. Some
are control characters, and have no possible visual display, such as
\u034F COMBINING GRAPHEME JOINER
which can only be shown as a code: it has no printable nature at all.
Now, in the case of non-combining characters, this distinction is made
very clearly, as it has been all the way back to the days of the C
isprint() and iscntrl() macros. But for combining characters, the
distinction between printable and control seems not to be made. The only
way I could see to do this was to special-case the character names
MONGOLIAN FREE VARIATION SELECTOR (ONE|TWO|THREE|FOUR)
COMBINING GRAPHEME JOINER
TIFINAGH CONSONANT JOINER
BRAHMI NUMBER JOINER
which isn't very satisfactory.
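For concreteness, the special-casing described above might be sketched in Python roughly as follows. This is only an illustration, not the code behind the web page: the helper names (classify, display) and the "control-like" label are made up for this sketch, and it relies on Python's standard unicodedata module, whose data depends on the Unicode version bundled with the interpreter (so a recently added character such as MONGOLIAN FREE VARIATION SELECTOR FOUR may have no name on older versions).

```python
import re
import unicodedata

# Names special-cased as non-printing combining characters (list taken
# from the message above; the pattern itself is just one way to write it).
CONTROL_LIKE_NAME = re.compile(
    r"MONGOLIAN FREE VARIATION SELECTOR (ONE|TWO|THREE|FOUR)"
    r"|COMBINING GRAPHEME JOINER"
    r"|TIFINAGH CONSONANT JOINER"
    r"|BRAHMI NUMBER JOINER"
)


def classify(ch):
    """Return 'not combining', 'control-like' or 'printing' (hypothetical labels)."""
    # General categories Mn, Mc and Me are the combining marks.
    if unicodedata.category(ch) not in ("Mn", "Mc", "Me"):
        return "not combining"
    # Fall back to an empty name if this Unicode version has none.
    name = unicodedata.name(ch, "")
    if CONTROL_LIKE_NAME.fullmatch(name):
        return "control-like"
    return "printing"


def display(ch):
    """Show a printing combining mark on a space, anything control-like as a code."""
    if classify(ch) == "printing":
        return " " + ch
    if classify(ch) == "control-like":
        return f"\\u{ord(ch):04X}"
    return ch


if __name__ == "__main__":
    for ch in ("\u0301", "\u034F", "\u2D7F", "A"):
        print(f"U+{ord(ch):04X}", classify(ch), repr(display(ch)))
```

The name match is exactly the unsatisfactory part: nothing in the general category separates the two groups, since both COMBINING ACUTE ACCENT and COMBINING GRAPHEME JOINER are Mn.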
Am I missing something? And if not, should there be something in
UnicodeData.txt that gives me this information?
I was also wondering idly if anyone has any practical uses for the
legacy computing characters, specifically the ones with "BLOCK DIAGONAL" in
the name. They look tantalisingly as though they must be good for
something, but I don't know what it could be.
“*ballads not bombs, songs not surveillance*” —Thom Hartmann