Bit arithmetic on Unicode characters?

Philippe Verdy verdy_p at wanadoo.fr
Thu Oct 6 16:07:00 CDT 2016


As far as we know, arithmetic on code points is performed only in:
- the decimal digits of ASCII and of dozens of other scripts, which can be
converted automatically from one script to another by adding a single
constant to each of the 10 digits.
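- Basic Latin (ASCII), for mapping letter case (each lowercase letter is its
uppercase letter plus 0x20) and for mapping digit values above 9 (values
from 10 up take an extra offset of 7, skipping the seven punctuation
characters ':' to '@', so that the letters A..Z follow 0..9).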
- the precomposed Hangul syllables, whose decomposition into jamo is
computed arithmetically (this is needed for checking canonical equivalence
and for the standard NFC/NFD normalizations, and partly for implementing the
NFKC/NFKD normalizations and collation).
- in all other cases, arithmetic is not reliable at all: characters may be
allocated, or slots left unused, without any relation to case mapping. For
example, the final sigma of the basic Greek alphabet is encoded only in
lowercase, so its would-be uppercase slot is empty, and the Turkic dotted
capital İ and dotless small ı cannot be mapped by an offset. You'll need
proper mapping tables.
- for symbols that could benefit from it (such as box-drawing characters),
it is generally not used. The exceptions are Braille patterns, the mapping
between the white and black versions of chess pieces, the mapping between
comparable series of mahjong tiles in their basic set (though not
necessarily with the same constant in extended sets, as that would have
required allocating them across more columns than strictly needed), and the
mapping from ASCII letters to the mathematical variants of Latin letters,
to Regional Indicator Symbols (RIS), or to the fullwidth variants used in
CJK text.
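For the decimal digits case in the list above, here is a minimal Python sketch, assuming a hand-picked table of only five scripts (ZERO and convert_digits are hypothetical names, not a library API). Because each script's ten digits follow its DIGIT ZERO contiguously, converting between scripts is one addition per character:

```python
# Code point of DIGIT ZERO for a few scripts (illustrative subset only).
# Digits 1..9 follow contiguously, so converting between scripts is a
# single additive offset.
ZERO = {
    "ascii": 0x0030,         # DIGIT ZERO
    "arabic_indic": 0x0660,  # ARABIC-INDIC DIGIT ZERO
    "devanagari": 0x0966,    # DEVANAGARI DIGIT ZERO
    "bengali": 0x09E6,       # BENGALI DIGIT ZERO
    "thai": 0x0E50,          # THAI DIGIT ZERO
}


def convert_digits(text: str, src: str, dst: str) -> str:
    """Replace the decimal digits of script `src` with those of script `dst`."""
    lo = ZERO[src]
    offset = ZERO[dst] - lo
    return "".join(
        chr(ord(ch) + offset) if lo <= ord(ch) <= lo + 9 else ch
        for ch in text
    )


print(convert_digits("2016-10-06", "ascii", "devanagari"))
```

Characters that are not digits of the source script (here the hyphens) pass through unchanged.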
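The Basic Latin case can be sketched the same way. This is an illustration only (ascii_toggle_case, digit_char and to_base are hypothetical helpers): letter case lives in bit 5 (0x20), and digit values from 10 up take an extra 7 to jump from just after '9' (U+0039) over U+003A..U+0040 to 'A' (U+0041):

```python
def ascii_toggle_case(ch: str) -> str:
    """Swap the case of an ASCII letter; any other character is returned unchanged."""
    c = ord(ch)
    if 0x41 <= c <= 0x5A or 0x61 <= c <= 0x7A:  # 'A'..'Z' or 'a'..'z'
        return chr(c ^ 0x20)  # upper and lower case differ only in bit 5
    return ch


def digit_char(value: int) -> str:
    """Map a digit value 0..35 to '0'..'9' followed by 'A'..'Z'."""
    if not 0 <= value < 36:
        raise ValueError("digit value out of range")
    c = 0x30 + value  # '0' is U+0030
    if value >= 10:
        c += 7  # skip ':' ';' '<' '=' '>' '?' '@' (U+003A..U+0040)
    return chr(c)


def to_base(n: int, base: int) -> str:
    """Format a non-negative integer in a base from 2 to 36."""
    if n < 0 or not 2 <= base <= 36:
        raise ValueError("unsupported input")
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(digit_char(r))
        if n == 0:
            break
    return "".join(reversed(digits))


print(to_base(255, 16), ascii_toggle_case("q"))
```

Python's own int(s, base) accepts the same letter digits, which makes a convenient round-trip check.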
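For Hangul, the arithmetic is specified by the Unicode Standard itself (section 3.12, Conjoining Jamo Behavior). A Python sketch follows; the function names are hypothetical, and the result is compared against the table-driven unicodedata.normalize from the standard library:

```python
import unicodedata

# Constants of the Hangul syllable algorithm in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(ch: str) -> str:
    """Canonical decomposition of one precomposed syllable into conjoining jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    return chr(l) + chr(v) + (chr(t) if t != T_BASE else "")


def compose_hangul(l: str, v: str, t: str = "") -> str:
    """Inverse: leading consonant + vowel (+ optional trailing consonant) -> syllable."""
    l_index, v_index = ord(l) - L_BASE, ord(v) - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a composable jamo sequence")
    t_index = 0
    if t:
        t_index = ord(t) - T_BASE
        if not 1 <= t_index < T_COUNT:
            raise ValueError("not a trailing consonant jamo")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


syllable = "\ud55c"  # HANGUL SYLLABLE HAN
print(decompose_hangul(syllable) == unicodedata.normalize("NFD", syllable))
```

Only 11172 syllables exist, so an exhaustive comparison with the library over the whole range is cheap.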
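To illustrate why the other cases need tables, here is a small Python experiment (naive_greek_upper is a deliberately wrong, hypothetical shortcut). Subtracting 0x20 works for most basic Greek letters but sends final sigma U+03C2 to U+03A2, which is unassigned, while str.upper uses the Unicode case tables. The Turkic forms U+0130 and U+0131 are adjacent, but they are not each other's case partners:

```python
import unicodedata


def naive_greek_upper(ch: str) -> str:
    """Hypothetical shortcut: uppercase = lowercase - 0x20 (wrong in general)."""
    return chr(ord(ch) - 0x20)


for lower in ("\u03b1", "\u03c3", "\u03c2"):  # alpha, sigma, final sigma
    naive = naive_greek_upper(lower)
    print(
        f"U+{ord(lower):04X} -> naive U+{ord(naive):04X} "
        f"(category {unicodedata.category(naive)}), "
        f"table U+{ord(lower.upper()):04X}"
    )

# U+0130 (capital I with dot) lowercases to 'i' + U+0307 COMBINING DOT ABOVE,
# and U+0131 (dotless i) uppercases to plain 'I'. Python's str methods apply
# the locale-independent Unicode mappings; Turkish tailoring (i <-> U+0130,
# U+0131 <-> I) would need yet another table.
print(ascii("\u0130".lower()), ascii("\u0131".upper()))
```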
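For the symbol cases, here is a Python sketch of three of the mappings mentioned, assuming only the basic sets (all helper names are hypothetical). The chess mapping is a plain addition of 6 rather than a bit flip: U+2654 WHITE CHESS KING and U+265A BLACK CHESS KING differ in three bits, and the XOR between corresponding white and black pieces is not the same for every piece, so no single bitwise mask would work. Braille, by contrast, is genuinely bitwise, with dot n stored as bit n-1 of the offset from U+2800:

```python
import unicodedata

WHITE_KING, BLACK_KING = 0x2654, 0x265A  # each color: K, Q, R, B, N, P in order


def swap_chess_color(ch: str) -> str:
    """White piece <-> black piece of the same kind (offset 6)."""
    c = ord(ch)
    if WHITE_KING <= c < WHITE_KING + 6:
        return chr(c + 6)
    if BLACK_KING <= c < BLACK_KING + 6:
        return chr(c - 6)
    return ch


BRAILLE_BLANK = 0x2800  # dot n (1..8) is bit n-1 of the offset


def braille(dots) -> str:
    bits = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError("Braille dots are numbered 1..8")
        bits |= 1 << (d - 1)
    return chr(BRAILLE_BLANK + bits)


def braille_dots(ch: str) -> list:
    bits = ord(ch) - BRAILLE_BLANK
    if not 0 <= bits < 256:
        raise ValueError("not a Braille pattern")
    return [d for d in range(1, 9) if bits & (1 << (d - 1))]


# Basic mahjong suits: nine tiles each, in consecutive runs.
SUIT_ONE = {"characters": 0x1F007, "bamboos": 0x1F010, "circles": 0x1F019}


def mahjong_tile(suit: str, rank: int) -> str:
    if not 1 <= rank <= 9:
        raise ValueError("rank must be 1..9")
    return chr(SUIT_ONE[suit] + rank - 1)


print(unicodedata.name(swap_chess_color("\u2658")))  # white knight -> black knight
print(unicodedata.name(braille([1, 4])))
print(unicodedata.name(mahjong_tile("circles", 5)))
```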
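Finally, the ASCII-to-variant mappings. This sketch (hypothetical helper names) covers Regional Indicator Symbols, fullwidth forms and lowercase mathematical italic. The last one shows that even an "arithmetic" block can have holes: italic small h had already been encoded as U+210E PLANCK CONSTANT, so its slot U+1D455 is left unassigned and a pure offset fails there:

```python
import unicodedata

RIS_A = 0x1F1E6            # REGIONAL INDICATOR SYMBOL LETTER A
FULLWIDTH_OFFSET = 0xFEE0  # U+0021..U+007E -> U+FF01..U+FF5E
MATH_ITALIC_SMALL_A = 0x1D44E
# Exceptions to the offset: the italic small h slot (U+1D455) is a hole.
MATH_ITALIC_HOLES = {"h": "\u210e"}


def regional_indicators(code: str) -> str:
    """Two-letter region code -> pair of RIS characters (rendered as a flag)."""
    if not (code.isascii() and code.isalpha()):
        raise ValueError("expected ASCII letters")
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code.upper())


def to_fullwidth(text: str) -> str:
    out = []
    for ch in text:
        c = ord(ch)
        if 0x21 <= c <= 0x7E:
            out.append(chr(c + FULLWIDTH_OFFSET))
        elif ch == " ":
            out.append("\u3000")  # IDEOGRAPHIC SPACE, not covered by the offset
        else:
            out.append(ch)
    return "".join(out)


def math_italic_small(ch: str) -> str:
    if not "a" <= ch <= "z":
        return ch
    return MATH_ITALIC_HOLES.get(ch, chr(MATH_ITALIC_SMALL_A + ord(ch) - ord("a")))


print(ascii(regional_indicators("fr")))
print(to_fullwidth("Unicode 9.0"))
print("".join(math_italic_small(c) for c in "hbar"))
```

All three families carry compatibility decompositions, so NFKC maps them back to ASCII, which gives an easy round-trip check.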


2016-10-06 21:44 GMT+02:00 Garth Wallace <gwalla at gmail.com>:

> Other than converting between UTFs, is bit arithmetic commonly performed
> on Unicode characters? I was under the impression that it's a rarity if it
> is done at all.
>
> I've been working on a proposal for additional chess symbols used in chess
> problems and variant games, and I've been in communication with the World
> Federation for Chess Composition, which is the international organization
> in charge of chess problems. We have agreement on the repertoire and the
> text of the proposal, but the arrangement of the proposed characters within
> the new block is a sticking point. Some representatives of the WFCC have
> proposed alternate arrangements that assume there will be a need for
> bitwise operations to convert between the existing chess symbols in the
> Miscellaneous Symbols block and related symbols in the new block. I don't
> see the need but maybe I'm missing something.
>

