Bit arithmetic on Unicode characters?

Richard Wordingham richard.wordingham at ntlworld.com
Fri Oct 7 02:14:07 CDT 2016


On Thu, 6 Oct 2016 21:18:15 -0400
Oren Watson <oren.watson at gmail.com> wrote:

> On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham <
> richard.wordingham at ntlworld.com> wrote:

> > Yes, it's a trade-off.  The application I had in mind is converting
> > between mathematical letter variants and their 'plain' forms.
> > Perhaps there is just enough information in the UCD to allow
> > exhaustive, automated tests.

> That application is hindered by the fact that
> 
> [a run of reserved code points, including U+1D547] are unallocated
> characters, forming gaps in the otherwise contiguous mathematical
> alphabets.

(Aside: That written statement is illegal! -:)
 
Yep.  It's a known nuisance, which is why I suggested exhaustive tests.
My email client found a font to render U+1D547 as the unwary
would expect, i.e. using a glyph suitable for ℙ U+2119 DOUBLE-STRUCK
CAPITAL P. I was surprised when I first saw those gaps; I would have
expected characters with appropriate singleton decompositions to protect
the unwary.  (The idea might have come up at the time of encoding, and
been dismissed with reasons.)  I don't know whether the font's
misrendering is an accident or is deliberate partial protection of the
victims of bad character code selection.
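
To make the nuisance concrete, here is a minimal sketch in Python of the kind of exhaustive test I mean, restricted to the double-struck capitals.  It assumes the standard unicodedata module as the source of UCD data; the function names are made up for illustration, and the name-based lookup is just one possible way to find the right character, not the only one.

```python
import string
import unicodedata

# MATHEMATICAL DOUBLE-STRUCK CAPITAL A, the start of the run.
DOUBLE_STRUCK_CAPITAL_A = 0x1D538


def naive_double_struck(letter):
    """Pure offset arithmetic; takes no account of the gaps."""
    return chr(DOUBLE_STRUCK_CAPITAL_A + ord(letter) - ord("A"))


def ucd_double_struck(letter):
    """Look the character up by name in the UCD instead (hypothetical helper)."""
    for prefix in ("MATHEMATICAL DOUBLE-STRUCK CAPITAL ",
                   "DOUBLE-STRUCK CAPITAL "):
        try:
            return unicodedata.lookup(prefix + letter)
        except KeyError:
            continue
    raise KeyError(letter)


def gaps():
    """Exhaustive check: capitals for which arithmetic gives the wrong answer."""
    return "".join(c for c in string.ascii_uppercase
                   if naive_double_struck(c) != ucd_double_struck(c))


def round_trips(letter):
    """The <font> decomposition should lead back to the plain letter."""
    decomposition = unicodedata.decomposition(ucd_double_struck(letter))
    return decomposition == "<font> %04X" % ord(letter)
```

With this, naive_double_struck("P") yields the unassigned U+1D547 (general category Cn), whereas ucd_double_struck("P") yields U+2119, and gaps() reports the letters C, H, N, P, Q, R and Z.  The same pattern would presumably extend to the script, fraktur and other alphabets by changing the base and the name prefixes.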

An old application of code point arithmetic was transliteration between
the major Indian Indic scripts.  That falls foul of Tamil and of
characters that were not represented in ISCII.
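
A rough sketch of that arithmetic in Python, assuming the usual 128-code-point blocks laid out on the ISCII pattern.  The table BLOCK_BASE and the function shift_script are hypothetical names of my own, and the comparison of character names is an added safety check for illustration, not part of the traditional technique.

```python
import unicodedata

# Start of each script's block; only a few scripts listed, for illustration.
BLOCK_BASE = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "TAMIL": 0x0B80,
    "TELUGU": 0x0C00,
}


def shift_script(ch, src, dst):
    """Move ch from block src to block dst by offset arithmetic.

    Returns None when the target is unassigned or is not the same
    character by name, which is where the arithmetic breaks down.
    """
    target = chr(ord(ch) - BLOCK_BASE[src] + BLOCK_BASE[dst])
    src_name = unicodedata.name(ch, "")
    dst_name = unicodedata.name(target, "")
    if not src_name.startswith(src + " "):
        return None
    if not dst_name.startswith(dst + " "):
        return None
    # Compare the part of the name after the script, e.g. " LETTER KA".
    if src_name[len(src):] != dst_name[len(dst):]:
        return None
    return target
```

DEVANAGARI LETTER KA (U+0915) shifts cleanly to BENGALI LETTER KA (U+0995) and TAMIL LETTER KA (U+0B95), but DEVANAGARI LETTER KHA (U+0916) lands on U+0B96, which is unassigned in Tamil, so shift_script returns None there.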

Richard.