Encoding/Use of potential unpaired UTF-16 surrogate pair specifiers

Doug Ewell doug at ewellic.org
Sat Jan 30 15:46:39 CST 2016


Chris Jacobs wrote:

>>> UTF16 has no way to define a code point that is D800-DFFF; this is
>>> an issue if I want to apply some sort of encryption algorithm and
>>> still have the result treated as text for transmission and encoding
>>> to other string systems.
>
> This is not an issue at all. You don't have to restrict the input to
> text to be able to generate an output that can be treated as text.

I gathered that J wanted to generate arbitrary output that could be 
interpreted as UTF-16 code units. I admit to being less than 100% sure 
of this.

Certainly there is no shortage of algorithms to map arbitrary byte input 
to text output, usually limited to some subset of ASCII. One interesting 
approach for the Unicode era was Markus Scherer's "Base16k" concept, at 
https://sites.google.com/site/markusicu/unicode/base16k .
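
To illustrate the general idea of mapping arbitrary bytes to text that is
always well-formed UTF-16, here is a minimal Python sketch. It is not
Scherer's actual Base16k format, only something in the same spirit: the
function names, the "length:" prefix, and the choice of U+5000 as the base
of a 16384-code-point range are all assumptions made for this example.
Each output character carries 14 bits, and no character falls in
D800-DFFF.

```python
# Hypothetical sketch: pack arbitrary bytes into 14-bit groups and map each
# group to one BMP code point in BASE..BASE+0x3FFF (assumed: U+5000..U+8FFF).
BASE = 0x5000


def encode(data: bytes) -> str:
    """Return text of the form '<byte count>:<characters>'."""
    nbits = len(data) * 8
    pad = (-nbits) % 14                      # zero bits added on the right
    bits = int.from_bytes(data, "big") << pad
    nchars = (nbits + pad) // 14
    chars = [
        chr(BASE + ((bits >> (14 * (nchars - 1 - i))) & 0x3FFF))
        for i in range(nchars)
    ]
    return f"{len(data)}:" + "".join(chars)


def decode(text: str) -> bytes:
    """Invert encode(); the length prefix tells us how much padding to drop."""
    length, _, body = text.partition(":")
    n = int(length)
    bits = 0
    for ch in body:
        bits = (bits << 14) | (ord(ch) - BASE)
    bits >>= len(body) * 14 - n * 8          # discard the padding bits
    return bits.to_bytes(n, "big")
```

Input bytes such as D8 00, which would be an unpaired surrogate if taken
directly as a UTF-16 code unit, survive a round trip through decode(encode(...))
while the intermediate string contains only ordinary BMP characters.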

--
Doug Ewell | http://ewellic.org | Thornton, CO


