Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

Fri Oct 12 11:17:59 CDT 2018

J Decker wrote:

>> How about the opposite direction: If m is base64 encoded to yield t
>> and then t is base64 decoded to yield n, will it always be the case
>> that m equals n?
>
> False.
> Canonical translation may occur which the different base64 may be the
> same sort of string...

Base64 is a binary-to-text encoding. Neither encoding nor decoding
should presume any special knowledge of the meaning of the binary data,
or do anything extra based on that presumption.

Converting Unicode text to and from base64 should not perform any sort
of Unicode normalization, convert between UTFs, insert or remove BOMs,
etc. This is like saying that converting a JPEG image to and from base64
should not resize or rescale the image, change its color depth, convert
it to another graphic format, etc.

So I'd say "true" to Roger's question.

I touched on this a little bit in UTN #14, from the standpoint of trying
to improve compression by normalizing the Unicode text first.

--
Doug Ewell | Thornton, CO, US | ewellic.org