Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

J Decker via Unicode unicode at unicode.org
Fri Oct 12 11:29:29 CDT 2018


On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode <unicode at unicode.org>
wrote:

> J Decker wrote:
>
> >> How about the opposite direction: If m is base64 encoded to yield t
> >> and then t is base64 decoded to yield n, will it always be the case
> >> that m equals n?
> >
> > False.
> > Canonical translation may occur, in which case different base64 may
> > represent the same sort of string...
>
> Base64 is a binary-to-text encoding. Neither encoding nor decoding
> should presume any special knowledge of the meaning of the binary data,
> or do anything extra based on that presumption.
>
> Converting Unicode text to and from base64 should not perform any sort
> of Unicode normalization, convert between UTFs, insert or remove BOMs,
> etc. This is like saying that converting a JPEG image to and from base64
> should not resize or rescale the image, change its color depth, convert
> it to another graphic format, etc.
>
> So I'd say "true" to Roger's question.
>
On the first side (X to base64) definitely true.
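
A minimal sketch of that first side, assuming Python's standard base64
module and UTF-8 as the serialization (the sample texts and the helper
name b64_roundtrip are made up for illustration). It only checks that
base64 hands back the bytes it was given and that distinct byte
sequences encode to distinct base64:

```python
import base64

def b64_roundtrip(data: bytes) -> bytes:
    """Encode bytes to base64 and decode them again, with no other processing."""
    return base64.b64decode(base64.b64encode(data))

# Hypothetical sample texts; UTF-8 is an assumed serialization.
texts = ["plain ASCII", "caf\u00e9", "cafe\u0301", "\ufeffwith BOM"]
payloads = [t.encode("utf-8") for t in texts]

# Decoding returns exactly the bytes that were encoded.
roundtrip_ok = all(b64_roundtrip(p) == p for p in payloads)

# Distinct byte sequences give distinct base64 strings.
encoded = {base64.b64encode(p) for p in payloads}
all_distinct = len(encoded) == len(set(payloads))
```

Both flags come out True here, including for the sample that starts
with a BOM, since nothing in the encode/decode path looks at what the
bytes mean.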

But text recovered from a decoded buffer may later be translated
(normalized, for example), producing a 'congruent' string that is not
exactly the same; re-encoding that string gives different base64.

Comparing one base64 string with another can show a binary difference
even though the two may still represent the 'same' string.
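
To illustrate the point, here is a rough sketch assuming Python's
standard base64 and unicodedata modules, UTF-8 as the serialization,
and NFC/NFD normalization as one example of such a translation (the
helper name to_base64 and the sample word are made up). The two
strings are canonically equivalent, yet their base64 differs:

```python
import base64
import unicodedata

def to_base64(text: str) -> str:
    """Serialize text as UTF-8 (an assumption) and base64-encode the bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

composed = "caf\u00e9"                                # precomposed e-acute
decomposed = unicodedata.normalize("NFD", composed)   # "e" + combining acute

b64_composed = to_base64(composed)
b64_decomposed = to_base64(decomposed)

# The base64 strings differ because the underlying bytes differ.
binary_differs = b64_composed != b64_decomposed

# After decoding and normalizing both sides, the texts compare equal.
def decode_nfc(b64: str) -> str:
    return unicodedata.normalize("NFC", base64.b64decode(b64).decode("utf-8"))

same_text = decode_nfc(b64_composed) == decode_nfc(b64_decomposed)
```

So a comparison at the base64 level only answers "same bytes?"; to ask
"same string?" in the looser sense, the comparison has to happen after
decoding and normalizing, outside of base64 itself.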


>
> I touched on this a little bit in UTN #14, from the standpoint of trying
> to improve compression by normalizing the Unicode text first.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>