Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
Tex via Unicode
unicode at unicode.org
Fri Oct 12 14:26:45 CDT 2018
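I agree with Doug. Base64 maps each distinct sequence of source bytes to a distinct string of output characters (every 3 input bytes become 4 output characters, with padding at the end). Decoding the encoder's output is likewise a unique mapping back to the original bytes.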
If the encoded string is “translated” in some way by additional processes, canonical or otherwise, then all bets are off.
If you disagree, please offer an example or additional details of how 2 base64 strings might be equivalent.
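To make the claim concrete, here is a minimal Python sketch, assuming the text is serialized as UTF-8 before encoding (the thread does not fix an encoding). The helper names b64_of_text and text_of_b64 are hypothetical, chosen only for illustration. It checks that a handful of distinct texts give distinct base64 strings and that each one round-trips unchanged.

```python
import base64

def b64_of_text(text: str, encoding: str = "utf-8") -> str:
    """Serialize text to bytes, then base64-encode those bytes (illustrative helper)."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")

def text_of_b64(b64: str, encoding: str = "utf-8") -> str:
    """Base64-decode to bytes, then interpret those bytes as text (illustrative helper)."""
    return base64.b64decode(b64).decode(encoding)

# Assumption: these sample strings are arbitrary; any distinct texts would do.
samples = ["abc", "abd", "Unicode", "\u00dcnicode", ""]
encoded = [b64_of_text(s) for s in samples]

# Distinct inputs should give distinct outputs.
all_distinct = len(set(encoded)) == len(samples)

# Encoding followed by decoding should return the original text.
round_trips = all(text_of_b64(b64_of_text(s)) == s for s in samples)

print(all_distinct, round_trips)
```

A finite sample proves nothing by itself, of course; the general argument is that the encoder is a fixed, reversible mapping on byte sequences.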
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of J Decker via Unicode
Sent: Friday, October 12, 2018 9:29 AM
To: doug at ewellic.org
Cc: Unicode Discussion
Subject: Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode <unicode at unicode.org> wrote:
J Decker wrote:
>> How about the opposite direction: If m is base64 encoded to yield t
>> and then t is base64 decoded to yield n, will it always be the case
>> that m equals n?
> Canonical translation may occur, in which case different base64 strings
> may represent the same sort of string...
Base64 is a binary-to-text encoding. Neither encoding nor decoding
should presume any special knowledge of the meaning of the binary data,
or do anything extra based on that presumption.
Converting Unicode text to and from base64 should not perform any sort
of Unicode normalization, convert between UTFs, insert or remove BOMs,
etc. This is like saying that converting a JPEG image to and from base64
should not resize or rescale the image, change its color depth, convert
it to another graphic format, etc.
So I'd say "true" to Roger's question.
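The point that base64 treats its input as opaque bytes can be sketched in Python as follows. This is only an illustration, assuming UTF-8 input with a leading BOM and a decomposed (NFD-style) character sequence; the variable names are made up for the example. Neither the BOM nor the decomposed form is altered by the round trip.

```python
import base64

# Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# Assumption for illustration: the payload is UTF-8 with a leading BOM.
utf8_bom = b"\xef\xbb\xbf"
raw = utf8_bom + decomposed.encode("utf-8")

# Base64 round trip on the raw bytes; no text-level processing is involved.
restored = base64.b64decode(base64.b64encode(raw))

bytes_preserved = restored == raw                       # byte-for-byte identical
still_has_bom = restored.startswith(utf8_bom)           # BOM was not stripped
still_decomposed = restored[len(utf8_bom):].decode("utf-8") == decomposed  # not normalized

print(bytes_preserved, still_has_bom, still_decomposed)
```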
On the first side (X to base64) definitely true.
But text recovered from a decoded buffer could potentially be translated, yielding a 'congruent' string that is not exactly the same, and its base64 would then be different.
Comparing one base64 string with another may show a binary difference even though the two still represent the 'same' string.
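A short Python sketch of this situation, assuming UTF-8 and taking Unicode canonical equivalence (compared via NFC) as the meaning of 'same'; the thread does not pin down either choice, and the helper names b64 and same_text are hypothetical. The two base64 strings differ, yet the decoded texts compare equal once both are normalized. Note that the normalization is a separate step applied after decoding, not something base64 does.

```python
import base64
import unicodedata

def b64(text: str) -> str:
    """UTF-8 encode, then base64-encode (illustrative helper)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def same_text(b64_a: str, b64_b: str, form: str = "NFC") -> bool:
    """Decode both base64 strings and compare the texts after normalization."""
    text_a = base64.b64decode(b64_a).decode("utf-8")
    text_b = base64.b64decode(b64_b).decode("utf-8")
    return unicodedata.normalize(form, text_a) == unicodedata.normalize(form, text_b)

precomposed = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # 'e' + U+0301 COMBINING ACUTE ACCENT

b64_pre = b64(precomposed)
b64_dec = b64(decomposed)

differ_as_base64 = b64_pre != b64_dec          # binary difference
equal_as_text = same_text(b64_pre, b64_dec)    # canonically equivalent

print(b64_pre, b64_dec, differ_as_base64, equal_as_text)
```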
I touched on this a little bit in UTN #14, from the standpoint of trying
to improve compression by normalizing the Unicode text first.
Doug Ewell | Thornton, CO, US | ewellic.org