Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

Philippe Verdy via Unicode unicode at unicode.org
Sat Oct 13 09:45:10 CDT 2018


You forget that Base64 (as used in MIME) does not follow these rules, as it
allows multiple different encodings for the same source binary. MIME
actually splits a binary object into multiple fragments at arbitrary
positions and then encodes these fragments separately. MIME also uses an
extension of Base64 that allows some variation in the encoding alphabet
(so even the same fragment of the same length may have two distinct
encodings).
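
To make the "several texts for one binary" point concrete, here is a minimal
sketch using Python's standard base64 module. Two things are assumptions on
my part: that line wrapping (as done by base64.encodebytes) is a fair
stand-in for MIME-style output, and that the URL-safe alphabet of RFC 4648
is a fair example of an alphabet variation.

```python
import base64

data = bytes(range(256))  # 256 arbitrary bytes, long enough to need wrapping

single_line = base64.b64encode(data)   # no line breaks at all
mime_style = base64.encodebytes(data)  # newline after every 76 output characters

# Two different texts, one decoded value (b64decode skips newlines by default).
print(single_line != mime_style)
print(base64.b64decode(single_line) == base64.b64decode(mime_style) == data)

# Alphabet variation (assumption: the URL-safe variant stands in for it here).
tricky = b"\xfb\xff"
standard = base64.b64encode(tricky)          # uses '+' and '/'
url_safe = base64.urlsafe_b64encode(tricky)  # uses '-' and '_'
print(standard, url_safe)
print(base64.b64decode(standard) == base64.urlsafe_b64decode(url_safe) == tricky)
```

Comparing the encoded texts byte for byte would therefore report a
difference even though the decoded data is identical.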

Base64 in MIME is different from standard Base64 (which never splits the
binary object before encoding it, and uses a strict alphabet of 64 ASCII
characters, allowing no variation). So MIME requires special handling: the
assumption that a binary message is always encoded the same way is wrong,
but MIME still requires that this non-unique Base64 encoding be decoded
back to the same initial (unsplit) binary object (independently of its size
and independently of the splitting boundaries used in the transport, which
may change during the transport).
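
The requirement described above (different split points, same decoded
object) can be sketched as follows. The helper names encode_fragments and
decode_fragments are hypothetical, and the sketch only models "encode each
fragment separately"; it is not a MIME implementation.

```python
import base64

def encode_fragments(data: bytes, cuts: list[int]) -> list[bytes]:
    """Encode each fragment of `data` separately, cutting at the given offsets."""
    bounds = [0, *cuts, len(data)]
    return [base64.b64encode(data[a:b]) for a, b in zip(bounds, bounds[1:])]

def decode_fragments(fragments: list[bytes]) -> bytes:
    """Decode each fragment on its own and concatenate the results."""
    return b"".join(base64.b64decode(f) for f in fragments)

message = b"Base64 in transit"
first = encode_fragments(message, [4, 9])   # three fragments
second = encode_fragments(message, [6])     # two fragments

print(first != second)                                    # texts differ
print(decode_fragments(first) == decode_fragments(second) == message)
print(b"".join(first) != base64.b64encode(message))       # padding lands mid-text
```

Because a fragment whose length is not a multiple of 3 ends in padding, the
concatenated text differs from the one-shot encoding, yet decoding fragment
by fragment still recovers the original bytes.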

This also applies to the Base64 encoding used in HTTP transport syntax, and
notably in the HTTP/1.1 streaming feature where fragment sizes are also
variable.


On Sat, Oct 13, 2018 at 16:27, Costello, Roger L. via Unicode <
unicode at unicode.org> wrote:

> Hi Folks,
>
> Thank you for your outstanding responses!
>
> Below is a summary of what I learned. Are there any errors in the summary?
> Is there anything you would add? Please let me know of anything that is not
> clear.   /Roger
>
> 1. While base64 encoding is usually applied to binary, it is also
> sometimes applied to text, such as Unicode text.
>
> Note: Since base64 encoding may be applied to both binary and text, in the
> following bullets I use the more generic term "data". For example, "Data d
> is base64-encoded to yield ..."
>
> 2. Neither base64 encoding nor decoding should presume any special
> knowledge of the meaning of the data or do anything extra based on that
> presumption.
>
> For example, converting Unicode text to and from base64 should not perform
> any sort of Unicode normalization, convert between UTFs, insert or remove
> BOMs, etc. This is like saying that converting a JPEG image to and from
> base64 should not resize or rescale the image, change its color depth,
> convert it to another graphic format, etc.
>
> If you use base64 for encoding MIME content (e.g. emails), the base64
> decoding will not transform the content. The email parser must ensure that
> the content is valid, so the parser might have to transform the content
> (possibly replacing some invalid sequences or truncating), and then apply
> Unicode normalization to render the text. These transforms are part of the
> MIME application and are independent of whether you use base64 or any
> other encoding or transport syntax.
>
> 3. If data d is different than d', then the base64 text resulting from
> encoding d is different than the base64 text resulting from encoding d'.
>
> 4. If base64 text t is different than t', then the data resulting from
> decoding t is different than the data resulting from decoding t'.
>
> 5. For every data d there is exactly one base64 encoding t.
>
> 6. Every base64 text t is an encoding of exactly one data d.
>
> 7. For all data d, Base64_Decode[Base64_Encode[d]] = d
>
>
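
Points 3 and 7 of the summary can be spot-checked for one fixed encoder.
The sketch below assumes Python's base64.b64encode (standard alphabet, no
line breaks), which is my choice and not something the thread specifies. It
says nothing about points 4 to 6, which are the ones disputed above for
MIME-style encoders.

```python
import base64
from itertools import product

# Every input of length 0, 1, or 2 bytes (1 + 256 + 65536 values).
inputs = [bytes(p) for n in range(3) for p in product(range(256), repeat=n)]
encoded = [base64.b64encode(d) for d in inputs]

# Point 3: different data gives different base64 text (for this one encoder).
distinct = len(set(encoded)) == len(inputs)

# Point 7: decoding what was encoded returns the original data.
round_trip = all(base64.b64decode(t) == d for t, d in zip(encoded, inputs))

print(distinct, round_trip)
```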

