Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?
Costello, Roger L. via Unicode
unicode at unicode.org
Sat Oct 13 09:16:59 CDT 2018
Hi Folks,
Thank you for your outstanding responses!
Below is a summary of what I learned. Are there any errors in the summary? Is there anything you would add? Please let me know of anything that is not clear. /Roger
1. While base64 encoding is usually applied to binary, it is also sometimes applied to text, such as Unicode text.
Note: Since base64 encoding may be applied to both binary and text, in the following bullets I use the more generic term "data". For example, "Data d is base64-encoded to yield ..."
2. Neither base64 encoding nor decoding should presume any special knowledge of the meaning of the data or do anything extra based on that presumption.
For example, converting Unicode text to and from base64 should not perform any sort of Unicode normalization, convert between UTFs, insert or remove BOMs, etc. This is like saying that converting a JPEG image to and from base64 should not resize or rescale the image, change its color depth, convert it to another graphic format, etc.
If you use base64 for encoding MIME content (e.g. emails), the base64 decoding will not transform the content. The email parser must ensure that the content is valid, so the parser might have to transform the content (possibly replacing some invalid sequences or truncating), and then apply Unicode normalization to render the text. These transforms are part of the MIME application and are independent of whether you use base64 or any other encoding or transport syntax.
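To make point 2 concrete, here is a minimal sketch in Python (my choice of language; nothing in the discussion depends on it). It assumes the text is serialized as UTF-8 before encoding, since base64 operates on bytes. The same visible character in NFC and NFD form is a different byte sequence, so base64 yields different texts and makes no attempt to unify them.

```python
import base64
import unicodedata

# The same visible text "é" in two Unicode normalization forms.
nfc = unicodedata.normalize("NFC", "\u00e9")  # one code point: U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")  # two code points: "e" + U+0301

# base64 works on bytes, so the text is serialized first (UTF-8 assumed here).
nfc_bytes = nfc.encode("utf-8")  # b'\xc3\xa9'
nfd_bytes = nfd.encode("utf-8")  # b'e\xcc\x81'

t_nfc = base64.b64encode(nfc_bytes)
t_nfd = base64.b64encode(nfd_bytes)

print(t_nfc)  # b'w6k='
print(t_nfd)  # b'ZcyB'

# Decoding returns exactly the bytes that went in; no normalization happens.
print(base64.b64decode(t_nfd) == nfd_bytes)  # True
print(base64.b64decode(t_nfd) == nfc_bytes)  # False
```

If the application wants the two forms treated as equal, it has to normalize before encoding or after decoding; that step is outside base64.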
3. If data d is different than d', then the base64 text resulting from encoding d is different than the base64 text resulting from encoding d'.
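4. If base64 text t is different than t', and both are canonical encodings (produced by a conforming encoder, with unused padding bits set to zero and no extra line breaks or whitespace), then the data resulting from decoding t is different than the data resulting from decoding t'.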
5. For every data d there is exactly one base64 encoding t.
6. Every base64 text t is an encoding of exactly one data d.
7. For all data d, Base64_Decode[Base64_Encode[d]] = d
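Points 3 and 7 can be spot-checked with a short Python sketch. The sample values below are arbitrary choices of mine, and a handful of samples illustrates the properties rather than proving them.

```python
import base64

# A few distinct inputs, including an empty value, lengths that exercise
# each padding case, UTF-8 text, and every possible byte value.
samples = [
    b"",
    b"A",
    b"AB",
    b"ABC",
    "Unicode \u2713".encode("utf-8"),
    bytes(range(256)),
]

encoded = [base64.b64encode(d) for d in samples]

# Point 7: Base64_Decode[Base64_Encode[d]] = d for every sample.
round_trip_ok = all(
    base64.b64decode(t, validate=True) == d for d, t in zip(samples, encoded)
)

# Point 3: distinct data yield distinct base64 texts.
injective_ok = len(set(encoded)) == len(set(samples))

print(round_trip_ok)  # True
print(injective_ok)   # True
```

Point 3 in fact follows from point 7: if two different data values had the same encoding, decoding could not return both of them.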