Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

Doug Ewell via Unicode unicode at unicode.org
Mon Jul 24 17:35:43 CDT 2017


J Decker wrote:

> I generally accepted any utf-8 encoding up to 31 bits though ( since
> I was going from the original spec, and not what was effective limit
> based on unicode codepoint space)

Hey, everybody: Don't do that.

UTF-8 has been constrained to the Unicode code space (maximum U+10FFFF,
four bytes) for almost fourteen years now, since RFC 3629 (November 2003).
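
For anyone who wants to see what that constraint looks like in a decoder,
here is a minimal sketch in Python. It is not taken from anyone's actual
code in this thread; the names decode_first and MAX_CODE_POINT are made up
for illustration. It decodes only the first sequence in a byte string and
refuses the old 5- and 6-byte forms, overlong forms, surrogates, and
anything above U+10FFFF.

```python
MAX_CODE_POINT = 0x10FFFF  # top of the Unicode code space


def decode_first(data: bytes) -> int:
    """Decode the first code point in `data` under the U+10FFFF limit.

    Hypothetical helper for illustration; raises ValueError on any
    sequence that is not well-formed UTF-8.
    """
    if not data:
        raise ValueError("empty input")

    lead = data[0]
    if lead < 0x80:
        return lead  # single-byte (ASCII) case

    # Pick the sequence length, the payload bits of the lead byte, and the
    # smallest code point that legitimately needs this many bytes.
    if 0xC2 <= lead <= 0xDF:
        length, cp, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        length, cp, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF4:
        length, cp, minimum = 4, lead & 0x07, 0x10000
    else:
        # 0x80..0xBF are continuation bytes, 0xC0..0xC1 can only start
        # overlong forms, and 0xF5..0xFF would encode values above
        # U+10FFFF (this is where the old 5- and 6-byte leads live).
        raise ValueError(f"invalid lead byte 0x{lead:02X}")

    if len(data) < length:
        raise ValueError("truncated sequence")

    for b in data[1:length]:
        if (b & 0xC0) != 0x80:
            raise ValueError(f"invalid continuation byte 0x{b:02X}")
        cp = (cp << 6) | (b & 0x3F)

    if cp < minimum:
        raise ValueError("overlong encoding")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code point")
    if cp > MAX_CODE_POINT:
        raise ValueError("code point above U+10FFFF")
    return cp
```

With this sketch, b"\xf4\x8f\xbf\xbf" decodes to U+10FFFF, while
b"\xf4\x90\x80\x80" (which would be U+110000) and any sequence starting
with 0xF8 or higher raise ValueError.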
 
--
Doug Ewell | Thornton, CO, US | ewellic.org


