Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

J Decker via Unicode unicode at unicode.org
Mon Jul 24 14:12:06 CDT 2017


On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode <
unicode at unicode.org> wrote:

> Hi Folks,
>
> 2. (Bug) The sending application performs the folding process - inserts
> CRLF plus white space characters - and the receiving application does the
> unfolding process but doesn't properly delete all of them.
>
The RFC doesn't say 'characters', but either a space or a tab character
(singular).

Back-scanning to the start of a UTF-8 sequence is simple enough:

while( ( from[0] & 0xC0 ) == 0x80 )
    from--;

It should probably also check that from > (start+1), but since it would be
applied at 75-ish characters into the line, that would be implicitly true.
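
As a rough sketch of the same back-scan with the bounds check made explicit
(the function name utf8_split_point and its parameters are hypothetical, and
the input is assumed to be well-formed UTF-8), something like this could pick
a fold offset that never lands inside a multi-octet sequence:

```c
#include <stddef.h>

/* Hypothetical helper, not from the RFC or any library.
 * Returns the largest offset <= limit at which buf (len octets) can be
 * split without cutting a UTF-8 multi-octet sequence in two.
 * Octets [0, result) stay on the current line; buf[result] starts the next. */
static size_t utf8_split_point(const unsigned char *buf, size_t len, size_t limit)
{
    /* Never split past the end of the data. */
    size_t pos = (limit < len) ? limit : len;

    /* A continuation octet has the bit pattern 10xxxxxx. While the octet
     * that would start the next line is a continuation octet, step back.
     * The pos > 0 test is the explicit form of the "from > start" check. */
    while (pos > 0 && pos < len && (buf[pos] & 0xC0) == 0x80)
        pos--;

    return pos;
}
```

For the octets 61 C3 A9 62 ("a", U+00E9, "b") and a limit of 2, this returns 1,
so the two-octet sequence moves to the next line intact. A return value of 0
means no safe split exists at or before the limit, which the caller would have
to handle separately.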