Interesting UTF-8 decoder

Mark Davis ☕️ via Unicode unicode at unicode.org
Mon Oct 9 06:16:03 CDT 2017


The article points out that the input buffer needs to be padded with 3 null
bytes as a precondition.
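
The requirement follows from how a branchless decoder of this kind works. It always loads four bytes starting at the lead byte and shifts away whatever the actual sequence length does not need, so it can read up to 3 bytes past the final character. Below is a minimal C sketch of the technique, modeled on the approach in the linked post. The function name `utf8_decode_padded` and the table details are assumptions and may differ from the post's own code.

```c
#include <stdint.h>

/* Hypothetical branchless UTF-8 decoder sketch.
 * Precondition: at least 3 readable bytes follow s[0], because s[1], s[2]
 * and s[3] are loaded unconditionally, whatever the sequence length.
 * On return, *c holds the code point and *err is nonzero if invalid. */
static const unsigned char *
utf8_decode_padded(const unsigned char *s, uint32_t *c, int *err)
{
    /* Sequence length indexed by the lead byte's top 5 bits; 0 = bad lead. */
    static const unsigned char lengths[32] = {
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 4, 0
    };
    static const uint32_t masks[5]  = {0x00, 0x7f, 0x1f, 0x0f, 0x07};
    static const uint32_t mins[5]   = {1u << 22, 0, 0x80, 0x800, 0x10000};
    static const int      shiftc[5] = {0, 18, 12, 6, 0};
    static const int      shifte[5] = {0, 6, 4, 2, 0};

    int len = lengths[s[0] >> 3];

    /* Assume a four-byte sequence; bits from unused bytes are shifted out. */
    uint32_t cp = (s[0] & masks[len]) << 18;
    cp |= (uint32_t)(s[1] & 0x3f) << 12;
    cp |= (uint32_t)(s[2] & 0x3f) << 6;
    cp |= (uint32_t)(s[3] & 0x3f);
    cp >>= shiftc[len];

    /* Accumulate error flags without branching. */
    int e = (cp < mins[len]) << 6;      /* overlong form or invalid lead */
    e |= ((cp >> 11) == 0x1b) << 7;     /* UTF-16 surrogate half */
    e |= (cp > 0x10FFFF) << 8;          /* beyond the Unicode range */
    e |= (s[1] & 0xc0) >> 2;            /* top two bits of each tail byte */
    e |= (s[2] & 0xc0) >> 4;
    e |= s[3] >> 6;
    e ^= 0x2a;                          /* a correct 10xxxxxx tail gives 00 */
    e >>= shifte[len];                  /* drop checks on bytes past the end */

    *c = cp;
    *err = e;
    return s + len + !len;              /* skip one byte after a bad lead */
}
```

Each call returns a pointer to the next lead byte, so a caller loops while that pointer is before the end of the real data instead of stopping at a NUL, since U+0000 decodes like any other character. With zero padding, a sequence cut off at the end of the data fails the continuation-byte check instead of reading arbitrary memory.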

Mark <https://twitter.com/mark_e_davis>

On Mon, Oct 9, 2017 at 10:57 AM, J Decker via Unicode <unicode at unicode.org>
wrote:

> That's interesting; however, it can segfault if the string ends at a
> memory allocation boundary. Strings will always have to be allocated
> with 3 extra bytes.
>
> 2017-10-09 1:37 GMT-07:00 Martin J. Dürst via Unicode <unicode at unicode.org>:
>
>> A friend of mine sent me a pointer to
>> http://nullprogram.com/blog/2017/10/06/, a branchless UTF-8 decoder.
>>
>> Regards,   Martin.
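
One way to meet the 3-extra-bytes requirement, sketched in C under the assumption that the caller knows the string's byte length, is to copy the data into an allocation that reserves the extra bytes and zero-fills them. A decoder that always reads four bytes from the last lead byte then stays inside the allocation. The helper name `utf8_padded_copy` and the `UTF8_PAD` constant are hypothetical, not part of the linked decoder.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extra bytes a four-byte-read decoder may touch past the last lead byte. */
#define UTF8_PAD 3

/* Hypothetical helper: copy len bytes into a new heap buffer followed by
 * UTF8_PAD zero bytes. Returns NULL on overflow or allocation failure;
 * the caller frees the result. */
static unsigned char *utf8_padded_copy(const void *src, size_t len)
{
    if (len > SIZE_MAX - UTF8_PAD)
        return NULL;                    /* len + UTF8_PAD would overflow */
    unsigned char *buf = malloc(len + UTF8_PAD);
    if (buf == NULL)
        return NULL;
    if (len > 0)
        memcpy(buf, src, len);
    memset(buf + len, 0, UTF8_PAD);     /* zero padding, not uninitialized */
    return buf;
}
```

Zero-filling the padding, rather than leaving it uninitialized, means that for a decoder which validates continuation bytes, a truncated final sequence is reported as an error instead of depending on whatever the allocator left there.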