Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Shawn Steele via Unicode unicode at unicode.org
Tue May 16 16:15:53 CDT 2017


> Faster ok, provided this does not break other uses, notably for random access within strings…

Either way, this is a “recommendation”.  I don’t see how a recommendation can guarantee not “breaking other uses.”  If it’s internal, you can do what you will, so if you need the apparent 1:1 parity, you can implement that internally.  But if you’re depending on other APIs/libraries/data sources/whatever, it would seem that you couldn’t count on it.  (And you probably shouldn’t even if it were a requirement rather than a recommendation.)

I’m wary of the idea of attempting random access on a stream while that stream is also being manipulated (apparently by decoding) at the same time.

The U+FFFD emitted by this decoding could also require a different number of bytes to re-encode than the ill-formed sequence it replaced, which might disrupt the presumed parity, depending on how the data access was being handled.
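
As a rough illustration of the byte-count point (a minimal sketch, not part of the original message; the helper name reencoded_length is hypothetical, and it relies on Python's built-in "replace" error handler, which is only one possible replacement policy):

```python
def reencoded_length(raw):
    """Return (original byte count, byte count after decode + re-encode).

    Ill-formed input is decoded with Python's built-in "replace" handler,
    which substitutes U+FFFD; U+FFFD itself takes 3 bytes in UTF-8.
    """
    text = raw.decode("utf-8", errors="replace")
    return len(raw), len(text.encode("utf-8"))


# 0xFF can never appear in well-formed UTF-8, so it becomes one U+FFFD.
original, reencoded = reencoded_length(b"a\xffb")
print(original, reencoded)  # 3 5
```

Here a single bad byte grows to three bytes on re-encoding, so any byte offset computed against the original data no longer lines up with the round-tripped data.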

-Shawn