Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

Doug Ewell doug at
Wed Jun 4 17:48:02 CDT 2014

Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:

> The example that's usually given [of U+FEFF at the start of a stream]
> is that of a text file sliced into segments to avoid file size limits.
> In these cases, there is the risk that U+FEFF as ZWNBSP will wind up
> at the start of a segment and be stripped.

Nope, that's exactly the case I was excluding when I wrote:

> 3. U+FEFF [as a zero-width no-break space] at the beginning of a
> stream (note: not "packet" or arbitrary cutoff point)

If you are processing arbitrary fragments of a stream, without knowledge
of preceding fragments, as in this example, then you have no business
making *any* changes to that fragment based on interpretation of that
fragment as Unicode text. Your sole responsibilities at that point are
to pass the fragments, intact, from one process to the next, or to
disassemble and reassemble them.
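
To make the distinction concrete, here is a minimal sketch in Python, assuming UTF-8 and fragments that arrive as byte strings. The function names (relay_fragments, strip_per_fragment, decode_stream) are hypothetical and chosen only for illustration. The sketch contrasts a relay that passes fragments through intact, a per-fragment stripper that makes the mistake described above, and a decoder that drops U+FEFF only at the true start of the reassembled stream.

```python
import codecs

BOM = codecs.BOM_UTF8  # the bytes EF BB BF, i.e. U+FEFF encoded in UTF-8


def relay_fragments(fragments):
    """Pass each fragment on unchanged; no Unicode interpretation at all."""
    for fragment in fragments:
        yield fragment


def strip_per_fragment(fragments):
    """The mistake: treats every fragment as if it began a stream."""
    for fragment in fragments:
        if fragment.startswith(BOM):
            fragment = fragment[len(BOM):]
        yield fragment


def decode_stream(fragments, encoding="utf-8"):
    """Decode fragments in order, dropping U+FEFF only at the stream start."""
    decoder = codecs.getincrementaldecoder(encoding)()
    at_start = True
    for fragment in fragments:
        # The incremental decoder buffers characters split across fragments.
        text = decoder.decode(fragment)
        if at_start and text:
            if text[0] == "\ufeff":
                text = text[1:]  # signature at the start of the whole stream
            at_start = False
        if text:
            yield text
    # Raises UnicodeDecodeError if the stream ended in mid-character.
    decoder.decode(b"", final=True)


# A stream with a leading signature and an interior U+FEFF (as ZWNBSP),
# cut so that the second fragment happens to begin with EF BB BF.
data = BOM + b"ab" + BOM + b"cd"
fragments = [data[:5], data[5:]]

relayed = b"".join(relay_fragments(fragments))
damaged = b"".join(strip_per_fragment(fragments))
decoded = "".join(decode_stream(fragments))
```

Here relayed is byte-for-byte identical to data, damaged has silently lost the interior U+FEFF, and decoded keeps it while dropping only the leading signature. The choice of an incremental decoder is one way to reassemble before interpreting; joining all fragments first and decoding once would serve equally well when the stream fits in memory.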

Doug Ewell | Thornton, CO, USA | @DougEwell
