Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

Asmus Freytag asmusf at ix.netcom.com
Wed Jun 4 13:40:11 CDT 2014


On 6/4/2014 11:26 AM, Doug Ewell wrote:
> Sorry, I left out an important detail.
>
> I wrote:
>   
>> 3. U+FEFF at the beginning of a stream (note: not "packet" or
>> arbitrary cutoff point)
> I meant U+FEFF as a zero-width no-break space. Obviously it is very
> common to see U+FEFF as a signature or BOM.
>
> My underlying question here is, how common is it that the producer of a
> stream actually intends this character *at the start of a stream* to be
> a ZWNBSP, not to be stripped lest the actual text content be altered?

The semantics of it were chosen at the time to make no sense at the 
start, and to make the character invisible in most situations. The 
remnant of its semantics was later taken over by WORD JOINER (U+2060), so 
that there is now NO use for this as part of text.
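
Since WORD JOINER has taken over the no-break role, one possible policy for
text that has already been decoded is sketched below. This is an assumption,
not a rule from the thread or the standard's text. It treats a leading U+FEFF
as a signature and strips it. Any U+FEFF left in the interior is rewritten as
U+2060. The function name normalize_feff is hypothetical.

```python
ZWNBSP = "\ufeff"       # ZERO WIDTH NO-BREAK SPACE, also used as the BOM/signature
WORD_JOINER = "\u2060"  # WORD JOINER, which now carries the no-break semantics


def normalize_feff(text: str) -> str:
    """Hypothetical policy: drop a leading U+FEFF as a signature and map any
    remaining (interior) U+FEFF to U+2060 WORD JOINER."""
    if text.startswith(ZWNBSP):
        text = text[1:]  # signature, not content
    return text.replace(ZWNBSP, WORD_JOINER)
```

For example, normalize_feff("\ufeffab\ufeffc") gives "ab\u2060c".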

The use as part of a convention has always been clear. If you stick this 
at the front, readers will byte-reverse your data; that should weed out 
accidental use pretty quickly :) Or prevent people from getting "cute" 
with it in other ways.
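
The reader behaviour described above can be sketched as a minimal UTF-16
decoder that honours an initial signature. It rests on two assumptions. First,
unmarked data is read as big-endian. Second, there is no error recovery. The
function name decode_utf16_stream is hypothetical. FE FF is stripped and the
rest is read as big-endian. FF FE (the signature seen byte-reversed) is
stripped and the rest is read as little-endian, which means byte-reversing the
data.

```python
def decode_utf16_stream(data: bytes) -> str:
    """Hypothetical reader: decode UTF-16 bytes, treating an initial
    U+FEFF as a signature that selects the byte order.

    Assumes big-endian when no signature is present; odd-length or
    otherwise malformed input simply raises UnicodeDecodeError.
    """
    if data[:2] == b"\xfe\xff":
        # Signature in big-endian order: strip it, read as big-endian.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        # Signature appears byte-reversed: strip it, read as little-endian.
        return data[2:].decode("utf-16-le")
    # No signature: fall back to the assumed big-endian default.
    return data.decode("utf-16-be")
```

Under this sketch, any stream that starts with U+FEFF loses that character as
a signature. A leading ZWNBSP intended as content therefore cannot survive,
which is the situation Doug's question was about.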

So, I would think that for this particular code point, you can safely 
assume that it's buggy or test data.

Buggy data you just byte-reverse as requested and let the user take the 
consequences. :)

A./
>
> --
> Doug Ewell | Thornton, CO, USA
> http://ewellic.org | @DougEwell
>
>
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>


