Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)
Doug Ewell
doug at ewellic.org
Wed Jun 4 13:00:50 CDT 2014
How common is it to see any of the following in real-world Unicode text,
as opposed to code charts and test suites and the like?
1. Unpaired surrogates
2. Noncharacters (besides CLDR data)
3. U+FEFF at the beginning of a stream (note: not "packet" or arbitrary
cutoff point)
I'm not asking whether any of these are recommended or "prohibited" or
whether they are a good idea. I'm asking about actual usage.
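
For anyone who wants to measure this on their own corpus, here is a minimal sketch in Python of counting the three cases above. The names `is_noncharacter` and `survey` are hypothetical and not taken from any library. It assumes the text has already been decoded into a Python `str` in which well-formed surrogate pairs were combined into single code points, so any surrogate code point still present is treated as unpaired.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF, plus the last
    two code points of each plane (U+nFFFE and U+nFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def survey(text: str) -> dict:
    """Count the three corner cases in an already-decoded string.

    Assumption: well-formed surrogate pairs were combined during
    decoding, so every surrogate code point seen here is unpaired.
    """
    counts = {"unpaired_surrogates": 0, "noncharacters": 0, "leading_bom": 0}
    # Only U+FEFF at the very start of the stream counts as case 3.
    if text.startswith("\ufeff"):
        counts["leading_bom"] = 1
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            counts["unpaired_surrogates"] += 1
        elif is_noncharacter(cp):
            counts["noncharacters"] += 1
    return counts
```

Running `survey` over each document in a collection and summing the counts would give a rough sense of actual usage. It says nothing about how the text was produced or whether the occurrences were intentional.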
--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell