Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)
Steffen Nurpmeso
sdaoden at yandex.com
Fri Jun 6 06:14:47 CDT 2014
"Doug Ewell" <doug at ewellic.org> wrote:
|Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
|> Not necessarily true.
|>
|> [602 words]
|
|This has nothing to do with the scenario I described, which involved
|removing a "BOM" from the start of an arbitrary fragment of data,
|thereby corrupting the data because the "BOM" was actually a ZWNBSP.
|
|If you have an arbitrary fragment of data, don't fiddle with it.
|
|If you know enough about the data to fiddle with it safely, it's not
|arbitrary.
Yeah!
E.g., on the all-UTF-8 Plan 9 research operating system, git grep
finds U+FEFF in only 12 of the 44983 tracked files:
?0[9front.update_bomb_git]$ git ls-files --with-tree=master --|wc -l
44983
?0[9front.update_bomb_git]$ git grep -lI "`print '\ufeff'`" master|wc -l
12
?0[9front.update_bomb_git]$ git grep -lI "`print '\ufeff'`" master
master:9front.hg/lib/font/bit/MAP
master:9front.hg/lib/glass
master:9front.hg/sys/lib/troff/font/devutf/0100to25ff
master:9front.hg/sys/lib/troff/font/devutf/C
master:9front.hg/sys/lib/troff/font/devutf/CW
master:9front.hg/sys/lib/troff/font/devutf/H
master:9front.hg/sys/lib/troff/font/devutf/LucidaSans
master:9front.hg/sys/lib/troff/font/devutf/PA
master:9front.hg/sys/lib/troff/font/devutf/R
master:9front.hg/sys/lib/troff/font/devutf/R.nomath
master:9front.hg/sys/src/ape/lib/utf/runetype.c
master:9front.hg/sys/src/libc/port/runetype.c
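
The grep above only says that U+FEFF occurs somewhere in those files; it does not say whether it is a leading "BOM" or a ZWNBSP inside the data. A minimal sketch of that distinction in Python, assuming the text has already been decoded; the names classify_feff and strip_leading_bom are hypothetical helpers made up for this illustration, not part of any tool mentioned in this thread:

```python
# Hypothetical helpers for illustration only.
BOM = "\ufeff"  # U+FEFF: a BOM at the start of a stream, ZWNBSP anywhere else


def classify_feff(text: str) -> str:
    """Report where U+FEFF occurs in an already decoded string."""
    if BOM not in text:
        return "none"
    if text.startswith(BOM) and BOM not in text[1:]:
        return "leading-only"
    return "interior"


def strip_leading_bom(text: str, is_stream_start: bool) -> str:
    """Drop one leading U+FEFF, but only if the caller knows that
    this text is the start of a complete stream (assumption: the
    caller can actually know this; for an arbitrary fragment it
    cannot, so the text is returned untouched)."""
    if is_stream_start and text.startswith(BOM):
        return text[1:]
    return text
```

With an arbitrary fragment the caller would pass is_stream_start=False, so a leading U+FEFF survives as the ZWNBSP it may well be.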
--steffen