Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

Steffen Nurpmeso sdaoden at yandex.com
Fri Jun 6 06:14:47 CDT 2014


"Doug Ewell" <doug at ewellic.org> wrote:
 |Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
 |> Not necessarily true.
 |>
 |> [602 words]
 |
 |This has nothing to do with the scenario I described, which involved
 |removing a "BOM" from the start of an arbitrary fragment of data,
 |thereby corrupting the data because the "BOM" was actually a ZWNBSP.
 |
 |If you have an arbitrary fragment of data, don't fiddle with it.
 |
 |If you know enough about the data to fiddle with it safely, it's not
 |arbitrary.

Yeah!
E.g., on the all-UTF-8 Plan 9 research operating system (here the
9front tree), U+FEFF turns up in 12 of 44983 tracked files:

  ?0[9front.update_bomb_git]$ git ls-files --with-tree=master --|wc -l
     44983
  ?0[9front.update_bomb_git]$ git grep -lI "`print '\ufeff'`" master|wc -l
        12
  ?0[9front.update_bomb_git]$ git grep -lI "`print '\ufeff'`" master
  master:9front.hg/lib/font/bit/MAP
  master:9front.hg/lib/glass
  master:9front.hg/sys/lib/troff/font/devutf/0100to25ff
  master:9front.hg/sys/lib/troff/font/devutf/C
  master:9front.hg/sys/lib/troff/font/devutf/CW
  master:9front.hg/sys/lib/troff/font/devutf/H
  master:9front.hg/sys/lib/troff/font/devutf/LucidaSans
  master:9front.hg/sys/lib/troff/font/devutf/PA
  master:9front.hg/sys/lib/troff/font/devutf/R
  master:9front.hg/sys/lib/troff/font/devutf/R.nomath
  master:9front.hg/sys/src/ape/lib/utf/runetype.c
  master:9front.hg/sys/src/libc/port/runetype.c
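
To make Doug's rule concrete, here is a minimal Python sketch; it is an
illustration only, not anything taken from the tree above. The names
strip_bom, classify_feff and the is_stream_start flag are hypothetical.
The assumption is that only the caller can know whether a piece of text
really begins a stream; without that knowledge a leading U+FEFF may be a
ZWNBSP and is left alone.

```python
from typing import Tuple

BOM = "\ufeff"  # U+FEFF: BOM at the start of a stream, ZWNBSP anywhere else


def strip_bom(text: str, is_stream_start: bool) -> str:
    """Drop one leading U+FEFF, but only if the caller vouches that
    `text` is the beginning of a stream (hypothetical flag)."""
    if is_stream_start and text.startswith(BOM):
        return text[len(BOM):]
    # Arbitrary fragment, or no leading U+FEFF: do not fiddle with it.
    return text


def classify_feff(text: str) -> Tuple[bool, int]:
    """Return (has_leading_feff, number_of_interior_feff)."""
    leading = text.startswith(BOM)
    interior = text.count(BOM) - (1 if leading else 0)
    return leading, interior


if __name__ == "__main__":
    fragment = "\ufeffabc"
    print(repr(strip_bom(fragment, is_stream_start=True)))
    print(repr(strip_bom(fragment, is_stream_start=False)))
    print(classify_feff("\ufeffa\ufeffb"))
```

A grep like the one above only says that U+FEFF occurs somewhere in a
file; something like classify_feff would be needed to tell a leading
occurrence from an interior one.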

--steffen

