markus.icu at gmail.com
Fri Mar 27 15:56:23 CDT 2015
On Fri, Mar 27, 2015 at 1:27 PM, Michael Norton <michaelanortonster at gmail.com> wrote:
> Easy example: what's the code for [blank space] U+020 across all language
> sets of Unicode? Is it the same ie: 100%?
I don't understand what you are asking, and I have a hunch you haven't said
it in a way that anyone else understands it either.
The code point value that the Unicode Standard assigns to the normal space
is U+0020, but
- not every language uses spaces
- not every language that uses spaces uses them for the same purpose
- there are some 30 other "space" characters in Unicode
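To see that U+0020 is only one of several "space" characters, a small Python sketch can list every code point whose General_Category is Zs (space separator), using the standard unicodedata module. This is one possible definition of "space". The White_Space property includes more characters, and format characters such as U+200B ZERO WIDTH SPACE are not in Zs at all, so the count you get depends on the definition you pick and on the Unicode version bundled with your Python (unicodedata.unidata_version). The function name space_separators is made up for this illustration.

```python
import sys
import unicodedata


def space_separators():
    """Return (code point, name) for every character with General_Category Zs."""
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Zs":
            # Every Zs character has a name; the default is just a safety net.
            result.append((cp, unicodedata.name(ch, "<unnamed>")))
    return result


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for cp, name in space_separators():
        print(f"U+{cp:04X}  {name}")
```

The list includes U+0020 SPACE alongside U+00A0 NO-BREAK SPACE, U+3000 IDEOGRAPHIC SPACE and others.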
Statistics of character frequencies vary by corpus, as others have said.
Even if you "only" look "on the web", that's undefined until you specify a
crawling strategy. Dynamically generated content means that there is an
infinite number of "web pages". Every crawler will come up with a different set of pages.
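As a toy illustration of why the corpus matters, the Python sketch below computes relative character frequencies over two tiny made-up "crawls". The same question ("how often does U+0020 occur?") gets different answers depending on which documents went in. Both corpora and the helper char_frequencies are hypothetical, not real crawl data.

```python
from collections import Counter


def char_frequencies(corpus):
    """Relative frequency of each character across an iterable of documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc)  # counts each character in the string
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


# Hypothetical toy corpora standing in for two different crawls.
crawl_a = ["the cat sat", "on the mat"]
crawl_b = ["東京都", "日本語のテキスト"]

freq_a = char_frequencies(crawl_a)
freq_b = char_frequencies(crawl_b)

print("share of U+0020 in crawl A:", freq_a.get(" ", 0.0))
print("share of U+0020 in crawl B:", freq_b.get(" ", 0.0))
```

Crawl A has 4 spaces in 21 characters, while crawl B has none, so any "space frequency" is a property of the sample, not of Unicode.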
Maybe you are asking about statistics of character encodings? On the web?
For example, Unicode vs. Shift-JIS vs. ISO 8859-2, etc.?
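If the question is really about encodings, the distinction between a code point and its bytes can be shown with Python's built-in codecs. The sketch below encodes U+0020 SPACE and U+3000 IDEOGRAPHIC SPACE in UTF-8 and UTF-16BE (two Unicode encodings; on the web "Unicode" usually means UTF-8), Shift-JIS and ISO 8859-2. The code point stays the same; the byte sequence, and whether the character can be encoded at all, depends on the encoding. The helper encode_or_none is made up for this example.

```python
def encode_or_none(text, encoding):
    """Return the bytes for text in the given codec, or None if it cannot be encoded."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


ENCODINGS = ["utf-8", "utf-16-be", "shift_jis", "iso8859_2"]

if __name__ == "__main__":
    for ch in ["\u0020", "\u3000"]:
        for enc in ENCODINGS:
            print(f"U+{ord(ch):04X} in {enc}: {encode_or_none(ch, enc)!r}")
```

U+0020 comes out as the single byte 0x20 in UTF-8, Shift-JIS and ISO 8859-2 (but as 0x00 0x20 in UTF-16BE), while U+3000 has different bytes in each Unicode encoding and Shift-JIS and cannot be represented in ISO 8859-2 at all.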
More information about the Unicode mailing list