Usage stats?

Markus Scherer markus.icu at gmail.com
Fri Mar 27 15:56:23 CDT 2015


On Fri, Mar 27, 2015 at 1:27 PM, Michael Norton <
michaelanortonster at gmail.com> wrote:

> Easy example: what's the code for [blank space] U+020 across all language
> sets of Unicode?  Is it the same ie: 100%?
>

I don't understand what you are asking, and I have a hunch you haven't said
it in a way that anyone else understands it either.

The code point value that the Unicode Standard assigns to the normal space
is U+0020, but
- not every language uses spaces
- not every language that uses spaces uses them for the same purpose as
English
- there are some 30 other "space" characters in Unicode
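A minimal sketch of how one might list the "space-like" characters, assuming Python's standard unicodedata module. It collects only General_Category Zs (space separator). That is a narrower set than "spaces" in the looser sense: U+200B ZERO WIDTH SPACE, for example, is category Cf and is not included. The exact count depends on which properties you select and on the Unicode version bundled with your Python (unicodedata.unidata_version).

```python
import sys
import unicodedata

def space_separators() -> list[str]:
    """Return every code point whose General_Category is Zs (space separator)."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Zs"]

# Print each Zs character with its code point and formal character name.
for ch in space_separators():
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>')}")
```

The scan covers all code points, so it takes a moment. It is meant for exploration, not for hot paths.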

Statistics of character frequencies vary by corpus, as others have said.
Even if you "only" look "on the web", that's undefined until you specify a
crawling strategy. Dynamically generated content means that there is an
infinite number of "web pages". Every crawler will come up with a different
set.
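To illustrate why a single "percentage" for the space character is not meaningful, here is a small sketch that measures the share of U+0020 in two made-up text samples. The samples are hypothetical stand-ins for corpora. The point is only that the number changes with the input; it says nothing about any real corpus.

```python
from collections import Counter

def space_share(text: str) -> float:
    """Fraction of the code points in `text` that are U+0020 SPACE."""
    if not text:
        return 0.0
    return Counter(text)[" "] / len(text)

# Two tiny, hypothetical samples standing in for "corpora".
english = "The quick brown fox jumps over the lazy dog."
japanese = "日本語の文章では、普通は単語の間に空白を入れません。"

for label, sample in (("English", english), ("Japanese", japanese)):
    print(f"{label}: {space_share(sample):.3f}")
```

The Japanese sample contains no U+0020 at all, because Japanese prose normally does not separate words with spaces.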

Maybe you are asking about statistics of character encodings? On the web?
Such as, Unicode vs. Shift-JIS vs. ISO 8859-2 etc.?
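If the question is really about bytes on the wire, the encoding matters more than the character. A minimal sketch, assuming Python's built-in codecs (the codec names below are Python's, not IANA labels): U+0020 encodes to the single byte 0x20 in ASCII-compatible encodings such as UTF-8, Shift-JIS and ISO 8859-2, but to two bytes in UTF-16.

```python
def space_bytes(encoding: str) -> bytes:
    """Encode U+0020 SPACE with the given Python codec."""
    return " ".encode(encoding)

# Show the byte sequence for a space in a few encodings.
for encoding in ("utf-8", "shift_jis", "iso8859_2", "utf-16-be"):
    print(f"{encoding:>10}: {space_bytes(encoding).hex(' ')}")
```

This shows only how one character is encoded. Statistics about which encodings are actually used on the web would still depend on a specific crawl.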

markus