Usage stats?

Michael Norton michaelanortonster at
Fri Mar 27 16:03:44 CDT 2015

Doug Ewell's getting it.   He sent this back to me, so I asked him if he
could provide the same dataset drawn from his written reply to me:

* For example, your original e-mail (327 characters) consists of:

    U+0020 - 14.07%
    U+0065 - 10.09%
    U+0061 -  7.03%
    U+0074 -  6.73%
    U+006F -  5.81%*
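
For anyone who wants to produce the same kind of breakdown for their own text, here is a minimal Python sketch. It is not necessarily how Doug generated his figures; the function name `code_point_frequencies` and the `top` parameter are my own inventions for illustration. It counts each code point in a string and reports its share of the total.

```python
from collections import Counter


def code_point_frequencies(text, top=5):
    """Return the `top` most common code points as (label, percent) pairs.

    Hypothetical helper for illustration only. Python 3 strings iterate
    by code point, so no explicit decoding step is needed here.
    """
    total = len(text)
    if total == 0:
        return []
    counts = Counter(text)
    return [
        (f"U+{ord(ch):04X}", round(100.0 * n / total, 2))
        for ch, n in counts.most_common(top)
    ]


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    for label, pct in code_point_frequencies(sample):
        # Same layout as the list quoted above: code point, then percentage.
        print(f"{label} - {pct:5.2f}%")
```

The percentages depend entirely on the text that is fed in, which is the point made elsewhere in this thread about corpora.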

This is good because when the volume of traffic in a given space begins to
increase exponentially, any predominant Unicode characters in that space
need to be recognized, for a number of reasons depending on which sector or,
as you say, corpus, you're in.

In the above example, I think it's safe to say U+0020 is the predominant
space character online, though I would like to compare it with the other 30
"space" characters you mentioned, Markus.   If I know traffic figures for
where the other space characters are used, I can draw a pretty good estimate
and correlation from them.
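
As a rough sketch of that comparison, the Python below tallies how often each space character appears in a piece of text. One loud assumption: it treats "space character" as general category Zs (Space_Separator) as reported by Python's `unicodedata` module, which may not be the same set as the 30 or so Markus has in mind, and the list depends on the Unicode version bundled with the interpreter. The function names are hypothetical.

```python
import sys
import unicodedata
from collections import Counter


def space_separators():
    """List every code point whose general category is Zs.

    Assumption: Zs is only one possible definition of a "space" character;
    other definitions (e.g. the White_Space property) give a different set.
    """
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


def space_usage(text):
    """Return each Zs character's share of all Zs characters in `text`."""
    counts = Counter(ch for ch in text if unicodedata.category(ch) == "Zs")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {f"U+{ord(ch):04X}": n / total for ch, n in counts.items()}


if __name__ == "__main__":
    # Two ordinary spaces and one NO-BREAK SPACE (U+00A0).
    for label, share in sorted(space_usage("a b\u00a0c d").items()):
        print(f"{label} - {share:.2%}")
```

Run over real traffic samples, this would show how far U+0020 dominates the other space characters in that particular corpus, and nothing more general than that.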

On Fri, Mar 27, 2015 at 4:56 PM, Markus Scherer wrote:

> On Fri, Mar 27, 2015 at 1:27 PM, Michael Norton <
> michaelanortonster at> wrote:
>> Easy example: what's the code for [blank space] U+0020 across all language
>> sets of Unicode?  Is it the same, i.e. 100%?
> I don't understand what you are asking, and I have a hunch you haven't
> said it in a way that anyone else understands it either.
> The code point value that the Unicode Standard assigns to the normal space
> is U+0020, but
> - not every language uses spaces
> - not every language that uses spaces uses them for the same purpose as
> English
> - there are some 30 other "space" characters in Unicode
> Statistics of character frequencies vary by corpus, as others have said.
> Even if you "only" look "on the web", that's undefined until you specify a
> crawling strategy. Dynamically generated content means that there is an
> infinite number of "web pages". Every crawler will come up with a different
> set.
> Maybe you are asking about statistics of character encodings? On the web?
> Such as, Unicode vs. Shift-JIS vs. ISO 8859-2 etc.?
> markus


Michael A. Norton, B.A. Cinema, M.P.A.
My Cinema Home:

"All great actors are mere mathematical masters of speech and the human