Usage stats?

Doug Ewell doug at
Sat Mar 28 11:52:35 CDT 2015

Michael Norton wrote:

> Thanks Doug.  I did not know there exists a representative sample of
> the world's text. :)

There is not, which was the point.

Thanks for reposting a private message back to the list, by the way.

> Your frequency chart is great.   The average char appearance is 2.91%.
> Only 34% from your list exceed 10% of it.  Therefore, U+0020 is the
> elephant in the room (ie. 15.05% is far > 2.91%).   In fact, it's
> almost >50% greater than the next most-appearing character.

Words in English are separated by spaces, and the average English word 
is about 5 letters long. It follows that English text will contain a lot 
of spaces. You can eyeball this.
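
As a rough check of that reasoning (a sketch under simple assumptions, not a
measurement taken from the chart): if every word were exactly 5 letters long
and followed by one space, spaces would account for 1/(5+1) of all
characters, about 16.7%, which is in the same neighborhood as the 15.05%
quoted above. The function name and the sample sentence below are
illustrative only.

```python
from collections import Counter

def char_shares(text):
    """Return each character's share of the text as a percentage."""
    counts = Counter(text)
    total = len(text)
    return {ch: 100.0 * n / total for ch, n in counts.items()}

# Idealized model: every word has 5 letters and is followed by one space.
model_share = 100.0 / (5 + 1)

# Hypothetical sample sentence, not data from the frequency chart.
sample = "Words in English are separated by spaces and the average word is short"
shares = char_shares(sample)

print(f"model space share:  {model_share:.2f}%")
print(f"sample space share: {shares[' ']:.2f}%")
```

Real text also contains punctuation, digits and line breaks, so the measured
share of U+0020 will usually come out somewhat below the idealized figure.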

> Only 34% from your list exceed 10% of the average percentile (2.9%).
> This is serendipitously common (eg. the Earth:Moon albedo ratio is
> .36).   A relationship about motion and other natural properties and
> characteristics among the local texts begin to emerge.


Doug Ewell | | Thornton, CO
