metric for block coverage

Sat Feb 17 16:18:25 CST 2018

Hi!
As a part of Debian fonts team work, we're trying to improve fonts review:
ways to organize them, add metadata, pick which fonts are installed by
default and/or recommended to users, etc.

I'm looking for a way to determine how well a font covers the available
scripts.  It's probably reasonable to do this per Unicode block.  It's also
safe to assume that a font which lacks a codepoint cannot do any complex
shaping of that character, so looking only at codepoints should be adequate
for our purposes.

A naïve way would be to compare the number of codepoints present in the font
with the number of all codepoints in the block.  Alas, there's way too much
chaff for such an approach to be reasonable: þ or ą would count the same as
LATIN TURNED CAPITAL LETTER SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X
WITH CARON.
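For concreteness, here is a minimal sketch of that naïve ratio in Python.  It
assumes the font's codepoints are already available as a set of ints (with
fontTools, for a font that has a Unicode cmap, something like
`set(TTFont(path).getBestCmap())`), that block ranges are hardcoded because
Python's unicodedata module has no block lookup, and that unassigned
codepoints are left out of the denominator.  `BLOCKS`, `assigned` and
`naive_coverage` are hypothetical names.

```python
import unicodedata

# Block ranges copied by hand from Unicode's Blocks.txt; Python's
# unicodedata module has no block lookup, so they are hardcoded here.
BLOCKS = {
    "Latin Extended-A": (0x0100, 0x017F),
    "Runic": (0x16A0, 0x16FF),
}


def assigned(start, end):
    """Assigned codepoints in [start, end]; unassigned ones (category Cn) are skipped."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) != "Cn"]


def naive_coverage(font_cps, block):
    """Fraction of the block's assigned codepoints that the font maps."""
    start, end = BLOCKS[block]
    cps = assigned(start, end)
    present = sum(1 for cp in cps if cp in font_cps)
    return present / len(cps)


# Toy font: all of Latin Extended-A except U+0149.
font = set(range(0x0100, 0x0180)) - {0x0149}
print(naive_coverage(font, "Latin Extended-A"))  # 0.9921875 (127/128)
print(naive_coverage(font, "Runic"))             # 0.0
```

Every assigned codepoint weighs exactly the same here, which is the chaff
problem: missing þ or ą costs as much as missing the most obscure letter.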

Another idea would be giving every codepoint a weight equal to the number of
languages which currently use such a letter.
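A sketch of that weighting, under loud assumptions: `LANG_LETTERS` is
hypothetical toy data, a deliberately partial sample of Latin Extended-A
letters used by Polish, Czech and Lithuanian (not complete alphabets); a real
table might be derived from something like CLDR exemplar characters.  The
function names are made up for illustration.  A range that no language in the
table uses yields `None` rather than a score.

```python
from collections import Counter

# Hypothetical toy data: a partial sample of lowercase Latin Extended-A
# letters per language (not complete alphabets).
LANG_LETTERS = {
    "pl": "ąćęłńśźż",
    "cs": "čďěňřšťůž",
    "lt": "ąčęėįšųūž",
}


def letter_weights(lang_letters):
    """Weight of a codepoint = number of languages whose sample contains it."""
    weights = Counter()
    for letters in lang_letters.values():
        for cp in {ord(ch) for ch in letters}:
            weights[cp] += 1
    return weights


def weighted_coverage(font_cps, start, end, weights):
    """Weighted share of [start, end] covered by the font, or None without data."""
    in_block = {cp: w for cp, w in weights.items() if start <= cp <= end}
    total = sum(in_block.values())
    if total == 0:
        return None  # no language in the table uses this range
    covered = sum(w for cp, w in in_block.items() if cp in font_cps)
    return covered / total


weights = letter_weights(LANG_LETTERS)
polish_only = {ord(ch) for ch in LANG_LETTERS["pl"]}
print(weighted_coverage(polish_only, 0x0100, 0x017F, weights))  # 10/26, about 0.38
print(weighted_coverage(polish_only, 0x16A0, 0x16FF, weights))  # None
```

Letters shared by several languages (ą, č, ę, š, ž in this sample) now count
double, while letters nobody uses contribute nothing at all.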

Unfortunately, that wouldn't work for symbols or for dead scripts: a good
runic font will have complete coverage of Elder Futhark, Anglo-Saxon, Younger
Futhark and medieval runes, while only a completionist would care about the
Franks Casket or Tolkien's inventions.
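To make the runic example concrete, a sketch comparing a plain ratio over the
whole Runic block with a ratio over a hand-picked "core" subset.  Treating
U+16F1 to U+16F3 (Tolkien's runes) and U+16F4 to U+16F8 (the Franks Casket
runes) as optional is an assumption made for illustration, as are the names
`OPTIONAL`, `assigned` and `coverage`.

```python
import unicodedata

RUNIC = range(0x16A0, 0x1700)  # Runic block, U+16A0..U+16FF
# Assumption: Tolkien's runes (U+16F1..U+16F3) and the Franks Casket
# runes (U+16F4..U+16F8) count as optional extras, not core coverage.
OPTIONAL = set(range(0x16F1, 0x16F9))


def assigned(cps):
    """Drop unassigned codepoints (general category Cn)."""
    return [cp for cp in cps if unicodedata.category(chr(cp)) != "Cn"]


def coverage(font_cps, cps):
    """Fraction of cps that the font maps."""
    return sum(cp in font_cps for cp in cps) / len(cps)


all_runic = assigned(RUNIC)
core_runic = [cp for cp in all_runic if cp not in OPTIONAL]

# A "good" runic font: everything except the completionist extras.
good_font = set(core_runic)
print(coverage(good_font, all_runic))   # 81/89, about 0.91
print(coverage(good_font, core_runic))  # 1.0
```

This only shows why a flat ratio undersells such a font; deciding what counts
as "core" for each block is still the open question.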

I don't think I'm the first to have this question.  Any suggestions?

ᛗᛖᛟᚹ!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.