annotations (was: NamesList.txt as data source)

Marcel Schneider charupdate at orange.fr
Sun Mar 13 12:13:28 CDT 2016


On Sun, 13 Mar 2016 07:55:24 +0100, Janusz S. Bień wrote:

> For this purpose he wrote also a converter from NamesList format to XML

That goes in exactly the direction I suggested last year as a beta feedback item [1], but I never thought it could be so simple.
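To illustrate how simple such a conversion can be, here is a minimal Python sketch. It is not the converter mentioned above. It assumes only the basic NamesList.txt line syntax: a character line (code point, tab, name) followed by tab-indented annotation lines, of which only "=" (informal alias) and "%" (formal alias) are handled. Headers and all other annotation types are skipped, and the element names "nameslist", "char", "alias" and "formal-alias" are hypothetical, not an official schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for the two annotation markers handled here:
# "=" introduces an informal alias, "%" a formal alias. Every other
# annotation type (comments, cross-references, decompositions, ...) is skipped.
MARKERS = {"=": "alias", "%": "formal-alias"}


def nameslist_to_xml(lines):
    """Build a simplified XML tree from NamesList-style text lines."""
    root = ET.Element("nameslist")
    current = None  # the <char> element that annotation lines attach to
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip empty lines, header lines ("@...") and file comments (";...").
        if not line or line.startswith(("@", ";")):
            continue
        if line.startswith("\t"):
            body = line.lstrip("\t")
            tag = MARKERS.get(body[:1])
            if current is not None and tag is not None:
                ET.SubElement(current, tag).text = body[1:].strip()
            continue
        # Character line: code point, a tab, then the character name.
        code, _, name = line.partition("\t")
        current = ET.SubElement(root, "char", cp=code, name=name)
    return root


if __name__ == "__main__":
    sample = [
        "@@\t0180\tLatin Extended-B\t024F\n",
        "01A2\tLATIN CAPITAL LETTER OI\n",
        "\t% LATIN CAPITAL LETTER GHA\n",
    ]
    print(ET.tostring(nameslist_to_xml(sample), encoding="unicode"))
```

Running it on the real file would simply mean passing an open file object instead of the sample list, assuming a local UTF-8 copy of NamesList.txt.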

> I understand there is no intention to make an official XML version of
> the file as it would require changes in Unibook?

The difference between homemade databases and official ones, however, is that the latter raise much higher expectations. As Asmus Freytag pointed out in this thread, as well as in his comments on my feedback, *no* “complete” UCD version, however complete it might actually be, can ever meet the assumptions people would inevitably make about it.

Furthermore, experience shows that the information already provided is far more than most people can mentally process. For example, most online character information sites do not display the formal aliases, so that at best some knowledgeable users add that information through the comment facility. I won’t name any: these are free tools and platforms that should not be criticized.

If we imagine a hypothetical UCD containing detailed information about the usage of every existing language and script, not only Polish but also Czech, Romanian, Portuguese, Vietnamese, Devanagari or Tirhuta, to name just a few, the result would be a mass of data that I’m not sure would repay the cost of collecting it, nor that it would be really useful.

For the NamesList, the TXT format is superior to XML at least in that it keeps us from forgetting that NamesList.txt is the source of the Code Charts. No less, no more.

Marcel

[1] http://www.unicode.org/review/pri297/feedback.html
    Date/Time: Sat May 2 07:10:09 CDT 2015
    Opt Subject: PRI #297: UnicodeXData.txt
    Date/Time: Wed May 6 08:03:04 CDT 2015
    Opt Subject: PRI #297: feedback on XML files
