NamesList.txt as data source

Janusz S. Bień jsbien at mimuw.edu.pl
Mon Mar 28 23:40:02 CDT 2016


On Mon, Mar 28 2016 at 13:59 CEST, mark at macchiato.com writes:

[...]

> But subheads are not Unicode Character Properties.

As Doug already said, nobody claims this.

> And repeating the caveats expressed earlier,

There were a lot of repetitions in this thread...

> the Nameslist data is designed for chart production, not as a reliable
> source of machine-readable data.

I guess you understand "machine-readable data" (and consequently "data
mining") in a specific, very narrow way.

> While it may be in some cases useful to look at, the subheads are not
> designed to be a consistent source of data.

Can we agree that NamesList.txt is a reliable source of machine-readable
data about the Unicode *charts*?
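
To make concrete what "machine-readable data about the charts" could mean,
here is a minimal sketch in Python that maps each code point to the block
and subhead under which it is listed. It assumes the tab-separated layout
of NamesList.txt ("@@" lines for block headers, "@" lines for subheads,
lines starting with a hexadecimal code point for characters). The function
name and the sample text are hypothetical and written by hand for
illustration; the sample is not a verbatim excerpt of the file.

```python
import re

# Hand-written sample in the style of NamesList.txt.
# Illustrative only: not copied verbatim from the real file.
SAMPLE = (
    "@@\tA720\tLatin Extended-D\tA7FF\n"
    "@\t\tMedievalist additions\n"
    "A752\tLATIN CAPITAL LETTER P WITH FLOURISH\n"
    "A753\tLATIN SMALL LETTER P WITH FLOURISH\n"
    "\t* an annotation line, ignored by this sketch\n"
)

# A character entry: hexadecimal code point, a tab, then the name.
CHAR_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def subheads_by_code_point(text):
    """Return {code point: (name, block, subhead)} for each character line.

    Hypothetical helper; assumes the line layout described above.
    """
    result = {}
    block = None
    subhead = None
    for line in text.splitlines():
        if line.startswith("@@\t"):
            # Block header: "@@", first code point, block name, last code point.
            fields = line.split("\t")
            block = fields[2] if len(fields) > 2 else None
            subhead = None  # a new block starts without a subhead
        elif line.startswith("@\t"):
            # Subhead line: everything after the marker is the subhead text.
            subhead = line[2:].strip()
        else:
            match = CHAR_LINE.match(line)
            if match:
                code_point = int(match.group(1), 16)
                result[code_point] = (match.group(2), block, subhead)
    return result


if __name__ == "__main__":
    table = subheads_by_code_point(SAMPLE)
    print(table[0xA753])
```

Such a table describes only how the charts are organized; as already
agreed in this thread, the subheads are not character properties.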

On Sun, Mar 27 2016 at  6:38 CEST, asmus-inc at ix.netcom.com writes:

[...]

> 3 The information is purely editorial, and as such, changed by the
>   editors as needed, not assigned as result of a vote in the Unicode
>   Technical Committee. 

Changes are not a problem if properly documented, but this is another
topic.

Let's now be more specific:

On Sun, Mar 27 2016 at  5:00 CEST, doug at ewellic.org writes:
> Janusz Bień wrote:
>
>> Am I right that this information is available only in NamesList.txt?
>
> It probably comes from what Ken referred to as "a very long list of
> annotational material, including names list subhead material, etc.,
> maintained in other sources."
>
> If you don't have access to those "other sources," 

See below.

> then as far as I
> can tell, yes, it's available only in NamesList.txt.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO
>
>

On Sun, Mar 27 2016 at  6:38 CEST, asmus-inc at ix.netcom.com writes:
> On 3/26/2016 2:10 AM, Janusz S. "Bień" wrote:

[...]

> I've just noticed that NamesList.txt is in a sense data mined by the
> Unicode consortium itself. I mean the "Unicode Utilities: Character
> Properties", which e.g. for LATIN SMALL LETTER P WITH FLOURISH 
> (http://unicode.org/cldr/utility/character.jsp?a=A753) display in
> particular
>
> subhead: Medievalist addition

[...]

>
> If you seriously wanted to present "all that is known about a
> character" you would need to excerpt all mentions of it in the core
> specification, as well as (potentially) any additional details
> presented in the version of the proposal document that was approved by
> the UTC as part of encoding the character.

Exactly.

The essential information for LATIN SMALL LETTER P WITH FLOURISH is that
in medieval manuscripts it is used for "pro" or "por". This information
is available only in

   http://www.unicode.org/L2/L2006/06027-n3027-medieval.pdf

Is this a static and permanent link? What is the copyright status of the
document? For example, can it be redistributed and replicated on other
sites? Can it be quoted verbatim in a Wikipedia entry?

In general, what can be done to make access to such information easier?

Best regards

Janusz

-- 
                           ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/


