Character Index (was: Re: NamesList.txt as data source)

Ken Whistler kenwhistler at att.net
Tue Mar 29 13:24:14 CDT 2016


On 3/29/2016 12:16 AM, Janusz S. Bień wrote:
> What about a simpler and more technical approach, like a character index
> with links to the relevant proposals? Doesn't such a thing already exist
> for internal use?

No, and it is exceedingly *non*-trivial to produce such an index.
There are now thousands of documents, extending over 27 years
of history (and actually more when you go back to earlier work
on 10646). Much of the earlier half of that document trail exists
only on paper, in material that most of the participants have long ago
mulched.

The status of what a "character" even is can change during the
development of proposals, as they morph over time. This is
also exceedingly non-trivial in some cases, where argumentation
about cases of unification and/or disunification of different
source attestations might proceed over an extended period.
That makes it pretty difficult to just willy-nilly produce a
magical character index that points to exactly the right place.

In recent years we have had some individuals who have tracked the
specific documents associated with the repertoire new to particular
releases much more thoroughly than in prior years -- but truth
to tell, the *majority* of people involved in maintenance of
the Unicode Standard and ISO/IEC 10646 care little about the
details of that history. Instead, they are basically focused on
whatever happens to be the next thing to argue about. It is
all about shinies -- not about piecing together dusty old artifacts. ;-)

--Ken
