Looking for a standard on historical countries

Steven R. Loomis srl at icu-project.org
Sat Nov 1 23:57:06 CDT 2014

On 11/1/2014 8:10 AM, Richard Wordingham wrote:
> On Fri, 31 Oct 2014 20:43:19 +0100
> Philippe Verdy <verdy_p at wanadoo.fr> wrote:
>> How is this related to Unicode?
> One possibility is through the Regional Indicators, but they are defined
> by the unstable ISO 3166-1 alpha-2 codes.   
It was noted as "off topic". It's relevant because CLDR is relevant.
>> Maybe it's associated with CLDR for former regional classification of
>> languages, but I doubt this will ever create any standardization for
>> historic data that should remain as is without changes in their old
>> sources for which there are no more any active maintainers, just
>> interested people (basically historians that may comment about them
>> the way they want or could invent their new terminology for analysts
>> and archivists).
> A lot of useful historic information is missing from CLDR.  For example,
> I believe line-breaking and word-boundary rules are completely missing
> for 'Sumero-Akkadian' Cuneiform writing systems.  The rules were not
> uniform.  On the other hand, an entry for the Assyrian for 'English' as
> used in the Assyrian homeland would be meaningless. 
A lot of speculation happened some time back with the assumption that
CLDR would a priori reject historic language contributions such as Latin
(it wouldn't). No bugs were even filed, let alone any data submitted
for Latin. Besides Sumero-Akkadian, we could probably add break rules
for, say, Oromo, Slovak, Spanish, and Dutch (
http://unicode.org/cldr/trac/ticket/2992 ).

> The precise territory covered by a country is not useful within the
> Unicode domains, nor are debates about independence, nor whether tribute
> was paid regularly.  In general, a more useful division may be by date,
> but that is barely covered by a system designed for present-day
> languages.
Sure. It would need to be a different namespace from ISO 3166 and
probably IETF BCP 47.

I wonder if you could use Linked Open Data sets (come hear about it
Monday at IUC38!) to look for ontology/Country that doesn't have a 3166
code, something like the following. You could extract start/end date,
successor country, etc.
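
The example referred to above does not appear in the archived message.
As a rough sketch only, and not the original query, something along these
lines might work against a Linked Open Data endpoint such as DBpedia. The
endpoint URL and the property names (dbo:iso31661Code, dbo:foundingDate,
dbo:dissolutionDate, dbo:successor) are assumptions that would need to be
checked against the dataset actually used; build_query and build_url are
hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumption: a public SPARQL endpoint; substitute the one you actually use.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(limit=50):
    """Build a SPARQL query for Country resources with no ISO 3166-1 code.

    The class and property names are assumptions about the dataset's
    ontology and may need adjusting.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?country ?start ?end ?successor WHERE {{
  ?country a dbo:Country .
  FILTER NOT EXISTS {{ ?country dbo:iso31661Code ?code }}
  OPTIONAL {{ ?country dbo:foundingDate ?start }}
  OPTIONAL {{ ?country dbo:dissolutionDate ?end }}
  OPTIONAL {{ ?country dbo:successor ?successor }}
}}
LIMIT {int(limit)}
""".strip()


def build_url(limit=50):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {
        "query": build_query(limit),
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the query and URL; fetching is left to the reader.
    print(build_query(10))
    print(build_url(10))
```

The FILTER NOT EXISTS clause is what restricts the results to entries
without a 3166 code; the OPTIONAL clauses pick up start/end dates and a
successor where the data has them.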


> If this thread is to be of any immediate use, what is the intended
> use of the information?
The original post made it sound like it was related to book publishing.
"all countries where there was a printing press would be optimal coverage".


IBMer but all opinions are mine.      // GPG: 9731166CD8E23A83BEE7C6D3ACA5DBE1FD8FABF1
https://www.ohloh.net/accounts/srl295 // https://ssl.icu-project.org/trac/wiki/Srl 
