The Unicode Standard and ISO

Marcel Schneider via Unicode unicode at
Sat Jun 9 17:41:19 CDT 2018

On Sat, 9 Jun 2018 12:56:28 -0700, Asmus Freytag via Unicode wrote:
> On 6/9/2018 12:01 PM, Marcel Schneider via Unicode wrote:
> > Still a computer should be understandable off-line, so CLDR providing a standard library of error messages could be 
> > appreciated by the industry.
> The kind of translations that CLDR accumulates, like day, and month names, language and territory names, are a widely
> applicable subset and one that is commonly required in machine generated or machine-assembled text (like displaying
> the date, providing pick lists for configuration of locale settings, etc).
> The universe of possible error messages is a completely different beast.
> If you tried to standardize all error messages even in one language you would never arrive at something that would be
> universally useful. While some simple applications may find that all their needs for communicating with their users are
> covered, most would wish they had some other messages available.

Indeed, error messages, although technical, are like the world’s books: a never-ending production of content. To account for 
this open-endedness, I was not proposing a closed set of messages to replace application libraries that can display message #123.
In fact I wrote first: “If to date, automatic [automated] translation of technical English still does not work, then I’d suggest 
that CLDR feature a complete message library allowing to compose any localized piece of information.”
Here the piece of information displayed by the application is like a Lego spacecraft, the CLDR messages like Lego bricks.
I haven’t played with Lego in a very long time, but as a boy I learned how it works. I even remember that when building 
a construct, it often happened that some bricks were “missing”. A Lego box is complete with respect to one or several models, but 
my mom, showing me the boxes on the shelves, once explained that they are put together in such a way that you’ll always lack 
something [when trying to build further]. That doesn’t prevent Lego from thriving, nor many people from enjoying it.
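
To make the brick idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration:
the BRICKS table, its keys, the compose function and the sample wording are hypothetical, and nothing like them exists in CLDR
today. It only shows how a pattern brick with named slots could be filled from smaller localized bricks.

```python
# Hypothetical "message bricks". These keys and strings are invented for
# illustration and are NOT part of CLDR or any existing library.
BRICKS = {
    "en": {
        "pattern.cannot": "Cannot {action} {target}: {reason}.",
        "action.open": "open",
        "target.file": "the file",
        "reason.not_found": "it was not found",
    },
    "fr": {
        "pattern.cannot": "Impossible {action} {target} : {reason}.",
        "action.open": "d'ouvrir",
        "target.file": "le fichier",
        "reason.not_found": "il est introuvable",
    },
}


def compose(locale, pattern_key, **slots):
    """Fill the named slots of a pattern brick with bricks of the same locale.

    Each keyword argument maps a slot name in the pattern to the key of the
    brick that should fill it.
    """
    table = BRICKS[locale]
    filled = {slot: table[brick_key] for slot, brick_key in slots.items()}
    return table[pattern_key].format(**filled)


if __name__ == "__main__":
    for locale in ("en", "fr"):
        print(compose(locale, "pattern.cannot",
                      action="action.open",
                      target="target.file",
                      reason="reason.not_found"))
```

Note that the French action brick has to carry its own elided preposition, which hints at how much grammar each brick would
need to encode before the pieces really interlock.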

> To adopt your scheme, they would need to have a bifurcated approach, where some messages follow the standard,
> while others do not (cannot). At that point, why bother? Determining whether some message can be rewritten to follow
> the standard adds another level of complexity while you'd need to have translation resources for all the non-standard ones anyway.

Once CLDR libraries make it possible to generate info boxes that are 98 % well translated, human translators can focus on the remaining 
2 %. If for any reason they cannot, the vendor will still get far fewer support requests than with ill-translated messages.

> A middle ground is a shared terminology database that allows translators working on different products to arrive at the same translation
> for the same things. Translators already know how to use such databases in their work flow, and integrating a shared one with
> a product-specific one is much easier than trying to deal with a set of random error messages.

If the scheme you outline works well, where do the reported oddities come from? Obviously terminology is not everything; it’s like Lego bricks without studs:
terms alone don’t interlock, and therefore the user cannot make sense of them. This is where CLDR’s hopefully forthcoming localizable message bricks come 
into play, helping automated translation software compose understandable output using patterns. Google Translate is unable to do that, as shown 
by its English and French translations of this sentence found on a page of the Finnish NB:

Finnish: Kielitoimiston ohjeen mukaan esimerkiksi vieraskielisissä nimissä on pyrittävä säilyttämään kaikki tarkkeet.
Google English: According to the Language Office, for example, in the name of a foreign language, it is necessary to maintain all the checkpoints.
Google French: Selon le Language Office, par exemple, au nom d'une langue étrangère, il est nécessaire de maintenir tous les points de contrôle.

(In the Finnish sentence, “vieraskielisissä nimissä” means “in foreign-language names” and “tarkkeet” means “diacritics”, not “checkpoints”.)

> It's pushing this kind of impractical scheme that gives standardizers a bad name. 
> Especially if it is immediately tied to governmental procurement, forcing people to adopt it (or live with it) whether it provides any actual benefit.

These statements make a lot of sense to me…

> However, a high-quality terminology database recommends itself (and doesn't need any procurement standards).
> Ultimately, it was its demonstrated usefulness that drove the adoption of CLDR.

This is why I’m so hopeful that CLDR will go much farther than date and time and other locale settings, and emoji names and keywords.

Best regards,

