Interpreting t-h0- mechanism

Mark Davis ☕️ via CLDR-Users cldr-users at unicode.org
Sun Oct 20 12:00:42 CDT 2019


On Sun, Oct 20, 2019 at 6:26 PM Matthew Stuckwisch <mateu at softastur.org>
wrote:

> >> 1. What is the difference between 'en-t-es-h0-hybrid' and
> 'en-t-h0-es'?  Both styles are given as example encodings for what would be
> Spanglish (Spanish-English hybrid), but surely there ought to be a
> difference
> >>
> > The latter style is illegal. The key h0 only currently takes one
> possible value, h0-hybrid.
> >
> > https://unicode.org/reports/tr35/tr35.html#Hybrid_Locale
> >
> > If you are seeing "h0-es" someplace in the spec, please let us know,
> since that would be a typo.
>
> Indeed, in the documentation it says "Thus Hinglish should be represented
> as hi-t-h0-en where Hindi is the scaffold, and as en-t-h0-hi where English
> is", but the table represents the two Hinglishes as hi-t-en-h0-hybrid or
> en-t-hi-h0-hybrid, which is the source of the initial confusion.
>

Great, thanks for finding these (very misleading) typos.
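
To make the distinction concrete, here is a minimal sketch in Python of how the two tags split apart. It is a simplified reading of the -t- extension, not a full BCP 47 / UTS #35 parser. The function names are hypothetical, and it assumes that a hybrid tag must carry a tlang in addition to "h0-hybrid".

```python
import re

# Assumption: a tfield key is one letter followed by one digit (e.g. "h0").
_TKEY = re.compile(r"^[a-z][0-9]$")


def parse_t_extension(tag):
    """Split a tag into (base, tlang, fields); simplified, hypothetical helper."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return "-".join(subtags), None, {}
    i = subtags.index("t")
    base = "-".join(subtags[:i])
    tlang_parts = []
    fields = {}
    key = None
    for s in subtags[i + 1:]:
        if len(s) == 1:
            # Another singleton starts a different extension; stop here.
            break
        if _TKEY.match(s):
            key = s
            fields[key] = []
        elif key is None:
            # Subtags before the first key belong to the tlang.
            tlang_parts.append(s)
        else:
            fields[key].append(s)
    tlang = "-".join(tlang_parts) or None
    return base, tlang, fields


def is_valid_hybrid(tag):
    """True only if the tag has a tlang and h0 carries the single value 'hybrid'."""
    _base, tlang, fields = parse_t_extension(tag)
    return tlang is not None and fields.get("h0") == ["hybrid"]


print(parse_t_extension("en-t-es-h0-hybrid"))  # base "en", tlang "es", h0 = hybrid
print(is_valid_hybrid("en-t-es-h0-hybrid"))
print(is_valid_hybrid("en-t-h0-es"))           # "es" lands in the h0 value slot
```

In "en-t-h0-es" the subtag "es" follows the key h0, so it is read as the value of h0 rather than as a language, which is why that form is rejected.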

>
> When it said, "Should there ever be strong need for hybrids of more than
> two languages or for other purposes such as hybrid languages as the source
> of translated content, additional structure could be added." it was not
> clear to me that this meant "it is currently not possible to do this" over
> "by adding in additional structure [e.g. -h0- tags]".
>

Right, that could be clearer: the current structure does not permit it.

>
> I work occasionally with documents in Eonaviego which would best be coded
> as ast-t-gl-h0-hybrid, but then when translated to-from (which there are
> quite a few to/from Asturian or Spanish), there would be no valid encoding,
> so being able to represent a hybrid language as a source/destination of a
> transform is not a pure hypothetical for me.
>

The hybrids were originally designed for cases like Hinglish or Denglish,
where there are large numbers of borrowings of words from a different
language. Eonaviego sounds like a set of dialects on the continuum
between Asturian and Galician. That is, it doesn't appear to be Asturian
with a batch of loan words from Galician.

While "h0-hybrid" is currently the closest term for it, it might be better
to define a new term that more precisely identifies "a set of
dialects on the continuum between X and Y".

That being said, the structure isn't designed to allow transforms to or
from these h0 entities; we'd have to think of how it could be extended for
that.


> I will go ahead and file some tickets about the docs/support, thanks
>

Great, glad you found these issues.

>
> Matéu