Transform resolution and before context matches

Mark Davis ☕️ mark at macchiato.com
Mon Mar 29 09:27:45 CDT 2021


Thanks for your message. There is more information in
https://unicode-org.github.io/icu/userguide/transforms/general/ that should
be incorporated into the LDML section.

As to your particular points, I have some answers below, but I can follow
up with details of the edge cases when I have more time.

Mark


On Mon, Mar 29, 2021 at 6:58 AM Kip Cole via CLDR-Users <
cldr-users at unicode.org> wrote:

> I’m now implementing CLDR transforms and would appreciate some
> understanding of the following two items:
>
> 1. Resolving the correct transform from “Any-Latin”. For example,
> “de-Latin” has a transform rule “Any-Latin” but such a transform doesn’t
> exist in the repo. So I presume an appropriate transform has to be
> resolved. Reading the inheritance rules isn’t helping me. So using this
> example, how does one resolve the correct transform for “Any-Latin”.
>

There are special inheritance rules for Transforms with locales.

   - Any is a special identifier: the text is broken into script runs, and
   within each run Any is replaced by the script of that run.
   - If there is no language-specific transform, the fallback is language =>
   script. The fallback is a 'ladder' between the source and the target.
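
To illustrate the first point, here is a minimal Python sketch of how "Any-Latin" could be resolved per script run. It is not ICU's implementation: the names script_of, script_runs and resolve_any are hypothetical, the script detection is a crude stand-in based on character names (a real implementation would use the Unicode Script property), and it does not check that the resolved transform actually exists.

```python
import unicodedata

# Assumption: crude script detection from the first word of the character
# name; a real implementation would use the Unicode Script property.
KNOWN_SCRIPTS = {"LATIN": "Latin", "GREEK": "Greek", "CYRILLIC": "Cyrillic"}


def script_of(ch):
    """Return a script name, or None for spaces, digits, punctuation, etc."""
    first_word = unicodedata.name(ch, "").split(" ")[0]
    return KNOWN_SCRIPTS.get(first_word)


def script_runs(text):
    """Split text into (script, run) pairs; script-less characters join
    the run they are adjacent to."""
    runs = []  # each entry is [script, characters]
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            if runs[-1][0] is None:
                runs[-1][0] = script  # a leading script-less run adopts the script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, run) for script, run in runs]


def resolve_any(text, target="Latin"):
    """Replace 'Any' by the script of each run, e.g. 'Greek-Latin'."""
    return [
        (f"{script}-{target}" if script else None, run)
        for script, run in script_runs(text)
    ]


print(resolve_any("Αθήνα и Москва"))
```

In this sketch each run would then be handed to the transform whose ID was resolved for it; a run with no detectable script gets no ID and would be left unchanged.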


> 2. I’m not sure how to interpret the Unicode regular expression
> “[[:Z:][:Ps:][:Pi:]$]” when it’s in a “before context”, as it is in
> “Any-Publishing.xml”. Specifically, where does the “$” anchor?
>
>   (a) Does “$” in this case mean matching the character just before the
> insertion point? Or does it mean it matches an end-of-line at the insertion
> point? Or something else?
>

It means "off the end of the string". So it is like ^ or $ in regular
expressions.
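
As a rough illustration in Python (a sketch of the idea, not how ICU implements it), the before context [[:Z:][:Ps:][:Pi:]$] can be read as "the previous character is a separator, an opening punctuation mark or an initial quote, or there is no previous character at all". The function name before_context_matches is hypothetical.

```python
import unicodedata


def before_context_matches(text, pos):
    """Sketch of testing [[:Z:][:Ps:][:Pi:]$] as a before context at pos."""
    if pos == 0:
        # Nothing precedes pos: we are "off the end of the string",
        # which is the case the "$" inside the set covers.
        return True
    category = unicodedata.category(text[pos - 1])
    # [:Z:] covers Zs, Zl and Zp; [:Ps:] is open punctuation;
    # [:Pi:] is initial quote punctuation.
    return category.startswith("Z") or category in ("Ps", "Pi")


text = '"hi" (ok)'
for pos in (0, 1, 5, 6, 7):
    print(pos, before_context_matches(text, pos))
```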

>
>   (b) For the majority of “before context” matches, which don’t have any
> anchors in them (“$” or “^”), is the intent that the match aligns to the
> text immediately before the insertion point (i.e., with an implied “$” ending
> at the insertion point)? Or is it intended to match anywhere in the prior
> context from the beginning of the string? (That would seem strange, but TR35
> doesn’t seem to explain the correct interpretation and TR18 is silent on
> the topic.)


It is immediately before.
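
A regular-expression lookbehind is a reasonable analogy, offered here only as a sketch: it must match the text ending exactly at the current position, not somewhere earlier in the string. The pattern below is a made-up stand-in for a rule with before context "x" that rewrites "a" to "b"; it is not taken from any CLDR transform.

```python
import re

# Hypothetical rule: replace "a" with "b" only when "x" is the before context.
# The lookbehind (?<=x) succeeds only if "x" sits immediately before the "a".
rule = re.compile(r"(?<=x)a")

result = rule.sub("b", "xa x-a a")
print(result)  # xb x-a a
```

Only the first "a" changes: the other two have an "x" earlier in the string, but not directly before them.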

>
>
> As always, thanks for the insight and assistance,
>
> —Kip
>