Transform resolution and before context matches

Kip Cole kipcole9 at gmail.com
Mon Mar 29 08:57:34 CDT 2021


I’m now implementing CLDR transforms and would appreciate some help understanding the following two items:

1. Resolving the correct transform from “Any-Latin”. For example, “de-Latin” has a transform rule “Any-Latin”, but no such transform exists in the repo, so I presume an appropriate transform has to be resolved. Reading the inheritance rules isn’t helping me. Using this example, how does one resolve the correct transform for “Any-Latin”?

2. I’m not sure how to interpret the Unicode regular expression “[[:Z:][:Ps:][:Pi:]$]” when it’s in a “before context”, as it is in “Any-Publishing.xml”. Specifically, where does the “$” anchor?

  (a) Does “$” in this case mean matching the character just before the insertion point? Or does it mean matching an end-of-line at the insertion point? Or something else?

  (b) For the majority of “before context” matches, which don’t have any anchors in them (“$” or “^”), is the intent that the match aligns to the text immediately before the insertion point (i.e. with an implied “$” ending at the insertion point)? Or is it intended to match anywhere in the prior context from the beginning of the string? That would seem strange, but TR35 doesn’t seem to explain the correct interpretation and TR18 is silent on the topic.
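
To make the two readings in (b) concrete, here is a rough Python sketch of what I mean. The function names are made up for illustration, and because Python's re module has no [:Z:], [:Ps:] or [:Pi:] properties, a small literal character class stands in for the real set. It only contrasts the two readings; it is not a claim about which one CLDR intends.

```python
import re

# Hypothetical stand-in for a before-context set: whitespace, a few opening
# brackets, and two initial quotation marks. This is NOT the real
# [[:Z:][:Ps:][:Pi:]] set, only an approximation for illustration.
BEFORE_CONTEXT = r"[\s(\[{\u201C\u2018]"


def matches_at_insertion_point(text, pos, pattern=BEFORE_CONTEXT):
    """Reading 1: the context must end exactly at the insertion point."""
    prior = text[:pos]
    # \Z anchors the match to the very end of the prior text.
    return re.search("(?:" + pattern + r")\Z", prior) is not None


def matches_anywhere_before(text, pos, pattern=BEFORE_CONTEXT):
    """Reading 2: the context may match anywhere in the prior text."""
    prior = text[:pos]
    return re.search(pattern, prior) is not None


if __name__ == "__main__":
    sample = "a (b"
    for pos in range(len(sample) + 1):
        print(pos,
              matches_at_insertion_point(sample, pos),
              matches_anywhere_before(sample, pos))
```

With the sample text "a (b" and the insertion point after the "b", reading 1 does not match (the preceding character is "b") while reading 2 does (there is a space earlier in the string), which is the difference I'm asking about.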

As always, thanks for the insight and assistance,

—Kip



