◌ in LB28a in UAX14 of Unicode 15.1.0

Daniel Bünzli daniel.buenzli at erratique.ch
Mon Sep 4 08:33:12 CDT 2023


Thanks. 

I think it would be better if that were written as \u{25CC}, following regular-expression notation. As it stands, it is highly ambiguous what the symbol represents, since in these rules a class name C itself stands for \p{lb=C}, and some characters are already part of the rule syntax.
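
To make the notation point concrete, here is a minimal sketch, assuming a rule alternation is given as a list of tokens where a bare name C means \p{lb=C} and \u{XXXX} means that single code point. The function name compile_alternative and the lb_of lookup (code point to Line_Break value) are hypothetical and supplied by the caller; a real implementation would build lb_of from LineBreak.txt.

```python
import re

# Matches a literal code point token written as \u{XXXX}.
_LITERAL = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}")

def compile_alternative(tokens, lb_of):
    """Return a predicate for an alternation such as (AK | \\u{25CC} | AS).

    A bare name C stands for \\p{lb=C}; \\u{XXXX} stands for exactly that
    code point. lb_of is a hypothetical caller-supplied function mapping
    a code point (int) to its Line_Break value (str).
    """
    classes = set()
    literals = set()
    for tok in tokens:
        m = _LITERAL.fullmatch(tok)
        if m:
            # Unambiguous: this token is a single code point, not a class.
            literals.add(int(m.group(1), 16))
        else:
            classes.add(tok)

    def matches(cp):
        return cp in literals or lb_of(cp) in classes

    return matches
```

With this notation the parser never has to guess whether a symbol is a class name, a literal character, or rule syntax.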

Also, it would be nicer for some implementations if it were integrated into the rules as a character class, as ZWJ is.
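
As an illustration of the class-based alternative, a minimal sketch, assuming an implementation resolves each code point to a class before matching rules: U+25CC is mapped to its own pseudo-class, so an alternation such as (AK | ◌ | AS) becomes a plain set of classes. The pseudo-class name "DC" and the helpers resolve_class and is_lb28a_base are hypothetical; lb_of is a caller-supplied Line_Break lookup.

```python
DOTTED_CIRCLE = 0x25CC

def resolve_class(cp, lb_of):
    """Class used during rule matching.

    U+25CC DOTTED CIRCLE gets a dedicated pseudo-class "DC" (hypothetical
    name), in the same spirit as ZWJ having its own class, so the rule
    can be expressed purely over classes.
    """
    if cp == DOTTED_CIRCLE:
        return "DC"
    return lb_of(cp)

# Stands for (AK | ◌ | AS) once ◌ is treated as a class.
LB28A_BASE = frozenset({"AK", "DC", "AS"})

def is_lb28a_base(cp, lb_of):
    return resolve_class(cp, lb_of) in LB28A_BASE
```

The design choice is to pay the special case once, in class resolution, so the rule matcher itself only ever compares classes.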

Which leads me to another question: is there a machine-readable version of the rules for all the Unicode segmentation standards? In LDML, perhaps?

Best,

D
