Combining Characters

Alex Shpilkin ashpilkin at gmail.com
Fri Dec 19 15:17:25 CST 2025


On Fri, Dec 19 2025 at 23:02:55 +02:00:00, Alex Shpilkin 
<ashpilkin at gmail.com> wrote:
> I haven’t gotten to implementing canonical composition yet

And you can tell because the algorithm I’ve posted is wrong. 
Attempted correction (which does introduce a bit of special handling to 
account for the starter+starter case):

starter = 0  # sentinel not part of any compositions
starter index = uninitialized

index = 0
while index < length of string:
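   composition = try to compose (starter, string[index])
   # with input in canonical order, string[index] is blocked unless it
   # directly follows the starter or has a higher ccc than its predecessor
   if succeeded and (index == starter index + 1 or
                     ccc[string[index - 1]] < ccc[string[index]]):
       string[starter index] = composition
       starter = composition  # later marks compose with the new starter
       delete string[index]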
   else:
       if ccc[string[index]] == 0:  # NB only this late
           starter = string[index]
           starter index = index
       index = index + 1
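
For anyone who wants to poke at this, here is a rough Python sketch of the
loop above. It is only a sketch under some assumptions: the input is already
canonically decomposed and ordered, the pair table is rebuilt from Python's
bundled unicodedata module (so it follows whatever Unicode version that
ships), and the names build_pairs, try_compose and compose are made up for
illustration. It also keeps the starter in sync after a composition and skips
marks that are blocked in the UAX #15 sense (an uncomposed mark with the same
or higher ccc sits in between), which is my reading of the spec rather than
anything authoritative.

```python
import unicodedata

# Hangul syllable constants (Unicode Standard, section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def build_pairs():
    """Map (first, second) -> primary composite (hypothetical helper)."""
    pairs = {}
    for cp in range(0x110000):
        ch = chr(cp)
        parts = unicodedata.decomposition(ch).split()
        # Keep canonical two-character mappings only; compatibility
        # mappings start with a "<tag>".
        if len(parts) != 2 or parts[0].startswith("<"):
            continue
        # Excluded composites do not survive NFC, so this filters them out.
        if unicodedata.normalize("NFC", ch) == ch:
            pairs[chr(int(parts[0], 16)), chr(int(parts[1], 16))] = ch
    return pairs


PAIRS = build_pairs()


def try_compose(first, second):
    """Return the primary composite of the pair, or None if there is none."""
    a, b = ord(first), ord(second)
    # Hangul L + V -> LV syllable
    if L_BASE <= a < L_BASE + L_COUNT and V_BASE <= b < V_BASE + V_COUNT:
        return chr(S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT)
    # Hangul LV + T -> LVT syllable
    if (S_BASE <= a < S_BASE + S_COUNT and (a - S_BASE) % T_COUNT == 0
            and T_BASE < b < T_BASE + T_COUNT):
        return chr(a + (b - T_BASE))
    return PAIRS.get((first, second))


def compose(decomposed):
    """Canonically compose a string that is assumed to be in NFD already."""
    s = list(decomposed)
    starter = None        # sentinel: nothing to compose with yet
    starter_index = -1
    index = 0
    while index < len(s):
        ch = s[index]
        composite = None if starter is None else try_compose(starter, ch)
        # Not blocked if adjacent to the starter, or if the previous
        # (uncomposed) mark has a strictly lower combining class.
        unblocked = index == starter_index + 1 or (
            unicodedata.combining(s[index - 1]) < unicodedata.combining(ch))
        if composite is not None and unblocked:
            s[starter_index] = starter = composite
            del s[index]
        else:
            if unicodedata.combining(ch) == 0:  # NB only this late
                starter, starter_index = ch, index
            index += 1
    return "".join(s)
```

A quick sanity check is to compare compose(unicodedata.normalize("NFD", s))
against unicodedata.normalize("NFC", s) for a handful of strings, including
one like "a\u0305\u0301" where the acute is blocked by the overline.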

-- 
Sorry for the noise,
Alex




