normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE

Nico Schlömer nico.schloemer at gmail.com
Tue Jun 14 14:56:36 CDT 2022


Hi everyone,

I was wondering about Unicode normalization with the dotless i/j characters.

In Python (and all other implementations I've checked), i + COMBINING
ACUTE ACCENT combine to LATIN SMALL LETTER I WITH ACUTE:
```
from unicodedata import normalize
normalize("NFC", "i\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
```
```
b'\\N{LATIN SMALL LETTER I WITH ACUTE}'
```
When doing the same with a dotless i, it does _not_ combine:
```
from unicodedata import normalize
normalize(
    "NFC", "\N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING ACUTE ACCENT}"
).encode("ascii", "namereplace")
```
```
b'\\N{LATIN SMALL LETTER DOTLESS I}\\N{COMBINING ACUTE ACCENT}'
```
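A minimal sketch for checking whether this is specific to the acute accent: it tries an arbitrary handful of combining marks on dotted and dotless i and j, and reports which pairs NFC folds into a single code point. The helper `nfc_composes` and the `BASES`/`MARKS` tables are hypothetical names introduced only for illustration.

```python
from unicodedata import name, normalize

# Dotted vs. dotless base letters (both i and j are of interest here).
BASES = {
    "i": "\N{LATIN SMALL LETTER I}",
    "dotless i": "\N{LATIN SMALL LETTER DOTLESS I}",
    "j": "\N{LATIN SMALL LETTER J}",
    "dotless j": "\N{LATIN SMALL LETTER DOTLESS J}",
}

# An arbitrary selection of combining marks to try on each base letter.
MARKS = [
    "\N{COMBINING GRAVE ACCENT}",
    "\N{COMBINING ACUTE ACCENT}",
    "\N{COMBINING CIRCUMFLEX ACCENT}",
    "\N{COMBINING DIAERESIS}",
    "\N{COMBINING CARON}",
]


def nfc_composes(base: str, mark: str) -> bool:
    """Hypothetical helper: True if NFC folds base + mark into one code point."""
    return len(normalize("NFC", base + mark)) == 1


if __name__ == "__main__":
    for label, base in BASES.items():
        # Names of the marks that NFC merges with this base letter.
        composed = [name(m) for m in MARKS if nfc_composes(base, m)]
        print(f"{label}: {composed or 'no composition'}")
```

With this selection, the dotless letters should not compose with any of the marks, while the dotted ones compose with some of them.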
Is this consistent with the standard, and if so, is it an oversight in
the standard or intended?
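One way to probe the standard side, assuming Python's bundled `unicodedata` tables reflect the Unicode version in question: NFC can only recompose a base + mark pair into a precomposed character whose canonical decomposition mapping starts with that base. Searching the mappings for a given base letter therefore shows which compositions are possible at all. An existing mapping does not guarantee composition (there are composition exclusions), but a missing one rules it out. The helper `canonical_pairs_starting_with` is a hypothetical name.

```python
import sys
import unicodedata


def canonical_pairs_starting_with(base: str) -> list[tuple[str, str]]:
    """Hypothetical helper: list (character, mapping) for every character
    whose canonical decomposition mapping starts with `base`."""
    prefix = f"{ord(base):04X}"  # UnicodeData.txt style, e.g. "0069"
    hits = []
    for cp in range(sys.maxunicode + 1):
        mapping = unicodedata.decomposition(chr(cp))
        # Skip characters without a mapping, and compatibility mappings,
        # which start with a tag such as "<compat>" and are not used by NFC.
        if not mapping or mapping.startswith("<"):
            continue
        if mapping.split()[0] == prefix:
            hits.append((chr(cp), mapping))
    return hits


if __name__ == "__main__":
    for base in ("i", "\N{LATIN SMALL LETTER DOTLESS I}"):
        hits = canonical_pairs_starting_with(base)
        print(unicodedata.name(base), "->", "".join(ch for ch, _ in hits) or "none")
```

If the dotless-i search comes back empty, NFC has no precomposed target for dotless i + COMBINING ACUTE ACCENT, which would be consistent with the behavior shown above.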

Perhaps someone here can shed some light on it. See also this
Stack Overflow question [1] and this Python bug report [2].

Cheers,
Nico

[1] https://stackoverflow.com/q/72608183/353337
[2] https://github.com/python/cpython/issues/93767