normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE

Richard Wordingham richard.wordingham at ntlworld.com
Tue Jun 14 16:39:22 CDT 2022


On Tue, 14 Jun 2022 21:56:36 +0200
Nico Schlömer via Unicode <unicode at corp.unicode.org> wrote:

> Hi everyone,
> 
> I was wondering about Unicode normalization with the dotless i/j
> characters.
> 
> In Python (and all other implementations I've checked), i + COMBINING
> ACUTE ACCENT combine to LATIN SMALL LETTER I WITH ACUTE
> ```
> from unicodedata import normalize
> normalize("NFC", "i\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
> ```
> ```
> b'\\N{LATIN SMALL LETTER I WITH ACUTE}'
> ```
> When doing the same with a dotless i, it does _not_ combine:
> ```
> from unicodedata import normalize
> normalize("NFC", "\N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
> ```
> ```
> b'\\N{LATIN SMALL LETTER DOTLESS I}\\N{COMBINING ACUTE ACCENT}'
> ```
> 
> Is this consistent with the standard, an oversight in the standard,
> or intended?

As intended.  The two sequences should render differently in a
Lithuanian locale.
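
The difference is visible in the character data itself. As a rough
sketch using only Python's standard unicodedata module (the variable
names are mine, purely for illustration): LATIN SMALL LETTER I WITH
ACUTE has a canonical decomposition to plain i + COMBINING ACUTE
ACCENT, while dotless i has no decomposition mapping and nothing
decomposes to it, so NFC has nothing to compose it into.

```python
from unicodedata import decomposition, normalize

# Illustrative names only; these are not from the original message.
DOTTED_I_ACUTE = "i\N{COMBINING ACUTE ACCENT}"
DOTLESS_I_ACUTE = "\N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING ACUTE ACCENT}"

# U+00ED decomposes canonically to U+0069 U+0301 (plain i + acute).
i_acute_decomp = decomposition("\N{LATIN SMALL LETTER I WITH ACUTE}")

# U+0131 (dotless i) has no decomposition mapping: empty string.
dotless_decomp = decomposition("\N{LATIN SMALL LETTER DOTLESS I}")

# NFC therefore composes the first sequence and leaves the second alone.
composed = normalize("NFC", DOTTED_I_ACUTE)
unchanged = normalize("NFC", DOTLESS_I_ACUTE)

print(repr(i_acute_decomp), repr(dotless_decomp))
print(composed == "\N{LATIN SMALL LETTER I WITH ACUTE}")
print(unchanged == DOTLESS_I_ACUTE)
```

In Lithuanian typography, as I understand it, i keeps its dot under
an accent, which is why the dotted and dotless sequences must stay
distinct.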

Richard.


