Removing accents and diacritics from a word

Sławomir Osipiuk via Unicode unicode at unicode.org
Wed Jul 17 13:25:02 CDT 2019


“Transliteration”?

Maybe more generic than what you’re looking for. It is the term used for the process of producing the “machine readable zone” on passports:

https://www.icao.int/publications/Documents/9303_p3_cons_en.pdf (see section 6, page 30)


“Accent folding” or “diacritic folding” is used in some places. String folding is “A string transform F, with the property that repeated applications of the same function F produce the same output: F(F(S)) = F(S) for all input strings S”. Accent folding is a special case of that.

https://unicode.org/reports/tr23/#StringFunctionClassificationDefinitions

https://alistapart.com/article/accent-folding-for-auto-complete/
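
To make the folding property concrete, here is a minimal Python sketch of one common way to fold accents: decompose the string, drop the combining marks, and recompose. The function name fold_accents is my own for illustration, and this is only one possible approach, not the method described in TR23 or in ICAO 9303.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Illustrative accent folding (hypothetical helper, not a standard API)."""
    # Canonical decomposition splits e.g. "ã" into "a" + U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (general category Mn).
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into NFC.
    return unicodedata.normalize("NFC", stripped)


once = fold_accents("São Tomé")
print(once)                         # Sao Tome
print(fold_accents(once) == once)   # True, i.e. F(F(S)) = F(S) for this input
```

Note that letters such as “ł” have no canonical decomposition, so this sketch leaves them unchanged; mapping those to ASCII needs an explicit table, which is closer to what the transliteration rules for passports do.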


From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Asmus Freytag via Unicode
Sent: Wednesday, July 17, 2019 13:38
To: Unicode Mailing List
Subject: Removing accents and diacritics from a word


A question has come up in another context:

Is there any linguistic term for describing the process of removing accents and diacritics from a word to create its “base form”, e.g. São Tomé to Sao Tome?

The linguistic term "string normalization" does not seem preferable in a computing context.

Any ideas?

A./
