Removing accents and diacritics from a word

Asmus Freytag (c) via Unicode unicode at unicode.org
Wed Jul 17 19:05:58 CDT 2019


On 7/17/2019 11:25 AM, Sławomir Osipiuk wrote:
>
> “Transliteration”?
>
> Maybe more generic than what you’re looking for. Used for the process
> of producing the “machine readable zone” on passports:
>
> https://www.icao.int/publications/Documents/9303_p3_cons_en.pdf (see 
> section 6, page 30)
>
> “Accent folding” or “diacritic folding” is used in some places. String 
> folding is “A string transform F, with the property that repeated 
> applications of the same function F produce the same output: F(F(S)) = 
> F(S) for all input strings S”. Accent folding is a special case of that.
>
> https://unicode.org/reports/tr23/#StringFunctionClassificationDefinitions
>
> https://alistapart.com/article/accent-folding-for-auto-complete/
>
Diacritic folding. Thanks. Just didn't think of the operation as folding 
the way it came up, but that's what it is.
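
For illustration only, here is a minimal Python sketch of diacritic folding as discussed above. It is one common approach, not something defined by UTR #23 or the ICAO document: decompose with NFD, drop the nonspacing combining marks, and recompose. The function name fold_diacritics is hypothetical. Note the limits: letters with no canonical decomposition, such as "ł", pass through unchanged, so this is not a transliteration scheme.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks from text (hypothetical helper, sketch only)."""
    # NFD splits precomposed letters into base letter + combining marks,
    # e.g. "ã" becomes "a" followed by U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (general category Mn); keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever remains so the result is in NFC.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    once = fold_diacritics("São Tomé")
    twice = fold_diacritics(once)
    print(once)           # folded form
    print(once == twice)  # folding property: F(F(S)) == F(S)
```

Applying the function to its own output changes nothing, which is the F(F(S)) = F(S) property quoted from UTR #23.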

A./


> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of
> Asmus Freytag via Unicode
> Sent: Wednesday, July 17, 2019 13:38
> To: Unicode Mailing List
> Subject: Removing accents and diacritics from a word
>
> A question has come up in another context:
>
> Is there any linguistic term for describing the process of removing 
> accents and diacritics from a word to create its “base form”, e.g. São 
> Tomé to Sao Tome?
>
> The linguistic term "string normalization" does not seem well suited
> to a computing context.
>
> Any ideas?
>
> A./
