Removing accents and diacritics from a word
Asmus Freytag (c) via Unicode
unicode at unicode.org
Wed Jul 17 19:05:58 CDT 2019
On 7/17/2019 11:25 AM, Sławomir Osipiuk wrote:
>
> “Transliteration”?
>
> Maybe more generic than what you’re looking for. Used for the process
> of producing the “machine readable zone” on passports:
>
> https://www.icao.int/publications/Documents/9303_p3_cons_en.pdf (see
> section 6, page 30)
>
> “Accent folding” or “diacritic folding” is used in some places. String
> folding is “A string transform F, with the property that repeated
> applications of the same function F produce the same output: F(F(S)) =
> F(S) for all input strings S”. Accent folding is a special case of that.
>
> https://unicode.org/reports/tr23/#StringFunctionClassificationDefinitions
>
> https://alistapart.com/article/accent-folding-for-auto-complete/
>
Diacritic folding. Thanks. Just didn't think of the operation as folding
the way it came up, but that's what it is.
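
For illustration only, here is a minimal Python sketch of diacritic folding
in the sense quoted above. It assumes that canonical decomposition followed
by dropping combining marks is an acceptable approximation; the function
name fold_diacritics is hypothetical and not taken from any standard. It
also checks the folding property F(F(S)) = F(S) on the example string.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper: strip combining marks after canonical decomposition."""
    # NFD splits precomposed letters into base letter + combining marks,
    # e.g. U+00E3 (a with tilde) becomes "a" followed by U+0303.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero combining class for marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains so the result is in NFC.
    return unicodedata.normalize("NFC", stripped)


once = fold_diacritics("São Tomé")
twice = fold_diacritics(once)
print(once)           # Sao Tome
print(once == twice)  # True: applying the fold again changes nothing
```

Note that this approach only removes marks that Unicode treats as separable.
A letter such as ł (U+0142) has no canonical decomposition, so "Sławomir"
passes through unchanged; a transliteration table like the one ICAO uses
would be needed to map it to "l".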
A./
> *From:*Unicode [mailto:unicode-bounces at unicode.org] *On Behalf Of
> *Asmus Freytag via Unicode
> *Sent:* Wednesday, July 17, 2019 13:38
> *To:* Unicode Mailing List
> *Subject:* Removing accents and diacritics from a word
>
> A question has come up in another context:
>
> Is there any linguistic term for describing the process of removing
> accents and diacritics from a word to create its “base form”, e.g. São
> Tomé to Sao Tome?
>
> The linguistic term "string normalization" does not seem well suited
> to a computing context.
>
> Any ideas?
>
> A./