Stop words for CLDR

Marius Spix via Unicode unicode at unicode.org
Thu Jan 23 12:32:56 CST 2020


Is there any interest in adding stop words to CLDR? Stop words are very
common words (such as "the" or "and" in English) that natural language
processing algorithms usually ignore, in applications such as search
engines, word clouds and text classification.
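
To make the idea concrete, here is a minimal sketch of stop word
filtering in Python. The word list and the function name are made up
for illustration; they are not taken from CLDR or from any existing
collection, and real lists are much longer and language-specific.

```python
import re

# Hypothetical, tiny English stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "the", "to", "with"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased word tokens of text that are not stop words."""
    # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stop_words]

print(remove_stop_words("Stop words are ignored by search engines"))
# prints ['stop', 'words', 'ignored', 'search', 'engines']
```

A per-locale list would simply replace STOP_WORDS in a sketch like this.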

There are existing collections of stop words, such as [1] and [2],
which could be used, but I believe that Unicode CLDR would be the best
place for such lists.

Regards,

Marius Spix

[1] https://pypi.org/project/stop-words/
[2] https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip

