talnarchives

A French-language digital archive of research articles in Natural Language Processing.

A comparative study of word embeddings and other features for lexical complexity detection in French

Aina Garí Soler, Marianna Apidianaki, Alexandre Allauzen

Abstract: Lexical complexity detection is an important step in automatic text simplification, as it makes informed lexical substitutions possible. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well suited to complexity prediction. Our results on a synonym ranking task show that embeddings perform better than any other feature used in isolation, but do not outperform frequency-based systems in this language.

Keywords: lexical complexity, readability, synonym ranking, word embeddings.