talnarchives

A French-language digital archive of research articles in Natural Language Processing.

Exploring sentence informativeness

Syrielle Montariol, Aina Garí Soler, Alexandre Allauzen

Abstract: This study is a preliminary exploration of the concept of informativeness (how much information a sentence gives about a word it contains) and its potential benefits for building quality word representations from scarce data. We propose several sentence-level classifiers to predict informativeness, and we manually annotate a set of sentences. We conclude that the classifiers’ predictions and the manual annotations correspond to different notions of informativeness. However, our experiments show that using the classifiers’ predictions to train word embeddings has an impact on embedding quality.

Keywords: Informativeness, Word embeddings, Sentence classification, Data annotation.