talnarchives

A French-language digital archive of research articles in Natural Language Processing.

Genre classification using Balanced Winnow in the DEFT 2014 challenge

Eva D’hondt

Abstract: In this report we present our work on the first subtask of the DEFT 2014 challenge, which dealt with genre classification of French literary texts. We developed three types of features: lemmatized words, stylometric features, and features that incorporate some form of world knowledge. Classification experiments were then performed using the Balanced Winnow classifier. We submitted three runs, of which the best-scoring one combined all features.

Keywords: text classification, DEFT, literary genre.
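
For readers unfamiliar with the classifier named in the abstract, the following is a minimal sketch of the textbook Balanced Winnow update rule for a binary decision over sparse binary features. It is not the implementation used in the report: the class name, the promotion and demotion factors (`alpha`, `beta`), the threshold, the initial weights, and the toy feature sets are all illustrative assumptions. A multi-class genre task would also need an extension such as one classifier per genre, which is not shown.

```python
from collections import defaultdict


class BalancedWinnow:
    """Textbook Balanced Winnow for binary labels and sparse binary features.

    All default parameter values are illustrative assumptions.
    """

    def __init__(self, alpha=1.1, beta=0.9, threshold=1.0,
                 init_pos=2.0, init_neg=1.0):
        self.alpha = alpha          # promotion factor, > 1
        self.beta = beta            # demotion factor, between 0 and 1
        self.threshold = threshold
        # Each feature has a positive and a negative weight;
        # its effective weight is the difference of the two.
        self.w_pos = defaultdict(lambda: init_pos)
        self.w_neg = defaultdict(lambda: init_neg)

    def score(self, features):
        """Sum of effective weights over the active features."""
        return sum(self.w_pos[f] - self.w_neg[f] for f in features)

    def predict(self, features):
        return self.score(features) > self.threshold

    def update(self, features, label):
        """Mistake-driven multiplicative update; returns True on a mistake."""
        if self.predict(features) == label:
            return False
        # Missed positive: promote active features. False alarm: demote them.
        up, down = (self.alpha, self.beta) if label else (self.beta, self.alpha)
        for f in features:
            self.w_pos[f] *= up
            self.w_neg[f] *= down
        return True

    def fit(self, examples, epochs=10):
        for _ in range(epochs):
            mistakes = sum(self.update(x, y) for x, y in examples)
            if mistakes == 0:       # stop once the training set is separated
                break
        return self


# Hypothetical toy data: sets of active lemma features, True = target genre.
examples = [
    ({"love", "heart"}, True),
    ({"murder", "detective"}, False),
    ({"love", "letter"}, True),
    ({"murder", "night"}, False),
]
model = BalancedWinnow().fit(examples)
```

Only the weights of features present in a misclassified document are changed, and they are changed multiplicatively, which is what distinguishes Winnow-style learners from additive ones such as the perceptron.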