talnarchives

A French-language digital archive of research articles in Natural Language Processing.

Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs Machine Learning

Ioannis Partalas, Cédric Lopez, Frédérique Segond

Abstract: Named-Entity Recognition concerns the classification of textual objects into a predefined set of categories such as persons, organizations, and locations. While Named-Entity Recognition has been well studied for 20 years, its application to specialized domains still poses challenges for current systems. We developed a rule-based system and two machine learning approaches to tackle the same task: recognition of product names, brand names, etc., in the domain of cosmetics, for French. Our systems can thus be compared under ideal conditions. In this paper, we introduce the systems and compare them.

Keywords: NER, e-Commerce, rule-based system, machine learning system.