talnarchives

A French-language digital archive of research articles in Natural Language Processing.

What shall we read : the article or the citations? - A case study on scientific language understanding

Aman Sinha, Sam Bigeard, Marianne Clausel, Mathieu Constant

Abstract: The number of scientific articles is growing so rapidly across all domains that it has become hard for researchers to stay up to date. Scientific language understanding systems and Information Extraction (IE) systems, aided by advances in Natural Language Processing (NLP) techniques, help address this need. Most practices for building such systems are data-driven and follow the idea of “the more, the better”. In this work, we revisit that paradigm by asking which type of data, text (title, abstract) or citations, has more impact on the performance of scientific language understanding systems.

Keywords: Scientific document analysis