talnarchives

A French-language digital archive of research articles in Natural Language Processing.

Emerging Content Management Technologies

Udo Hahn

Abstract: Recent advances in language engineering have led to diverse techniques by which content items in written documents can be tracked and managed. Some of these techniques elaborate on fairly standard information retrieval methodologies (tf-idf, vector space model, etc.), though different applications are envisaged (e.g., filtering, notification or recommender systems). Current efforts also target systems that provide automatic summarization based on the extraction of sentences (or phrases). Finally, information extraction is concerned with methodologies to extract relevant data from textual sources. In this talk, some of the core techniques for content tracking and management are identified, evaluation results are presented, and open challenges are discussed.