Clinical Information Extraction from Sleep Medicine Interviews using Large Language Models

Veronika Parkhomenko, Julien Coelho, Philip Pierre, Florian Pecune

Abstract: Automatic extraction of clinical information from physician–patient dialogues remains underexplored, particularly in specialized domains such as sleep medicine. In this work, we propose an approach based on locally deployed large language models (LLMs) to extract and structure clinical information from real clinical interviews. Our corpus consists of 150 interviews collected at the Bordeaux University Hospital, 30 of which were annotated in detail by an expert physician according to a schema derived from the diagnostic criteria of the International Classification of Sleep Disorders (ICSD). The schema covers 47 symptoms, each assigned a status (present, absent, or not mentioned), and seven high-level clinical entities (chief complaint, medical history, comorbidities, functional impact, etc.). We evaluate nine locally deployed open-source LLMs. Performance is measured using macro-F1 for symptom extraction and BERTScore for the free-text clinical fields. The best models reach a macro-F1 of 0.79 and a BERTScore of 0.76, highlighting the potential of this approach for automatically structuring specialized clinical consultations.

Keywords: clinical information extraction, large language models, sleep disorders
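
As an illustration of the symptom-extraction metric named in the abstract, the following is a minimal sketch of macro-F1 over the three symptom statuses. The abstract does not say how scores are aggregated (per symptom, per interview, or over all pairs), so the flat list of gold and predicted statuses, the function name `macro_f1`, and the convention of scoring a label with no support as 0.0 are all assumptions made for this example, not the authors' evaluation code.

```python
from collections import Counter

# The three statuses named in the abstract's annotation schema.
STATUSES = ("present", "absent", "not mentioned")


def macro_f1(gold, pred, labels=STATUSES):
    """Unweighted mean of per-label F1 scores (hypothetical helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was not the gold label
            fn[g] += 1  # gold label g was missed

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumption: a label that never occurs in gold or pred scores 0.0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
gold = ["present", "absent", "not mentioned", "present"]
pred = ["present", "absent", "present", "not mentioned"]
score = macro_f1(gold, pred)
```

Because macro-F1 weights every status equally, a model that rarely predicts a minority status such as "absent" is penalized more heavily than it would be under plain accuracy.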


