Embeddings, topic models, LLMs: a family affair
Ludovic Tanguy, Cécile Fabre, Nabil Hathout, Lydia-Mai Ho-Dac
Abstract: This article presents a study of French terms denoting family relationships (brother, aunt, etc.) using three approaches: word embeddings, topic modeling, and pre-trained language models. The first two types of representation are built from the French edition of Wikipedia, while the third is obtained by querying ChatGPT directly. The aim is to compare how the three methods represent these terms in two main ways: by evaluating them against a structural definition of family relations (in terms of features such as gender and lineage), and by comparing the topics associated with each term. The three methods structure the family vocabulary in different ways, and the results underscore that corpus-based, controlled analyses remain necessary to obtain reliable results.
Keywords: word embeddings, topic modeling, LLMs, family lexicon.
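One way to evaluate a set of word vectors against a structural definition of kinship terms can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the article: it assumes a hypothetical three-feature grid (gender, generation relative to ego, direct vs. collateral lineage), measures structural similarity as the share of features two terms have in common, and scores the vectors by the Spearman correlation between pairwise cosine similarity and that feature-based similarity. The feature grid, the function names (`feature_similarity`, `structural_agreement`, `one_hot`) and the choice of Spearman correlation are all illustrative assumptions.

```python
import math
from itertools import combinations

# Hypothetical feature grid for a few French kinship terms:
# (gender, generation relative to ego, lineage). Illustrative only.
FEATURES = {
    "père":  ("M", +1, "direct"),
    "mère":  ("F", +1, "direct"),
    "fils":  ("M", -1, "direct"),
    "fille": ("F", -1, "direct"),
    "frère": ("M", 0, "collateral"),
    "sœur":  ("F", 0, "collateral"),
    "oncle": ("M", +1, "collateral"),
    "tante": ("F", +1, "collateral"),
}

def feature_similarity(a, b):
    """Share of structural features on which two terms agree (0.0 to 1.0)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x == y for x, y in zip(fa, fb)) / len(fa)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation undefined for constant input")
    return cov / (sx * sy)

def structural_agreement(vectors):
    """Correlate cosine similarity with feature similarity over all term pairs
    that appear both in `vectors` (term -> vector) and in FEATURES."""
    terms = sorted(t for t in vectors if t in FEATURES)
    pairs = list(combinations(terms, 2))
    emb = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    feat = [feature_similarity(a, b) for a, b in pairs]
    return spearman(emb, feat)

def one_hot(term):
    """Toy 'embedding' that encodes the features directly (sanity check only)."""
    gender, generation, lineage = FEATURES[term]
    return [float(gender == "M"), float(gender == "F"),
            float(generation == -1), float(generation == 0), float(generation == 1),
            float(lineage == "direct"), float(lineage == "collateral")]

if __name__ == "__main__":
    # Close to 1.0, since the toy vectors encode exactly the hypothetical features.
    print(structural_agreement({t: one_hot(t) for t in FEATURES}))
```

With real data, `vectors` would map each term to its vector from a trained embedding model; the one-hot vectors only check that a representation mirroring the feature grid receives a perfect score.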