talnarchives

A French-language digital archive of research articles in Natural Language Processing.

Investigating associative, switchable and negatable Winograd items on renewed French data sets

Xiaoou Wang, Olga Seminck, Pascal Amsili

Abstract: The Winograd Schema Challenge (WSC) consists of a set of anaphora resolution problems that can be resolved only by reasoning about world knowledge. This article describes the update of the existing French data set and the creation of three subsets that allow for a more robust, fine-grained evaluation protocol for the WSC in French (FWSC): an associative subset (items easily resolvable through lexical co-occurrence), a switchable subset (items where swapping two keywords reverses the answer) and a negatable subset (items where negating the verb reverses the answer). Experiments on these data sets with CamemBERT reach state-of-the-art performance. Our evaluation protocol further shows that this higher performance could be explained by the existence of associative items in FWSC. In addition, increasing the size of the training corpus improves the model’s performance on switchable items, while the impact of a larger training corpus remains small on negatable items.

Keywords: Winograd Schema Challenge, world knowledge, commonsense reasoning, negation, French, CamemBERT.
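
The subset-based evaluation described in the abstract can be illustrated with a minimal sketch that reports accuracy separately for associative, switchable and negatable items. This is not the authors' code: the `Item` class, the tag names, the `accuracy_by_subset` helper and the toy items are all hypothetical, and the predictions are placeholders rather than CamemBERT outputs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """One Winograd item: gold answer, model prediction, and subset tags (hypothetical schema)."""
    item_id: str
    gold: str
    predicted: str
    tags: frozenset = field(default_factory=frozenset)


def accuracy(items):
    """Fraction of items answered correctly; None if there are no items."""
    items = list(items)
    if not items:
        return None
    return sum(i.gold == i.predicted for i in items) / len(items)


def accuracy_by_subset(items, subsets=("associative", "switchable", "negatable")):
    """Accuracy over all items, over each tagged subset, and over non-associative items."""
    items = list(items)
    report = {"all": accuracy(items)}
    for name in subsets:
        report[name] = accuracy(i for i in items if name in i.tags)
    # Separating out associative items shows how much of the overall score they account for.
    report["non_associative"] = accuracy(i for i in items if "associative" not in i.tags)
    return report


# Toy placeholder items, NOT taken from the FWSC data set.
toy = [
    Item("t1", "A", "A", frozenset({"associative"})),
    Item("t2", "B", "B", frozenset({"associative", "switchable"})),
    Item("t3", "A", "B", frozenset({"switchable"})),
    Item("t4", "B", "B", frozenset({"negatable"})),
    Item("t5", "A", "B", frozenset()),
]
report = accuracy_by_subset(toy)
for name, value in report.items():
    print(name, value)
```

Reporting the non-associative accuracy next to the overall score is one simple way to check whether a high overall result depends on items that lexical co-occurrence alone can resolve.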