talnarchives

A French-language digital archive of research articles in Natural Language Processing.

Types of Semantic Information Necessary in a Machine Translation Lexicon

David Mowatt

Abstract: This paper describes research into which types of semantic information (SI) a Machine Translation (MT) lexicon needs for ‘good’ translation quality to be attainable. We present a typology of semantic information that allows the use of semantics in any MT system to be quantified in precise and absolute, rather than relative, terms. This typology was used to survey the SI present in twenty commercial and research MT systems. An automatically translated corpus was analysed to identify which types of semantics were necessary to achieve high-quality translation. From the survey and the analysis, we conclude that four of the nine types of SI identified should always be included, and that a further two complex SI types should be considered for inclusion pending further analysis. A formal lexicon specification incorporating these six SI types is presented.