VERTa: a machine translation evaluation metric

Applications to L2 research on Spanish and English

Authors

DOI:

https://doi.org/10.1344/teisel.v2.39307

Keywords:

automatic translation, computational linguistics, evaluation, English, Spanish

Abstract

This article presents VERTa, a machine translation (MT) evaluation metric for English and Spanish (the full version is available at https://github.com/jatserias/VERTa and the Spanish online demo at http://grial.ub.edu:8080/VERTaDemo/). VERTa uses linguistic information to evaluate machine-translated sentences by comparing them with sentences translated by human translators. Unlike other metrics, VERTa provides not only a score for each compared sentence but also a qualitative analysis of the results. The article discusses the steps carried out before the metric was designed and implemented: a linguistic study of the development corpus to identify the most relevant linguistic features the metric should cover, and the text-processing tools to be applied to the compared segments. It also details the modules included in the metric and the information they provide, with examples of the output the user receives. Although VERTa is an MT evaluation metric, its development placed special emphasis on the linguistic information it should offer the user, so it goes beyond merely scoring the translated segment and serves as a first qualitative guide for detecting MT errors. Consequently, VERTa can be used for the learning, teaching and evaluation of English and Spanish as second and/or foreign languages, as well as for research in this area.

Author Biography

Elisabet Comelles, Universitat de Barcelona

Elisabet Comelles is a lecturer at the Department of Modern Languages and Literatures and of English Studies at the University of Barcelona. She holds a PhD in Cognitive Science and Language (University of Barcelona). She is a member of the GRIAL - Linguistic Applications Inter-University Research Group and collaborates with the GRELIC - Lexicology and Corpus Linguistics Research Group.

Dr. Elisabet Comelles specialises in computational linguistics and natural language processing, as well as in the use of corpus linguistics and language technologies in the learning and teaching of English. She has published several studies on the use of corpus linguistics and language technologies in the linguistics classroom (Laso, Comelles & Verdaguer, 2019; Laso, Comelles, Celaya & Forcadell, 2016; Comelles, Laso, Forcadell, Castaño, Feijóo & Verdaguer, 2013) and on machine translation evaluation (Comelles, 2022; Comelles & Atserias, 2019; Comelles, Arranz & Castellón, 2017).

Published

2022-09-20

Issue

Section

Articles about Resources and Tools