The CELEN learner corpus and its application to teaching and research in Spanish as a foreign language

Authors

DOI:

https://doi.org/10.1344/teisel.v3.42898

Keywords:

Spanish, foreign language, learner corpus, writing

Abstract

This paper presents the CELEN corpus (https://ske.li/qqr), a collection of texts written by L1 Japanese speakers at different levels of proficiency in Spanish as a foreign language, from level A1 to level C2 of the CEFR. The data come from (1) universities in Japan, where Spanish can be studied as a foreign language subject or as a major, and (2) contexts of real interaction on the Internet, such as blogs and forums. Version 1.2 (April 2023) comprises 6,196 texts written by 1,035 learners, with a total of 658,467 words. In section 1 we briefly review the situation of Spanish as a foreign language in Japan and the existing learner corpora. In section 2 we describe the main features of the corpus, the data collection and annotation process, and the search interface. In section 3 we exemplify various types of searches (concordances, collocations, word lists and n-grams) applied to linguistic phenomena relevant to the teaching and research of Spanish: the use of se, prepositions, gender agreement, word order, verbal collocations, lexical frequency, and POS-tag sequences. This is an open resource that is updated periodically, and we hope that other teachers and researchers will contribute their texts to it so that it can offer the scientific community a wide sample of texts from Japanese learners of Spanish. A detailed user guide is available on the project website (https://sites.google.com/view/celen) and parts of the corpus can be downloaded in full under a CC BY-NC 4.0 license.

Author Biography

Pilar Valverde, Kansai Gaidai University

Pilar Valverde has a degree in Hispanic Philology from the University of Barcelona, a master's degree in Cognitive Science and Language from the same university, and a European doctorate in Spanish Language from the University of Santiago de Compostela. She has worked on annotating corpora with syntactic and semantic information at the Language and Computing Center of the University of Barcelona, the Department of Spanish Language of the University of Santiago de Compostela, and the Institute of Language and Communication of the University of Southern Denmark. Since 2010 she has been teaching Spanish as a foreign language in Japan, and she is currently an associate professor at the Faculty of Foreign Studies at Kansai Gaidai University. Her research interests are language technologies, the teaching and learning of Spanish as a foreign language, and the analysis of the written language of Japanese students. She has been a principal investigator on research projects on the automatic detection of grammatical errors and the creation of learner corpora, funded by the Japan Society for the Promotion of Science.

Published

2023-10-26

Issue

Section

Articles about Resources and Tools