The CELEN learner corpus and its application to teaching and research in Spanish as a foreign language
DOI:
https://doi.org/10.1344/teisel.v3.42898
Keywords:
Spanish, foreign language, learner corpus, writing
Abstract
This paper presents the CELEN corpus (https://ske.li/qqr), a collection of texts written by Japanese L1 speakers with different levels of proficiency in Spanish as a foreign language, from level A1 to level C2 of the CEFR. The data come from (1) universities in Japan, where Spanish can be studied as a foreign language subject or as a major, and (2) contexts of real interaction on the Internet, such as blogs and forums. Version 1.2 (April 2023) comprises 6,196 texts written by 1,035 learners, totalling 658,467 words. In Section 1 we briefly review the situation of Spanish as a foreign language in Japan and the existing learner corpora. In Section 2 we describe the main features of the corpus, the data collection and annotation process, and the search interface. In Section 3 we illustrate various types of searches (concordances, collocations, word lists and n-grams) applied to linguistic phenomena relevant to the teaching and research of Spanish: the use of se, prepositions, gender agreement, word order, verbal collocations, lexical frequency, and POS-tag sequences. This is an open resource that is updated periodically, and we hope that other teachers and researchers will contribute their texts so that it can offer the scientific community a wide sample of texts from Japanese learners of Spanish. A detailed user guide is available on the project website (https://sites.google.com/view/celen), and some parts of the corpus can be downloaded in full under a CC BY-NC 4.0 license.
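To make two of the search types named above more concrete, the following Python sketch computes a keyword-in-context (KWIC) concordance and n-gram frequencies over a tiny invented text. It is a minimal illustration under stated assumptions: the sample sentences are made up (they are not CELEN data), the whitespace tokenizer is deliberately naive, and the names `tokenize`, `kwic` and `ngrams` are hypothetical. It does not reproduce how the corpus's own search interface tokenizes, lemmatizes or POS-tags texts.

```python
from collections import Counter

# Invented sample text for illustration only; NOT taken from the CELEN corpus.
SAMPLE = "Me levanto a las siete. Mi madre se llama Yuki y se levanta temprano."

PUNCT = ".,;:!?¿¡"


def tokenize(text):
    """Hypothetical naive tokenizer: split on whitespace, strip edge punctuation, lowercase."""
    tokens = []
    for raw in text.split():
        tok = raw.strip(PUNCT).lower()
        if tok:  # skip tokens that were pure punctuation
            tokens.append(tok)
    return tokens


def kwic(tokens, node, window=3):
    """Return (left context, node, right context) for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def ngrams(tokens, n):
    """Count contiguous n-token sequences."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    toks = tokenize(SAMPLE)
    # Concordance lines for the clitic "se", aligned on the node word.
    for left, node, right in kwic(toks, "se"):
        print(f"{left:>20} [{node}] {right}")
    # Most frequent bigrams in the sample.
    print(ngrams(toks, 2).most_common(3))
```

A real learner-corpus query would typically search by lemma or POS tag rather than by surface form, which is why the sketch should be read only as an outline of what a concordance and an n-gram list contain.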
License
Copyright (c) 2023 Pilar Valverde

This work is licensed under a Creative Commons Attribution 4.0 International License.