Pyrlato: A novel methodology to collect real-world acoustic data
DOI:
https://doi.org/10.1344/efe-2023-32-243-254
Keywords:
real-world data, ecological validity, data scraping
Abstract
In this paper, we present Pyrlato, an innovative tool developed in Python for collecting acoustic data from YouTube. The development of this tool was motivated by the need to conveniently collect real-world spoken data. By executing this Python code, researchers can obtain a spoken corpus of specific words, syllables, constituents, and more. We illustrate the main steps of the execution to demonstrate how it works and how to use it. Additionally, we provide a complete example for reference, demonstrating how to customize Pyrlato according to specific requirements. Finally, we discuss the future developments we plan for Pyrlato.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published online by Estudios de Fonética Experimental are licensed under Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0), unless otherwise noted. Estudios de Fonética Experimental is an open access journal. Estudios de Fonética Experimental is hosted by RCUB (Revistes Científiques de la Universitat de Barcelona), powered by Open Journal Systems (OJS) software. The copyright is not transferred to the journal: authors hold the copyright and publishing rights without restrictions. Authors are free to use and distribute preprint and postprint versions of their articles. However, preprint versions are regarded as work-in-progress versions used for internal communication with the authors, and we prefer to share postprint versions.