Pyrlato: A novel methodology to collect real-world acoustic data

Giuseppe Magistro; Claudia Crocco

doi:10.1344/efe-2023-32-243-254

Authors

Giuseppe Magistro Ghent University https://orcid.org/0000-0002-0272-741X
Claudia Crocco Ghent University https://orcid.org/0000-0003-1099-956X

DOI:

https://doi.org/10.1344/efe-2023-32-243-254

Keywords:

real-world data, ecological validity, data scraping

Abstract

In this paper, we present Pyrlato, a new Python tool for collecting acoustic data from YouTube. We developed it because researchers need a convenient way to collect real-world spoken data. By running this Python code, researchers can build a spoken corpus of specific words, syllables, constituents, and other units. We walk through the main steps of its execution to show how it works and how to use it. We also provide a complete worked example that shows how to customize Pyrlato for specific research needs. Finally, we outline the features we plan to add to Pyrlato.

