AI is great, isn’t it? Tone direction and illocutionary force delivery of tag questions in Amazon’s AI NTTS Polly
DOI:
https://doi.org/10.1344/efe-2023-32-227-242
Keywords:
illocutionary force, tag questions, intonation, text-to-speech, artificial intelligence (AI)
Abstract
This work provides a descriptive analysis of tone direction and the illocutionary force it conveys in tag questions delivered by Amazon’s neural text-to-speech (NTTS) system Polly. We included three types of tag questions: reverse-polarity tags (both positive and negative), copy tags and command tags, with 10 sentences used as input in each case. The data comprised 600 utterances produced by the British and American English voices currently available on Amazon’s NTTS. The audio files were examined with the speech analysis software Praat to identify the tone pattern of each utterance and to confirm whether it conveyed the intended illocutionary force. The results show that Amazon’s AI speech synthesis technology is not yet fully reliable: when the input contains natural spontaneous-speech features such as question tags, it produces a high rate of utterances with an undesired pragmatic load.
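To make the notion of tone direction concrete, the following is a minimal sketch of how a rise or fall over a tag could be labelled once F0 values have been extracted (for instance, exported from Praat). It is an illustration only, not the procedure used in the study: the function name `tone_direction`, the first-to-last comparison and the 2-semitone threshold are all assumptions introduced here.

```python
import math
from typing import Optional, Sequence


def tone_direction(f0_hz: Sequence[Optional[float]],
                   threshold_st: float = 2.0) -> str:
    """Label an F0 track over a tag as 'rise', 'fall' or 'level'.

    f0_hz: F0 samples in Hz; None or 0 marks an unvoiced frame.
    threshold_st: hypothetical cut-off in semitones (an assumption,
    not a value taken from the study).
    """
    # Keep voiced frames only (drops None and 0).
    voiced = [f for f in f0_hz if f]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 samples")

    # Pitch change from the first to the last voiced frame, in semitones.
    change_st = 12 * math.log2(voiced[-1] / voiced[0])

    if change_st >= threshold_st:
        return "rise"
    if change_st <= -threshold_st:
        return "fall"
    return "level"


if __name__ == "__main__":
    # Made-up F0 values for illustration, not data from the study.
    print(tone_direction([180, 190, None, 220, 260]))
    print(tone_direction([220, 200, 170, 150]))
```

Comparing only the first and last voiced frames keeps the sketch short; it would miss complex contours such as a fall-rise, which would need the whole track to be inspected.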
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published online by Estudios de Fonética Experimental are licensed under Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0), unless otherwise noted. Estudios de Fonética Experimental is an open access journal hosted by RCUB (Revistes Científiques de la Universitat de Barcelona) and powered by Open Journal Systems (OJS) software. Copyright is not transferred to the journal: authors hold the copyright and publishing rights without restrictions. Authors are free to use and distribute preprint and postprint versions of their articles. However, preprint versions are regarded as works in progress used for internal communication with the authors, and the journal prefers that postprint versions be shared.