AI is great, isn’t it? Tone direction and illocutionary force delivery of tag questions in Amazon’s AI NTTS Polly

DOI:

https://doi.org/10.1344/efe-2023-32-227-242

Keywords:

illocutionary force, tag questions, intonation, text-to-speech, artificial intelligence (AI)

Abstract

This work provides a descriptive analysis of tone direction and its inherent illocutionary force in question tags delivered by Amazon’s neural text-to-speech (NTTS) system Polly. We included three types of tag questions: reverse-polarity tags (both positive and negative), copy tags, and command tags, with 10 sentences used as input in each case. The data comprised 600 utterances produced by the British and American English voices currently available on Amazon’s NTTS. The audio files were examined with the speech analysis software Praat to identify the tone pattern of each utterance and confirm the intended illocutionary force. The results show that Amazon’s AI speech synthesis technology is not yet fully reliable: it produces a high rate of utterances with an undesired pragmatic load when rendering traits of natural spontaneous speech such as question tags.
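As an illustration of what identifying the tone direction of a tag can involve, the following is a minimal sketch that labels a pitch contour as a rise, a fall, or level from a list of F0 samples (for example, values exported from a Praat pitch listing). It is not the procedure used in the study, which relied on inspection in Praat. The function name `tone_direction`, the comparison of the first and last thirds of the voiced samples, and the 1-semitone threshold are all assumptions made for this example.

```python
import math
from statistics import mean


def tone_direction(f0_hz, threshold_semitones=1.0):
    """Label the pitch movement over a tag as 'rise', 'fall', or 'level'.

    f0_hz: F0 samples in Hz, in time order, covering only the tag.
           None or 0 marks an unvoiced frame and is ignored.
    threshold_semitones: hypothetical cut-off; smaller changes count as level.
    """
    # Keep voiced frames only (drops None and 0).
    voiced = [f for f in f0_hz if f]
    if len(voiced) < 4:
        raise ValueError("need at least 4 voiced F0 samples")

    # Compare the mean pitch of the first and last thirds of the contour.
    third = max(1, len(voiced) // 3)
    start = mean(voiced[:third])
    end = mean(voiced[-third:])

    # Express the change in semitones so the threshold is speaker-independent.
    change = 12 * math.log2(end / start)

    if change >= threshold_semitones:
        return "rise"
    if change <= -threshold_semitones:
        return "fall"
    return "level"


if __name__ == "__main__":
    # Invented sample contours (Hz), for demonstration only.
    print(tone_direction([180, 185, 195, 210, 230, 250]))
    print(tone_direction([None, 220, 210, 0, 200, 190, 180, None]))
```

A start-versus-end comparison of this kind cannot detect complex contours such as a fall-rise, so it would only serve as a first pass before manual inspection of the pitch track.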

Published

2023-11-28

How to Cite

Rodríguez Fernández-Peña, A. C. (2023). AI is great, isn’t it? Tone direction and illocutionary force delivery of tag questions in Amazon’s AI NTTS Polly. Journal of Experimental Phonetics, 32, 227–242. https://doi.org/10.1344/efe-2023-32-227-242

Section

Miscellaneous