Phonetic characteristics of spontaneous speech in a total laryngectomized Italian speaker: Perspectives for speech enhancement algorithms

Keywords:

Oesophageal Speech, Voice quality, Voice disorders, Vowel formants, Clinical phonetics

Abstract

This paper describes the main phonetic features of a 74-year-old L1 Italian speaker (ESO01) who underwent a total laryngectomy in 2015, with complete removal of the vocal folds, due to five tumour masses. We offer an acoustic analysis of this target speaker's spontaneous speech in order to lay the groundwork for the development of spontaneous speech enhancement and reconstruction algorithms for non-invasive aids. A semi-automatic analysis extracts fundamental frequency (F0) and formant values (F1, F2, F3) at the vowel midpoint and at 7 time points, together with other acoustic cues. Our results show that the target speaker has a low and rough voice, but that his vowels are clearly differentiated. Furthermore, we find the vocoid and air release to be extremely consistent in their acoustic characteristics during oesophageal phonation.

Published

2022-03-13

How to Cite

Meluzzi, C., Cenceschi, S., Dani, F. R., & Trivilini, A. (2022). Phonetic characteristics of spontaneous speech in a total laryngectomized Italian speaker: Perspectives for speech enhancement algorithms. Journal of Experimental Phonetics, 31, 45–58. Retrieved from https://revistes.ub.edu/index.php/experimentalphonetics/article/view/43979

Section

Articles