CALLIOPE: A multi-dimensional model for the prosodic characterization of Information Units

Authors

Sonia Cenceschi, Politecnico di Milano, https://orcid.org/0000-0002-4145-9593
Licia Sbattella, Politecnico di Milano, https://orcid.org/0000-0001-5344-5976
Roberto Tedesco, Politecnico di Milano, https://orcid.org/0000-0002-2830-4247

DOI:

https://doi.org/10.1344/efe-2021-30-227-245

Keywords:

Prosody, Prosodic model, Information unit, Calliope

Abstract

CALLIOPE is a conceptual multi-dimensional model that aims to approximate and categorize prosodic phenomena by taking into account all possible independent factors affecting the sound of so-called Information Units (IUs). In CALLIOPE, each IU is associated with a tuple of 12 labels, each belonging to a different dimension that represents a characteristic influencing prosodic behaviour. Its ultimate aim is to create well-defined corpora suitable for linguistic and engineering research.

