CALLIOPE: A multi-dimensional model for the prosodic characterization of Information Units

Authors

Cenceschi, S., Sbattella, L., & Tedesco, R.

Keywords

Prosody, Prosodic model, Information unit, Calliope

Abstract

CALLIOPE is a conceptual multi-dimensional model that aims to approximate and categorize prosodic phenomena, taking into account all possible independent factors that affect the sound of so-called Information Units (IUs). In CALLIOPE, each IU is associated with a tuple of 12 labels, each belonging to a different dimension that represents a characteristic influencing prosodic behaviour. The model's ultimate aim is the creation of well-defined corpora suitable for linguistic and engineering research.
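
The tuple-per-IU idea can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions: the abstract gives the number of dimensions (12) but not their names or admissible labels, so the dimension names, the label values, and the `InformationUnit` class below are hypothetical placeholders rather than part of the model's specification.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Stated in the abstract: one label for each of 12 dimensions.
N_DIMENSIONS = 12

# Placeholder names (assumption): the abstract does not list the real dimensions.
DIMENSIONS: Tuple[str, ...] = tuple(
    f"dimension_{i:02d}" for i in range(1, N_DIMENSIONS + 1)
)


@dataclass(frozen=True)
class InformationUnit:
    """Hypothetical container pairing an IU with its 12-label tuple."""

    text: str
    labels: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Reject tuples that do not carry exactly one label per dimension.
        if len(self.labels) != N_DIMENSIONS:
            raise ValueError(
                f"expected {N_DIMENSIONS} labels, got {len(self.labels)}"
            )

    def as_dict(self) -> Dict[str, str]:
        """Map each (placeholder) dimension name to this IU's label."""
        return dict(zip(DIMENSIONS, self.labels))


# Usage with made-up label values, purely for illustration.
iu = InformationUnit(
    text="example utterance",
    labels=tuple(f"label_{i}" for i in range(1, N_DIMENSIONS + 1)),
)
print(iu.as_dict()["dimension_01"])
```

Keeping the labels in a fixed-length, immutable tuple mirrors the description of one label per dimension; a corpus would then be a collection of such records.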

Published

2021-08-22

How to Cite

Cenceschi, S., Sbattella, L., & Tedesco, R. (2021). CALLIOPE: A multi-dimensional model for the prosodic characterization of Information Units. Journal of Experimental Phonetics, 30, 227–245. Retrieved from https://revistes.ub.edu/index.php/experimentalphonetics/article/view/44008

Section

Miscellaneous