CALLIOPE
A multi-dimensional model for the prosodic characterization of Information Units
DOI: https://doi.org/10.1344/efe-2021-30-227-245

Keywords: Prosody, Prosodic model, Information unit, Calliope

Abstract
CALLIOPE is a conceptual multi-dimensional model that aims to approximate and categorize prosodic phenomena by taking into account all the independent factors affecting the sound of so-called Information Units (IUs). In CALLIOPE, each IU is associated with a tuple of 12 labels, each belonging to a different dimension that represents one characteristic influencing prosodic behaviour. The ultimate aim of the model is the creation of well-defined corpora suitable for both linguistic and engineering research.
License
Copyright (c) 2021 Sonia Cenceschi, Licia Sbattella, Roberto Tedesco

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published online by Estudios de Fonética Experimental are licensed under Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0), unless otherwise noted. Estudios de Fonética Experimental is an open access journal, hosted by RCUB (Revistes Científiques de la Universitat de Barcelona) and powered by Open Journal Systems (OJS) software. Copyright is not transferred to the journal: authors hold the copyright and publishing rights without restrictions. Authors are free to use and distribute preprint and postprint versions of their articles. However, preprint versions are regarded as work-in-progress versions used for internal communication with the authors, and we prefer that postprint versions be shared.