Audiovisual perception of congruent and incongruent Spanish vowels in unimodal and bimodal conditions
Keywords: audiovisual speech perception, vowels, McGurk effect, lipreading
Abstract
This paper presents a preliminary approach to the study of the integration of auditory and visual signals in the audiovisual perception of Spanish vowels. Following the experiment of McGurk and MacDonald (1976), the purpose of this study is to determine whether incongruent cues in audiovisual speech perception can alter the identification of the auditory stimulus, and to analyze whether this bimodal integration process yields a perceptual result different from the one produced by each channel separately. These results will show the extent to which the visual signal affects this class of stimuli under these conditions and, on the other hand, whether the findings reflect differences in sensitivity to the visual channel depending on the speaker's gender. For this purpose, 28 subjects (12 men and 16 women) had to identify the five Spanish vowels, distributed at random into three blocks: a) 25 audiovisual cross-combinations of all vowels (thus in both congruent and incongruent audiovisual conditions); b) 10 visual stimuli distributed in two series (5 x 2); and c) 5 auditory stimuli. In addition, the participants had to indicate the degree of confidence of their responses. Results showed, among other findings, that the presence of incongruent visual stimuli of this kind affects the perception of the auditory signal and gives rise to perceptual fusions. They also demonstrate that visual information alone is not sufficient for discrimination, although important differences can be found depending on the presence of visual cues such as rounding or vowel openness. There are also significant differences in visual perception, and in the variability of responses, between men and women.
References
ABELIN, Å. (2007): «Emotional McGurk effect in Swedish», en L. Berthouze, C. G. Prince, M. Littman, H. Kozima y C. Balkenius (eds.): Proceedings of the Seventh International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, Lund University Cognitive Studies, 135, pp. 73-76.
ALOUFY, S.; M. LAPIDOT y M. MYSLOBODSKY (1996): «Differences in susceptibility to the "blending illusion" among native Hebrew and English speakers», Brain and Language, 53, pp. 51-57.
ARGYLE, M. y R. INGHAM (1972): «Gaze, mutual gaze, and proximity», Semiotica, 6, pp. 32-49.
ASSMANN, P. F. y A. Q. SUMMERFIELD (2004): «The perception of speech under adverse conditions», en S. Greenberg, W. A. Ainsworth, A. N. Popper y R. R. Fay (eds.): Processing in the auditory system, Heidelberg, Springer-Verlag, pp. 231-308.
BAYLISS, A. P.; G. DI PELLEGRINO y S. P. TIPPER (2005): «Sex differences in eye gaze and symbolic cueing of attention», The Quarterly Journal of Experimental Psychology, 58A (4), pp. 631–650.
CALVERT, G. A.; E. T. BULLMORE, M. J. BRAMMER, R. CAMPBELL, S. C. R. WILLIAMS, P. K. MCGUIRE, P. W. R. WOODRUFF, S. D. IVERSEN y A. S. DAVID (1997): «Activation of Auditory Cortex During Silent Lipreading», Science, 276, pp. 593-596.
CAMPBELL, R. (1994): «Audiovisual Speech: Where, what, when, how?», Current Psychology of Cognition, 13, pp. 76-80.
CHEN, Y. y V. HAZAN (2007): «Language effects on the degree of visual influence in audiovisual perception», en J. Trouvain y W. J. Barry (eds.): Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, Universidad del Saarland, pp. 2177-2180.
COLIN, C.; M. RADEAU y P. DELTENRE (2005): «Top-down and bottom-up modulation of audiovisual integration in speech», European Journal of Cognitive Psychology, 17 (4), pp. 541- 560.
DODD, B. (1977): «The role of vision in the perception of speech», Perception, 6, pp. 31-40.
DODD, B. (1979): «Lip-reading in infants: attention to speech presented in- and out-of-synchrony», Cognitive Psychology, 11 (4), pp. 478-84.
DARWIN, C. J. y R. P. CARLYON (1995): «Auditory grouping», en B. C. J. Moore (ed.): The handbook of perception and cognition, Londres, Academic Press, pp. 387-424.
ERBER, N. P. (1969): «Interaction of audition and vision in the recognition of oral speech stimuli», Journal of Speech and Hearing Research, 12, pp. 423-425.
FOWLER, C. A. y D. J. DEKLE (1991): «Listening with eye and hand: Cross-Modal contributions to speech perception», Journal of Experimental Psychology: Human Perception and Performance, 17 (3), pp. 816-828.
GAIL S. D.; K. T. ELIZABETH y L. R. CATHERINE (2010): «Vowel identification by younger and older listeners: Relative effectiveness of vowel edges and vowel centers», The Journal of the Acoustical Society of America, 128 (3), pp. 105-110.
GICK, B. y D. DERRICK (2009): «Aero-tactile integration in speech perception», Nature, 462, pp. 502-504.
GIL FERNÁNDEZ, J. (2007): Fonética para profesores de español: de la teoría a la práctica, Madrid, Arco Libros.
GIRIN, L.; J. L. SCHWARTZ y G. FENG (2001): «Audio-Visual Enhancement of Speech in Noise», The Journal of the Acoustical Society of America, 109 (6), pp. 3007-3020.
GREEN, K. P. y A. GERDEMAN (1995): «Cross-modal discrepancies in coarticulation and the integration of speech information: the McGurk effect with mismatched vowels», Journal of Experimental Psychology: Human Perception and Performance, 21 (6), pp. 1409-1426.
GREEN, K. P. y P. K. KUHL (1989): «The role of visual information in the processing of place and manner features in speech perception», Perception & Psychophysics, 45 (1), pp. 34-42.
GREEN, K. P.; P. K. KUHL, A. N. MELTZOFF y E. B. STEVENS (1991): «Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect», Perception & Psychophysics, 50 (6), pp. 524-536.
GREEN, K. P. y L. W. NORRIX (1997): «Acoustic cues to place of articulation and the McGurk Effect: The role of release bursts, aspiration, and formant transitions», Journal of Speech, Language, and Hearing Research, 40 (3), pp. 646-665.
IRWIN, J. R.; D. H. WHALEN y C. A. FOWLER (2006): «A sex difference in visual influence on heard speech», Perception & Psychophysics, 68 (4), pp. 582-592.
JENKINS, J. J.; W. STRANGE y T. R. EDMAN (1983): «Identification of vowels in ‘vowelless’ syllables», Perception & Psychophysics, 34 (5), pp. 441-450.
JOHNSON, F. M.; L. HICKS, T. GOLDBERG y M. MYSLOBODSKY (1988): «Sex differences in Lipreading», Bulletin of the Psychonomic Society, 26 (2), pp. 106-108.
JONES, D. (1917): An English Pronouncing Dictionary, Londres, Dent.
KANZAKI, R. y R. CAMPBELL (1999): «Effect of facial brightness reversal on visual and audiovisual speech perception», en D. Massaro (ed.): Audio Visual Speech Processing International Conference, University of California, Santa Cruz.
KEWLEY-PORT, D.; Z. T. BURKLE y J. H. LEE (2007): «Contribution of consonant versus vowel information to sentence intelligibility for young normal-hearing and elderly hearing impaired listeners», The Journal of the Acoustical Society of America, 122 (4), pp. 2365-2375.
KIM, J. y C. DAVIS (2003): «Hearing foreign voices: does knowing what is said affect masked visual speech detection?», Perception, 32 (1), pp. 111–120.
KUHL, P. K. y A. MELTZOFF (1982): «The bimodal perception of speech in infancy», Science, 218, pp. 1138-1141.
LADEFOGED, P. (2001): Vowels and consonants: an introduction to the sounds of languages, Malden, Blackwell Publishers Ltd.
LADEFOGED, P. y K. JOHNSON (1975): A course in phonetics, Wadsworth, Cengage Learning, 6.ª ed., 2006.
LADEFOGED, P. e I. MADDIESON (1996): The sounds of the world's languages, Oxford, Blackwell Publishers Ltd.
LEEB, R. T. y F. G. REJSKIND (2004): «Here’s looking at you, kid! A longitudinal study of perceived gender differences in mutual gaze behavior in young infants», Sex Roles, 50 (1), pp. 1–14.
LISKER, L. y M. ROSSI (1992): «Auditory and visual cueing of the [+/- rounded] feature of vowels», Language and Speech, 35 (4), pp. 391-417.
MACDONALD, J. (2006): «Hearing Lips and Seeing Voices: Illusion and Serendipity in Auditory-Visual Perception Research», en J. Atkinson y M. Crove (eds.): Interdisciplinary Research: Diverse Approaches in Science, Technology, Health and Society, John Wiley & Sons, Chichester, pp. 101-115.
MACDONALD, J.; S. ANDERSEN y T. BACHMAN (2000): «Hearing by eye: how much spatial degradation can be tolerated?», Perception, 29 (10), pp. 1155-1168.
MACLEOD, A. y Q. SUMMERFIELD (1987): «Quantifying the contribution of vision to speech perception in noise», British Journal of Audiology, 21 (2), pp. 131-141.
MARTÍNEZ CELDRÁN, E. y A. MA. FERNÁNDEZ PLANAS (2007): Manual de fonética española: articulaciones y sonidos del español, Barcelona, Ariel Lingüística.
MASSARO, D. W. (1989): «Testing between the TRACE Model and the Fuzzy Logical Model of Speech perception», Cognitive Psychology, 21 (3), pp. 398–421.
MASSARO, D. W. (1998): Perceiving talking faces: From speech perception to a behavioral principle, Cambridge, Massachusetts, MIT Press.
MASSARO, D. W. y M. M. COHEN (1990): «Perception of synthesized audible and visible speech», Psychological Science, 1 (1), pp. 55-63.
MASSARO, D. W. y M. M. COHEN (1996): «Perceiving speech from inverted faces», Perception and Psychophysics, 58 (7), pp. 1047-1065.
MASSARO, D. W.; L. A. THOMPSON, B. E. BARRON y E. LAREN (1986): «Developmental changes in visual and auditory contributions to speech perception», Journal of Experimental Child Psychology, 41 (1), pp. 93-113.
MCGURK, H. y J. MACDONALD (1976): «Hearing lips and seeing voices», Nature, 264, pp. 746-748.
MORAIN, G. G. (2001): «Kinesics and cross-cultural understanding», en J. M. Valdes (ed.): Culture Bound, Cambridge University Press, pp. 64-76.
MUNHALL, K. G.; P. GRIBBLE, L. SACCO y M. WARD (1996): «Temporal constraints on the McGurk effect», Perception and Psychophysics, 58 (3), pp. 351-362.
MUNHALL, K. G. y Y. TOHKURA (1998): «Audiovisual gating and the time course of speech perception», The Journal of the Acoustical Society of America, 104 (1), pp. 530-539.
MURASE, M.; D. N. SAITO, T. KOCHIYAMA, H. C. TANABE, S. TANAKA; T. HARADA, Y. ARAMAKI, M. HONDA y N. SADATO (2008): «Cross-modal integration during vowel identification in audiovisual speech: a functional magnetic resonance imaging study», Neuroscience letters, 434 (1), pp. 71-76.
NIELSEN, K. (2004): «Segmental differences in the visual contribution to speech intelligibility», UCLA Working Papers in Phonetics, 103, pp. 106-147.
O’SHEA, M. (2005): The Brain: A Very Short Introduction, Oxford University Press.
OSTRAND, R.; SH. BLUMSTEIN y J. MORGAN (2011): «When hearing lips and seeing voices becomes perceiving speech: auditory-visual integration in lexical access», en L. Carlson, C. Hölscher y T. Shipley (eds.): Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Austin, Cognitive Science Society, pp. 1376-1381.
PARÉ, M.; C. RICHLER, M. HOVE y K. MUNHALL (2003): «Gaze behavior in audiovisual speech perception: The influence of ocular fixations on the McGurk effect», Perception and Psychophysics, 65 (4), pp. 533-567.
RAHMAWATI, S. y M. OHGISHI (2011): «Cross cultural studies on audiovisual speech processing: the McGurk effects observed in consonant and vowel perception», en T. Juhana, A. Munir Iskandar y N. Rachmana Hendrawan (eds.): Proceedings of the 6th International Conference on Telecommunication Systems, Services, and Applications, TSSA 2011, pp. 59-63.
REISBERG, D.; J. MCLEAN y A. GOLDFIELD (1987): «Easy to hear but hard to understand: A lipreading advantage with intact auditory stimuli», en B. Dodd y R. Campbell (eds.): Hearing by Eye: The Psychology of Lip-reading, Lawrence Erlbaum Associates Ltd, pp. 97-113.
RICHARDSON, A. C. (2010): «Effect of Visual Input on Vowel Production in English Speakers», Linguistics Honors Projects, Paper 5, http://digitalcommons.macalester.edu/ling_honors/5 [22/09/2013].
RIZZOLATTI, G. y L. CRAIGHERO (2004): «The Mirror-Neuron System», Annual Review of Neuroscience, 27, pp. 169-192.
ROBERT-RIBES, J.; J.-L. SCHWARTZ, T. LALLOUACHE y P. ESCUDIER (1998): «Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise», The Journal of the Acoustical Society of America, 103 (6), pp. 3677-3689.
ROSENBLUM, L. D. y H. M. SALDAÑA (1996): «An audiovisual test of kinematic primitives for visual speech perception», Journal of Experimental Psychology: Human Perception and Performance, 22 (2), pp. 318-331.
ROSENBLUM, L. D.; M. A. SCHMUCKLER y J. A. JOHNSON (1997): «The McGurk effect in infants», Perception & Psychophysics, 59 (3), pp. 347-357.
ROSS, L. A.; D. SAINT-AMOUR, V. M. LEAVITT, D. C. JAVITT y J. J. FOXE (2007): «Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments», Cerebral Cortex, 17 (5), pp. 1147–1153.
ROUGER, J.; B. FRAYSSE, O. DEGUINE y P. BARONE (2008): «McGurk effects in cochlear-implanted deaf subjects», Brain Research, 1188, pp. 87–99.
SAMS, M.; R. AULANKO, M. HAMALAINEN, R. HARI, O. V. LOUNASMAA, S.-T. LU y J. SIMOLA (1991): «Seeing speech: visual information from lip movements modifies activity in the human auditory cortex», Neuroscience Letters, 127, pp. 141-145.
SCHEFFERS, M. T. M. (1983): Sifting vowels. Auditory pitch analysis and sound segregation, tesis doctoral, Universidad de Groningen, Países Bajos.
SEKIYAMA, K. y Y. TOHKURA (1991): «McGurk effect in non-English listeners: Few visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility», The Journal of the Acoustical Society of America, 90 (4), pp. 1797-1805.
SEKIYAMA, K. y Y. TOHKURA (1993): «Inter-language differences in the influence of visual cues in speech perception», Journal of Phonetics, 21 (4), pp. 427-444.
SEKIYAMA, K.; D. BURNHAM, H. TAM y D. ERDENER (2003): «Auditory-Visual Speech Perception Development in Japanese and English Speakers», en J.-L. Schwartz, F. Berthommier, M.-A. Cathiard y D. Sodoyer (eds.): Proceedings of the International Conference on Auditory-Visual Speech Processing, St. Jorioz, pp. 61-66.
SKIPPER, J. I.; V. VAN WASSENHOVE, H. C. NUSBAUM y S. L. SMALL (2007): «Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception», Cerebral Cortex, 17 (10), pp. 2387-2399.
STRANGE, W.; J. J. JENKINS y T. L. JOHNSON (1983): «Dynamic specification of coarticulated vowels», The Journal of the Acoustical Society of America, 74 (3), pp. 695-705.
STRANGE, W.; R. R. VERBRUGGE, D. P. SHANKWEILER y T. R. EDMAN (1976): «Consonant environment specifies vowel identity», The Journal of the Acoustical Society of America, 60 (1), pp. 213-224.
SUMBY, W. H. e I. POLLACK (1954): «Visual contribution to speech intelligibility in noise», The Journal of the Acoustical Society of America, 26 (2), pp. 212-215.
SUMMERFIELD, Q. y M. MCGRATH (1984): «Detection and Resolution of Audio-visual Incompatibility in the Perception of Vowels», Quarterly Journal of Experimental Psychology, 36A, pp. 51-74.
TRAUNMÜLLER H. y N. ÖHRSTRÖM (2007): «Audiovisual perception of openness and lip rounding in front vowels», Journal of Phonetics, 35 (2), pp. 244-258.
TSEVA, R. (1989): «L'arrondissement dans l'identification visuelle des voyelles du français», Bulletin du Laboratoire de la Communication Parlée, 3, pp. 149-186.
VALKENIER, B.; J. Y. DUYNE, T. C. ANDRINGA y D. BAŞKENT (2012): «Audiovisual Perception of Congruent and Incongruent Dutch Front Vowels», Journal of Speech, Language, and Hearing Research, 55 (6), pp. 1788-1801.
VAN WASSENHOVE, V.; K. W. GRANT y D. POEPPEL (2007): «Temporal window of integration in auditory-visual speech perception», Neuropsychologia, 45 (3), pp. 598-607.
VELASCO, I.; C. SPENCE y J. NAVARRA (2011): «El sistema perceptivo: esa pequeña máquina del tiempo», Anales de Psicología, 27 (1), pp. 195-201.
WALDEN, B. E.; R. A. PROSEK, A. A. MONTGOMERY, C. K. SCHERR y C. J. JONES (1977): «Effect of training on the visual recognition of consonants», Journal of Speech and Hearing Research, 20 (1), pp. 130-145.
WALKER, S.; V. BRUCE y C. O’MALLEY (1995): «Facial identity and facial speech processing: Familiar faces and voices in the McGurk effect», Perception & Psychophysics, 57 (8), pp. 1124-1133.
YONOVITZ, A.; J. T. LOZAR, C. THOMPSON, D. R. FERRELL y M. ROSS (1977): «'Fox-box illusion': Simultaneous presentation of conflicting auditory and visual CV's», The Journal of the Acoustical Society of America, 62 (S1), p. S3.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.