Confirmed invited speakers

Antonio Camurri
DIST-University of Genova 

Title: "Toward computational models of empathy and emotional entrainment"
The study of intended and unintended interpersonal co-ordination in humans is one of the most interesting and challenging topics in the psychological and behavioural sciences. In recent years, co-ordination has also been receiving increasing interest from the research communities in collaborative multimodal interfaces, ambient intelligence, and social networks. The objective here is to develop more natural and intelligent interfaces, with a focus on non-verbal communication, embodiment, and enaction (Camurri and Frisoli 2006). In this field, as in the natural sciences and medicine, the co-ordination phenomenon is better known as entrainment or synchronisation. There is no generally accepted scientific definition of entrainment. Pikovsky et al. (2001) define it as “an adjustment of rhythms of oscillating objects due to their weak interaction”. Entrainment and related phenomena can be studied by focusing attention on different kinds of synchronisation (Phase Synchronisation, General Synchronisation, Complete Synchronisation) and with different approaches depending on the experimental conditions (e.g., “passive” or “active” experiments) and on the physical observables (e.g., physiological data). We focus on gesture from a twofold perspective: gesture as a simple physical signal, i.e., gesture as physical movement, and expressive gesture, i.e., gesture as a conveyer of non-verbal emotional content (e.g., Camurri et al. 2004). In this way, we intend to test how non-verbal expressive gestural communication can play a relevant role in entraining people under different perceptual coupling strengths and induced emotional states. Case studies and experiments will be presented from recent research projects on gesture, emotion, and music (Varni et al. 2008).

  • A. Pikovsky, M. G. Rosenblum, and J. Kurths (2001) Synchronization: A Universal Concept in Nonlinear Sciences, Cambridge University Press, Cambridge.
  • A. Camurri, B. Mazzarino, G. Volpe (2004) “Analysis of Expressive Gesture: The EyesWeb Expressive Gesture Processing Library”, in A. Camurri, G. Volpe (Eds.), Gesture-based Communication in Human-Computer Interaction, LNAI 2915, pp. 460-467, Springer-Verlag.
  • A. Camurri, A. Frisoli (Guest Editors) (2006) Special Issue of the Virtual Reality Journal on Multisensory Interaction in Virtual Environments, Vol. 10, No. 1, Springer.
  • G. Varni, A. Camurri, P. Coletta, G. Volpe (2008) “Emotional Entrainment in Music Performance”, Proc. 8th IEEE Intl. Conf. on Automatic Face and Gesture Recognition, Sept. 17-19, Amsterdam.
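As an illustrative aside (not part of the abstract itself): phase synchronisation of the kind Pikovsky et al. (2001) describe is commonly quantified with a phase-locking value (PLV) computed from Hilbert-transform phases. The sketch below, assuming NumPy and SciPy, is one minimal way to measure it; it is not the speaker's actual analysis pipeline.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """Phase-locking value between two 1-D signals.

    Extracts instantaneous phases via the Hilbert transform and
    measures how constant their difference stays over time:
    1.0 = perfect phase synchronisation, near 0 = none.
    """
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Two noisy oscillators at the same frequency with a fixed phase lag
# (entrained) versus one at an unrelated frequency (not entrained).
t = np.linspace(0, 10, 2000)
rng = np.random.default_rng(0)
a = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.standard_normal(t.size)
b = np.sin(2 * np.pi * 2.0 * t + 0.8) + 0.1 * rng.standard_normal(t.size)
c = np.sin(2 * np.pi * 3.3 * t) + 0.1 * rng.standard_normal(t.size)

print(phase_locking_value(a, b))  # close to 1
print(phase_locking_value(a, c))  # much lower
```

The same statistic can be applied to movement features or physiological data of the kind mentioned above, computed over sliding windows to track how entrainment evolves.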

Asli Özyürek
MPI Psycholinguistics & Radboud Univ. Nijmegen, Koc Univ. Istanbul 

Title: "The role of gesture in production and comprehension of language: Insights from behavior and brain"
Speakers in all cultures and at all ages use gestures as they speak, even if they are congenitally blind or their gestures are not visible (e.g., on the telephone). For example, as somebody says, “she walked to the bus”, this is likely to be accompanied by wiggling fingers crossing space. In this talk I will address the cognitive and neural processes that underlie a) how speakers produce gestures in a way that is temporally and semantically integrated with their speech and b) how listeners/viewers comprehend such gestures in the context of speech. For production, I will show that gestures are produced not merely from the spatial imagery of what is depicted but from an interface representation of imagery and linguistic representation during online speaking. For this I present cross-linguistic comparisons of speech and gestures (Kita & Özyürek, 2003; Özyürek et al., 2005) as well as experiments showing that the speaker's online semantic and syntactic formulation of a spatial event influences his/her gestural representation (Kita, Özyürek, Allen et al., 2007). Further evidence for this claim will be provided by another study showing that when speakers of different languages enact an event without speaking (i.e., pantomime), their gestures look very similar to each other (Goldin-Meadow, So, Özyürek, & Mylander, 2008), but differ when they represent the same event using gestures as they speak (Özyürek & Goldin-Meadow, in prep.) - showing an influence of the linguistic packaging of the event on their co-speech gestures. Similarly, comprehension studies also seem to indicate an interface between the two modalities. Experiments using measurement techniques such as ERP and fMRI show that comprehension of both speech and gestures within a sentence context is supported by similar neural correlates (i.e., the N400 ERP component and the left inferior frontal cortex) (Özyürek, Willems et al., 2007; Willems, Özyürek, & Hagoort, 2007).
Furthermore, listeners/viewers do not seem to process information from the two modalities independently or in an additive fashion; rather, the processing of each modality influences the semantic processing of the other during online comprehension (Kelly, Özyürek, & Maris, under review). Overall, these studies show that the processing of semantic information from the two modalities interacts during both comprehension and production, and they provide evidence for the claim that speech and gesture together form an integrated system.


Alex Waibel
Carnegie Mellon Univ., Univ. of Karlsruhe 

Title: "Multimodal interfaces in support of human-human interaction"

After building computers that paid no attention to communicating with humans, the computer science community has devoted significant effort over the years to more sophisticated interfaces that put the "human in the loop" of computers. These interfaces have improved usability by providing more appealing output (graphics, animations), easier-to-use input methods (mouse, pointing, clicking, dragging), and more natural interaction modes (speech, vision, gesture, etc.). Yet all these interaction modes have still mostly been restricted to human-machine interaction and have made severely limiting assumptions about the sensor setup and expected human behavior (for example, that a gesture be presented clearly in front of the camera and have a clear start and end time). Such assumptions, however, are unrealistic and have consequently limited the potential productivity gains, as the machine still operates in a passive mode, requiring the user to pay considerable attention to the technological artifact.
As a departure from such classical user interfaces, we have turned our attention to developing user interfaces for computing services that place Computers in the Human Interaction Loop (CHIL), i.e., in the midst of humans, rather than the other way round. CHIL services aim to provide assistance implicitly and proactively, while causing minimal interference. They operate in environments where humans interact with humans and computers hover in the background, providing assistance wherever needed. Providing such services in real-life situations, however, presents formidable technical challenges. Computers must be made aware of the activities, locations, interactions, and cognitive states of the humans they are to serve, and they must become socially responsive. Services must be delivered in a private, secure, and socially acceptable manner.
CHIL services require perceptual technology that provides a complete description of human activities and interactions from which user needs can be derived and inferred, i.e., they must describe the WHO, WHERE, HOW, TO WHOM, WHY, and WHEN of human interaction and engagement. Describing human-human interaction in open, natural, and unconstrained environments is further complicated by robustness issues, as noise, illumination, occlusion, interference, suboptimal sensor positioning, perspective, localization, and segmentation all introduce uncertainty. Relevant perceptual cues must therefore be gathered, accumulated, and fused across modalities and over time opportunistically, i.e., whenever and wherever such cues can be determined and merged reliably. Finally, the gathering of such multimodal cues should involve proactive participation by the interface in seeking out those cues, as the interface may move (humanoid robots), coordinate (multiple sensors), and calibrate its own sensors and data gathering.
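As a toy illustration of the opportunistic-fusion idea (my own sketch, not the CHIL system's implementation): each modality contributes a cue only when it can do so reliably, and the available cues are combined weighted by their confidence, so an occluded or noisy sensor simply drops out rather than corrupting the estimate.

```python
def fuse_cues(cues):
    """Fuse per-modality estimates weighted by reliability.

    `cues` is a list of (estimate, confidence) pairs gathered
    opportunistically; pairs with zero confidence (sensor occluded,
    too noisy) are skipped. Returns the confidence-weighted mean,
    or None if no reliable cue arrived.
    """
    usable = [(est, conf) for est, conf in cues if conf > 0.0]
    if not usable:
        return None
    total = sum(conf for _, conf in usable)
    return sum(est * conf for est, conf in usable) / total

# Hypothetical speaker-localisation cues along one axis: audio says
# x = 2.0 (uncertain), vision says x = 2.6 (confident), a third
# sensor is occluded and reports zero confidence.
print(fuse_cues([(2.0, 0.3), (2.6, 0.9), (5.0, 0.0)]))  # ≈ 2.45
```

A real system would replace the scalar estimates with distributions and track them over time (e.g., with a Bayesian filter), but the principle of merging cues only where they are reliable is the same.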
In this talk, I will present ongoing work and results from perceptual interfaces we are developing in realistic human-human interaction environments, using data from smart rooms and humanoid robot interaction.