Universität Bielefeld - Technische Fakultät - AG Wissensbasierte Systeme - Alignment in Communication (SFB 673)

Humans intuitively combine language with spontaneous gesture to form multimodal utterances. In such utterances, words and gestures appear highly coordinated and closely intertwined - in other words, the speaker aligns them with each other. These alignments concern the meaning that the verbal and non-verbal behaviours convey, the form they take in doing so, the manner in which they are performed, their relative temporal arrangement, and their coordinated organization within the phrasal structure of the utterance. Their effects are essential for how meaning is communicated by both modalities in concert. The resulting confluence of language and gesture has led many researchers (e.g. McNeill, 1992) to believe that speech and gesture are products of the same generative process, starting from one ideational complex and comprising significant interactions between speech and gesture. Yet it is still an open question how language and gesture align in producing a coherent multimodal utterance.

Our goal is to systematically investigate the ways in which speech and gesture align within multimodal utterances in dialogue. We aim to understand the underlying intra-personal mechanisms well enough to model the generation of coordinated language and gesture for embodied conversational agents (ECAs). We will focus on deictic gestures, which point directly to a location or region in space, and on iconic gestures, which add visual information to the utterance by depicting what is being referred to (including the fusion of both functions within single gestures). Concretely, we investigate the following research questions:
  1. What kind(s) of meaning do people convey in concurrent speech and gesture to pursue their communicative intentions?
    At the level of meaning construction, we want to find out about the composition, representation, and distribution of meaning as it comes to be expressed in speech and gesture.
  2. What forms do speech and gesture take to convey this meaning in context?
    Concerning deictic gestures, we study the pointers' "pointing cones", i.e. the domains singled out by pointing gestures. With regard to iconic gestures, the mapping between meaning and form is so far only poorly understood: What particular gesture forms do speakers use to create a coverbal depiction of certain spatial aspects of a referent? And what particular pieces of spatial meaning do the speakers choose to convey?
  3. How are speech and gesture organized across as well as within incrementally produced, multimodal deliveries?
    We want to investigate to what extent self-monitoring can explain the portioning of communicative intentions and content into idea units. Self-monitoring can be regarded as a special case of alignment: monitoring one's own utterance as it is being produced creates representations that are constantly compared to the intended representations, e.g. to detect failures in speech production.
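The "pointing cone" mentioned under question 2 can be pictured geometrically. The following is only a minimal sketch, assuming the cone is modelled by an apex (e.g. the fingertip), a pointing direction, and a fixed half-angle; the function name, the parameters, and the 15-degree value are hypothetical illustrations and not results or models from the project.

```python
import math

def in_pointing_cone(apex, direction, half_angle_deg, target):
    """Return True if `target` lies inside the cone that opens from `apex`
    along `direction` with the given half-angle (in degrees).

    All names and the cone model itself are illustrative assumptions.
    """
    # Vector from the apex (e.g. fingertip) to the candidate referent.
    to_target = [t - a for t, a in zip(target, apex)]
    target_norm = math.sqrt(sum(c * c for c in to_target))
    direction_norm = math.sqrt(sum(c * c for c in direction))
    if direction_norm == 0:
        raise ValueError("pointing direction must be non-zero")
    if target_norm == 0:
        return True  # the apex itself counts as inside the cone
    # Angle between the pointing direction and the apex-to-target vector.
    cos_angle = sum(a * b for a, b in zip(to_target, direction)) / (
        target_norm * direction_norm
    )
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= half_angle_deg

# Hypothetical example: pointing along the x-axis with a 15-degree half-angle.
near_axis = in_pointing_cone((0, 0, 0), (1, 0, 0), 15, (2.0, 0.1, 0.0))
off_axis = in_pointing_cone((0, 0, 0), (1, 0, 0), 15, (1.0, 1.0, 0.0))
```

Under this assumed model, an object close to the pointing axis falls inside the cone, while one at 45 degrees off the axis does not. How wide the cone actually is, and whether a simple circular cone is adequate at all, is exactly the kind of question the empirical studies are meant to address.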
Investigating these topics encompasses the empirical study and analysis of human behavior as well as the conception of computational models of the processes involved and their realisation in virtual humans. Our empirical studies are expected to elicit sets of dialogue games, which will be annotated so that statistical methods can be applied to extract significant patterns from the data. Based on the patterns and behavioral units found in the data analysis, we will model the generation process that renders content representations and communicative intentions into verbal and gestural behavior. As a starting point, our modelling approach will rely on the multi-stage production process conceived for the generation of natural language (Reiter & Dale, 2000). Possible aligning actions between the two modalities will be addressed both within each stage and between any two stages. This model of the generation process will directly inform the implementation of a prototype simulation system embedded in our virtual human MAX.

Project Team


Cooperations

  • B1 will cooperate with "A1 - Modelling partners" and "A2 - Processing of implicit common ground" on representations of dialogue states, common grounds, and communicative intent.
  • B1 will also cooperate with A1 and "C1 - Interaction Space" on the overall technical setup for the prototype implementation, including a multimodal humanoid agent in a VR environment.
  • The empirical data gathered will also include check-backs, corrections and denials. They thus allow us to study how implicit common ground is corrected, established and acknowledged, which are central research issues of the projects A2 and "C3 - Repairs and reformulations in dialogue".
  • The annotation of gesture morphology conducted in B1 will inform the body coding representation to be developed in C1. Project C1, which is working on gesture imitation involving meaning-level analysis and representation of gestural behaviour, will also employ and test B1's results on meaning-form mappings in the gesture imitation scenario. Additionally, the mechanisms of outer-loop self-monitoring developed in B1 represent a starting point in C1 for the monitoring of others and ascribing them communicative intent.
  • B1 relies on very fine-grained measurements of the alignment of gesture and speech. A case in point is the synchrony between the gesture stroke and the onset of natural-language (NL) expressions. Based on preliminary work, the statistical tools needed to obtain these data will be developed and maintained within the project "X1 - Multimodal alignment corpora: statistical modeling and information management".
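To make the kind of synchrony measurement mentioned in the last item concrete, here is a small sketch. It assumes annotations are available as pairs of onset times in seconds (gesture stroke onset, onset of the affiliated NL expression); the data format, the function name, and the numbers are invented for illustration and are not project data or the tooling developed in X1.

```python
from statistics import mean

def stroke_speech_offsets(pairs):
    """For each (stroke_onset, speech_onset) pair, return stroke minus speech.

    Negative values mean the gesture stroke started before the NL expression.
    The pair format is an assumption made for this sketch.
    """
    return [stroke - speech for stroke, speech in pairs]

# Made-up example annotations in seconds (not real measurements).
pairs = [(1.20, 1.35), (4.10, 4.05), (7.80, 8.00)]
offsets = stroke_speech_offsets(pairs)
mean_offset = mean(offsets)  # average lead (negative) or lag (positive)
```

With real annotated corpora, distributions of such offsets, rather than a single mean, would be the natural input for the statistical analysis.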

last modification: 26.07.2006