Dialogue Coordination for Sociable Agents

Hendrik Buschmeier and Stefan Kopp

Smooth and trouble-free interaction in dialogue is only possible if interlocutors are able to coordinate their actions with each other. Because of this, we think that an understanding of how human coordination devices work is the basis for successful spoken language computer interfaces such as embodied conversational agents or dialogue systems. In this research project we model two important coordination mechanisms that are commonly found in human-human interaction: feedback and adaptation.

Multimodal dialogical face-to-face communication in a calendar domain.


Feedback is an expressive, economic and rapid coordination device used by listeners to signal contact, perception, understanding and agreement as well as attitude towards an utterance of a speaker. Feedback can be conveyed in the form of short and unobtrusive verbal signals (called ‘backchannels’) or via nonverbal behaviour such as head gestures, eye gaze or facial expression.

A range of research projects have already enabled embodied conversational agents to provide verbal as well as nonverbal backchannels to human speakers talking to them. In contrast, the aim of this project is to create an attentive speaker agent which learns to recognise, interpret and react appropriately to feedback signals provided by human listeners. Furthermore, this knowledge about feedback understanding will then be used to generate meaningful feedback for human users speaking with the agent.


It can be observed that people in dialogue adapt to each other. One example is that, after interacting for a certain period of time, they tend to use the same vocabulary and syntactic constructions. Several convincing (and most likely not independent) theories from different perspectives offer explanations for this phenomenon, among them the mechanistic interactive alignment account of dialogue, which sees priming in the language processing system as the reason for adaptation (then called ‘alignment’) on the linguistic level.

As an example of low-level coordination in dialogue, we proposed a priming-based computational model of linguistic alignment and implemented it in a microplanning component for natural language generation as well as in a parser. In this model, language use (i.e., generation and parsing) is, on the one hand, guided by activation values attached to linguistic resources (grammar, lexicon) and, on the other hand, influences these activation values through priming.

Buschmeier et al.'s priming-based alignment model.
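
The two-way relation between language use and activation values can be illustrated with a small sketch. It is not the model's actual implementation: the class name PrimedLexicon, the additive boost, the multiplicative decay and the parameter values are all assumptions made for illustration only. The sketch shows a resource being primed when it is used (for example, when it is parsed from the partner's utterance) and then being preferred in generation.

```python
from dataclasses import dataclass, field


@dataclass
class PrimedLexicon:
    """Toy store of activation values for competing linguistic resources."""

    decay: float = 0.5  # hypothetical: fraction of activation kept per step
    boost: float = 1.0  # hypothetical: activation added when a resource is used
    activation: dict = field(default_factory=dict)

    def prime(self, resource: str) -> None:
        """Language use influences activation: raise the used resource."""
        self.activation[resource] = self.activation.get(resource, 0.0) + self.boost

    def step(self) -> None:
        """Let all activation values fade as the dialogue moves on."""
        for resource in self.activation:
            self.activation[resource] *= self.decay

    def choose(self, candidates: list[str]) -> str:
        """Activation guides language use: pick the most active candidate.

        Ties go to the earliest candidate in the list.
        """
        return max(candidates, key=lambda r: self.activation.get(r, 0.0))


lexicon = PrimedLexicon()
synonyms = ["meeting", "appointment"]

before = lexicon.choose(synonyms)  # nothing primed yet, so the first candidate wins
lexicon.prime("appointment")       # the partner used "appointment" (parsing side)
lexicon.step()                     # one dialogue step passes; activation decays
after = lexicon.choose(synonyms)   # generation now aligns with the partner

print(before, "->", after)  # prints: meeting -> appointment
```

Choosing the single most active candidate keeps the sketch deterministic. A fuller treatment would likely combine activation with other selection criteria rather than let it decide alone.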

Evaluation of the microplanner on a human-human dialogue corpus shows that the alignment-capable version outperforms a baseline version in which alignment was deactivated.


