Establishing Joint Attention with a Virtual Human

Ipke Wachsmuth

A foundational skill in human social interaction, joint attention is also receiving increasing interest in human-robot interaction and virtual human research. Joint attention can be defined as simultaneously allocating attention (i.e., intentionally directed perception) to a target object as a consequence of attending to each other's attentional states. Existing computational models mostly address surface behaviors such as simultaneous looking or perceptual attention. They do not cover how the interactants align their attentional foci.

We (joint work with Nadine Pfeiffer-Leßmann) investigate joint attention in a cooperative interaction scenario with the virtual human Max. The human interlocutor meets the human-sized embodied agent face-to-face in 3D virtual reality. Max perceives the human's body and gaze through an infrared camera system and an eye tracker. This allows Max, for example, to follow the human's gaze as a basic manifestation of joint attention. The agent's mental state is modeled in the BDI (Belief-Desire-Intention) paradigm and serves as the origin of its attention mechanisms. Establishing joint attention involves three main aspects. First, the interlocutor's focus of attention has to be inferred from the human's overt behaviors. Second, the situational context is taken into account by activation processes that mark relevant objects as salient. Third, the agent itself needs to display appropriate overt behaviors to accentuate its own focus of attention and to manipulate the interlocutor's mental state.

The human's gaze is evaluated as an indicator of the interlocutor's focus of attention. For example, an object is detected as being in the human's attentional focus when it has been fixated for a total of at least 400 ms within a 600 ms time window. In addition, pointing gestures may serve as intentional cues. The agent uses the following heuristic to ascribe to its interlocutor a desire to establish joint attention. The human interlocutor has to focus on an object several times, with additional glances addressing the agent in between (a triadic intentional relation). Suppose the activation of an object passes a threshold and the interlocutor has shown such interactive glances. The agent then asserts a belief about the interlocutor's intention and responds, e.g., by gazing at the same object to achieve joint attention. Attention detection can be seen as a prerequisite for establishing joint attention. Beyond it, the agent also employs proactive mechanisms to manipulate the interlocutor's focus of attention, e.g., through intentional gaze or pointing gestures.
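The two detection steps described above can be illustrated with a minimal, standalone Python sketch. It is not Max's BDI-based implementation. The 400 ms dwell time and 600 ms window come from the text. Everything else is a hypothetical choice made for illustration: the `Fixation` record, the `"agent"` label for glances at Max, the per-episode activation gain, and the threshold of 3.0. In Max, activation also reflects situational context, which this sketch omits.

```python
from dataclasses import dataclass
from typing import List, Optional

# From the text: an object is in the human's focus when it has been fixated
# for a total of at least 400 ms within a 600 ms time window.
DWELL_MS = 400
WINDOW_MS = 600

# Hypothetical values (not given in the text): activation added per focus
# episode and the threshold at which an intention is ascribed.
ACTIVATION_GAIN = 1.0
ACTIVATION_THRESHOLD = 3.0

AGENT = "agent"  # hypothetical label for a glance addressing the agent


@dataclass
class Fixation:
    start_ms: int
    end_ms: int
    target: Optional[str]  # object id, AGENT, or None if nothing was hit


def dwell_time(fixations: List[Fixation], target: str, now_ms: int) -> int:
    """Total time (ms) the gaze rested on `target` during the last WINDOW_MS ms."""
    window_start = now_ms - WINDOW_MS
    total = 0
    for f in fixations:
        if f.target == target:
            # Clip each fixation to the sliding window; ignore non-overlapping ones.
            overlap = min(f.end_ms, now_ms) - max(f.start_ms, window_start)
            total += max(0, overlap)
    return total


def in_focus(fixations: List[Fixation], target: str, now_ms: int) -> bool:
    """Dwell-time criterion: at least DWELL_MS within the window."""
    return dwell_time(fixations, target, now_ms) >= DWELL_MS


def ascribe_joint_attention_desire(focus_events: List[str]) -> Optional[str]:
    """Return the object the human presumably wants to share, or None.

    `focus_events` is the sequence of detected attentional foci (e.g. produced
    with in_focus), where AGENT marks glances that address the agent.
    """
    activation = {}           # object -> accumulated activation
    agent_glance_since = {}   # object -> agent glanced at since object's last focus
    interactive = set()       # objects with an object-agent-object alternation
    for event in focus_events:
        if event == AGENT:
            for obj in agent_glance_since:
                agent_glance_since[obj] = True
            continue
        if agent_glance_since.get(event):
            interactive.add(event)  # triadic relation observed for this object
        agent_glance_since[event] = False
        activation[event] = activation.get(event, 0.0) + ACTIVATION_GAIN
        if activation[event] >= ACTIVATION_THRESHOLD and event in interactive:
            return event
    return None


if __name__ == "__main__":
    gaze = [Fixation(0, 250, "cube"), Fixation(250, 350, AGENT), Fixation(350, 600, "cube")]
    print(in_focus(gaze, "cube", now_ms=600))  # True: 500 of the last 600 ms on the cube
    print(ascribe_joint_attention_desire(["cube", AGENT, "cube", AGENT, "cube"]))  # cube
```

The sketch separates perception (the dwell-time test) from interpretation (the triadic heuristic). Repeatedly looking at an object without glancing at the agent raises activation but never yields an ascribed intention.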

Talks: Bochum 16-08-2008, Hamburg 24-11-2008

