Proceedings of the IEEE VR2004
Chicago, USA, March 2004, pp. 35-42.
This paper describes the underlying concepts and the technical implementation of a system for resolving multimodal references in Virtual Reality (VR). The system has been developed in the context of speech- and gesture-driven communication for Virtual Environments, where a variety of temporal and semantic relations between referential utterances and the items in question have to be taken into account during the analysis of a user's multimodal input. Based on findings from human cognition research, the system treats reference resolution uniformly as a constraint satisfaction problem, in which the propositional value of each referential unit in a multimodal dialogue updates the active set of constraints to be satisfied. The implementation accounts for the real-time and immersive conditions of VR and adapts its architecture, in terms of system access, interface, and integration into VR-based applications, to well-known scene-graph-based design patterns by introducing a so-called reference resolution engine. In both the conceptual work and the implementation, special care has been taken to allow further refinement and modification of the underlying resolution processes at a high level.
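The constraint-satisfaction view described above can be illustrated with a minimal sketch: each referential unit of a multimodal dialogue act (a spoken attribute, a pointing gesture) contributes a constraint, and the referent is whatever scene item satisfies all currently active constraints. All names here (`SceneObject`, `ReferenceResolver`, the attribute and gesture constraints) are hypothetical and chosen for illustration; they are not the paper's actual API.

```python
# Hypothetical sketch: multimodal reference resolution as constraint filtering.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class SceneObject:
    name: str
    color: str
    position: Tuple[float, float, float]  # world coordinates

Constraint = Callable[[SceneObject], bool]

@dataclass
class ReferenceResolver:
    """Accumulates constraints from referential units and filters candidates."""
    constraints: List[Constraint] = field(default_factory=list)

    def add_verbal(self, color: str) -> None:
        # A spoken attribute ("the red ...") adds a semantic constraint.
        self.constraints.append(lambda o: o.color == color)

    def add_gesture(self, point: Tuple[float, float, float], radius: float) -> None:
        # A pointing gesture adds a spatial constraint around the indicated spot.
        self.constraints.append(
            lambda o: sum((a - b) ** 2 for a, b in zip(o.position, point)) <= radius ** 2
        )

    def resolve(self, scene: List[SceneObject]) -> List[SceneObject]:
        # A candidate is a referent iff it satisfies all active constraints.
        return [o for o in scene if all(c(o) for c in self.constraints)]

scene = [
    SceneObject("ball1", "red", (0.0, 0.0, 0.0)),
    SceneObject("ball2", "red", (5.0, 0.0, 0.0)),
    SceneObject("cube1", "blue", (0.2, 0.0, 0.0)),
]
r = ReferenceResolver()
r.add_verbal("red")                   # "the red one"
r.add_gesture((0.0, 0.0, 0.0), 1.0)   # pointing near the origin
print([o.name for o in r.resolve(scene)])  # → ['ball1']
```

A full system would of course also weight constraints by temporal proximity and handle over- or under-constrained references, but the core idea, incrementally updating an active constraint set as referential units arrive, is captured by this filtering loop.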