Proceedings of the IEEE Fourth International Conference on Multimodal Interfaces, ICMI 2002,
Pittsburgh, USA, October 2002, 411-416.
This article presents one core component for enabling multimodal (speech- and gesture-driven) interaction in and for Virtual Environments: a so-called temporal Augmented Transition Network (tATN). The tATN integrates and evaluates information from speech, gesture, and a given application context using a combined syntactic/semantic parsing approach. It also serves as the target structure for a multimodal integration markup language (MIML). MIML centers on the specification of multimodal interactions: an application designer states temporal and semantic relations between input utterance percepts and certain application states in a declarative and portable manner. A subsequent parse pass translates MIML into corresponding tATNs, which are directly loaded and executed by a simulation engine's scripting facility.
multimodal interaction, Virtual Reality, multimodal integration,
transition networks, XML interaction representation
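
To make the idea of a transition network with temporal tests more concrete, the following is a minimal sketch in Python. It is not the tATN implementation described in this article, and it does not reproduce MIML syntax. All names (Percept, close_in_time, word, deictic, run_tatn), the state labels, and the 0.5 second tolerance are hypothetical choices made for illustration only. The sketch assumes that spoken words drive the state transitions and that an arc for a deictic word may only be taken if a matching gesture percept lies close to that word in time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Percept:
    """One time-stamped input percept (hypothetical structure)."""
    modality: str   # "speech" or "gesture"
    value: str      # recognized word or gesture class
    start: float    # seconds
    end: float      # seconds


Registers = Dict[str, Percept]
# An arc test receives the current word, the registers, and all gesture
# percepts. It returns register updates if the arc may be taken, else None.
ArcTest = Callable[[Percept, Registers, List[Percept]], Optional[Registers]]


def close_in_time(a: Percept, b: Percept, tolerance: float = 0.5) -> bool:
    """True if the intervals overlap or lie at most `tolerance` seconds apart.

    The tolerance value is an assumption made for this sketch.
    """
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap <= tolerance


def word(expected: str) -> ArcTest:
    """Arc test with a purely syntactic condition on the spoken word."""
    def test(p: Percept, regs: Registers, gestures: List[Percept]):
        return {} if p.value == expected else None
    return test


def deictic(expected: str, gesture_kind: str, register: str) -> ArcTest:
    """Arc test that also requires a temporally close gesture percept."""
    def test(p: Percept, regs: Registers, gestures: List[Percept]):
        if p.value != expected:
            return None
        for g in gestures:
            if g.value == gesture_kind and close_in_time(p, g):
                return {register: g}   # store the gesture in a register
        return None
    return test


def run_tatn(arcs: Dict[str, List[Tuple[ArcTest, str]]], start: str,
             final: Set[str], words: List[Percept],
             gestures: List[Percept]) -> Optional[Registers]:
    """Walk the network over the words in temporal order.

    Returns the filled registers if a final state is reached, else None.
    Words that match no outgoing arc are skipped.
    """
    state, regs = start, {}
    for w in sorted(words, key=lambda p: p.start):
        for test, target in arcs.get(state, []):
            updates = test(w, regs, gestures)
            if updates is not None:
                regs.update(updates)
                state = target
                break
    return regs if state in final else None


# Hypothetical network for the utterance "select that" plus a pointing gesture.
arcs = {
    "S0": [(word("select"), "S1")],
    "S1": [(deictic("that", "pointing", "target"), "S2")],
}
words = [Percept("speech", "select", 0.0, 0.4),
         Percept("speech", "that", 0.5, 0.7)]
gestures = [Percept("gesture", "pointing", 0.3, 0.9)]

result = run_tatn(arcs, "S0", {"S2"}, words, gestures)
```

In this sketch, the parse succeeds only when the pointing gesture accompanies the word "that"; a gesture that occurs several seconds later leaves the network in a non-final state. Checks against the application context, which the abstract also mentions, could be added as further arc tests in the same style.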