ViGeM: Visual Recognition of Posture and Gesture for Multimodal Human-Machine-Interaction

Ph.D.-Project of Christian Lange


Today, human-machine communication relies primarily on the keyboard and the mouse as input devices. People have to learn how to use them before they can work with them, and while communicating through these devices they waste cognitive resources on handling them rather than on the task at hand.

In human-human communication, people use speech and gestures, such as pointing with a finger or nodding the head. Things would be much easier if machines could understand these natural modalities. To achieve this goal, the computer needs devices to perceive speech and gestures and algorithms to comprehend the intended meaning. Up to now, hand gestures have mostly been captured with data gloves. These, however, can capture neither gaze direction nor head movements such as nodding, and they are unnatural to wear.

In my Ph.D. project I am developing interfaces that enable the computer to recognize gestures solely from visual input. Such interfaces are much more convenient for the user, because the user no longer needs to wear sensors (e.g. data gloves). My tasks include setting up the hardware (e.g. cameras and a frame grabber) and implementing algorithms for image processing, body posture extraction, and gesture recognition. The algorithms use a model of the human body to track the user's movements. The extracted gestures will be provided in a universal format that can be used by various applications.
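
To make the processing chain more concrete, the following Python sketch outlines how the stages described above (camera frames, posture extraction with a body model, gesture recognition, and a universal gesture event) could fit together. All names, types, and the event format are illustrative assumptions and not the actual interfaces of ViGeM.

# Hypothetical sketch of the processing chain: camera frame ->
# posture extraction with a body model -> gesture recognition ->
# application-independent gesture event.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Posture:
    """Joint positions of a simple body model for one video frame."""
    timestamp: float
    joints: Dict[str, Tuple[float, float, float]]  # joint name -> 3D position

@dataclass
class GestureEvent:
    """Universal, application-independent description of a recognized gesture."""
    kind: str                                       # e.g. "point", "nod", "shake"
    confidence: float                               # recognizer confidence in [0, 1]
    parameters: Dict[str, float] = field(default_factory=dict)

def extract_posture(frame, timestamp: float) -> Posture:
    """Fit the body model to one camera frame (placeholder)."""
    # Real image processing (segmentation, model fitting) would happen here.
    return Posture(timestamp=timestamp, joints={})

def recognize_gestures(postures: List[Posture]) -> List[GestureEvent]:
    """Classify a tracked posture sequence into gesture events (placeholder)."""
    # A real recognizer would analyze the joint trajectories over time.
    return []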

Interesting applications in the context of gestures include graphical user interfaces, the control of presentation tools, navigation in image databases, mobile robots, and window managers. For example, pointing, which is currently done with the mouse, could then be done with the finger, even on a presentation wall during a talk. Beyond that, all the "OK"/"Cancel" questions could be answered by simply nodding or shaking the head.
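
As a hypothetical illustration of how an application might consume such gesture events, the following Python snippet maps a recognized head gesture to the answer of an "OK"/"Cancel" dialog. The gesture names and the confidence threshold are assumptions made for this example, not part of ViGeM itself.

def answer_dialog(gesture_kind: str, confidence: float, threshold: float = 0.8) -> str:
    """Map a recognized head gesture to a dialog answer."""
    if confidence < threshold:
        return "undecided"          # ignore uncertain recognitions
    if gesture_kind == "nod":
        return "OK"
    if gesture_kind == "shake":
        return "Cancel"
    return "undecided"

print(answer_dialog("nod", 0.95))   # -> OK
print(answer_dialog("shake", 0.90)) # -> Cancel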

Fig. 1: Tasks of ViGeM

Last Change: 2005-06-15
gk256www@techfak.uni-bielefeld.de