Bayesian Reasoning on Qualitative Descriptions
from Images and Speech.
G. Socher, G. Sagerer, and P. Perona.
In H. Buxton and A. Mukerjee (Eds.), ICCV'98 Workshop on
Conceptual Description of Images, Bombay, India, to appear 1998.
Talking about 3D Scenes: Integration of Image and Speech Understanding
in a Hybrid Distributed System.
G. Socher, G. Sagerer, F. Kummert, and T. Fuhr.
In Proc. International Conference on Image Processing (ICIP-96),
Lausanne, Sept. 16-19, 1996, pp. 18A2.
Generation of Language Models Using the Results of Image Analysis.
U. Naeve, G. Socher, G.A. Fink, F. Kummert, and G. Sagerer.
In Proc. of Eurospeech'95, 4th European Conference on Speech
Communication and Technology, Madrid, Spain, 18-21 Sep, pp. 1739-1742,
1995.
We developed a method for camera calibration and metric reconstruction of the
three-dimensional structure of scenes by observing several, possibly small and
nearly planar, objects in one or more images.
The projection of the object models is formulated explicitly
according to the pin-hole camera model so that the pose
parameters of all objects, their relative poses, and
the focal lengths of the cameras can be estimated.
The pose estimation is accomplished by minimizing a multivariate non-linear cost function
using the Levenberg-Marquardt method.
Necessary prerequisites are
simple geometric models containing descriptions of objects as a set of vertices,
edges, and ellipses, as well as the correspondence between model and image features.
Ellipses are projected in an elegant way using projective invariants.
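The pose-estimation step described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the Euler-angle parameterization, the synthetic cube model, and all numerical values are assumptions made for the example; only the overall scheme (pin-hole projection, reprojection-error cost, Levenberg-Marquardt minimization) follows the text.

```python
# Sketch: estimate object pose and focal length by minimizing the
# reprojection error with Levenberg-Marquardt (scipy's 'lm' method).
import numpy as np
from scipy.optimize import least_squares

def rotation(rx, ry, rz):
    """Rotation matrix from three Euler angles (one possible parameterization)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(params, model_pts):
    """Pin-hole projection of 3D model vertices for a given pose and focal length."""
    rx, ry, rz, tx, ty, tz, f = params
    cam = model_pts @ rotation(rx, ry, rz).T + np.array([tx, ty, tz])
    return f * cam[:, :2] / cam[:, 2:3]      # perspective division

def residuals(params, model_pts, image_pts):
    """Reprojection error, flattened for the least-squares solver."""
    return (project(params, model_pts) - image_pts).ravel()

# Synthetic example: the eight vertices of a unit cube as the object model.
model_pts = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                     dtype=float)
true_params = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 5.0, 800.0])
image_pts = project(true_params, model_pts)   # simulated observations

x0 = np.array([0, 0, 0, 0, 0, 4.0, 700.0])    # rough initial guess
fit = least_squares(residuals, x0, args=(model_pts, image_pts), method='lm')
print(np.round(fit.x, 3))
```

In the real system the correspondences between model and image features (vertices, edges, ellipses) supply the observations; here they are simulated by projecting a known pose.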
Publications:
3-D Reconstruction and Camera Calibration from Images with known Objects.
G. Socher, T. Merz, and S. Posch.
In D. Pycock (Ed.), Proc. British
Machine Vision Conference (BMVC-95), Birmingham, UK, Sept. 11-14,
Volume I, pp. 167-176, 1995.
Ellipsenbasierte 3-D Rekonstruktion.
G. Socher, T. Merz, and S. Posch.
In G. Sagerer, S. Posch, & F. Kummert (Eds.), 17. DAGM-Symposium
Mustererkennung, Bielefeld, Sept. 13-15, pp. 252-259.
Springer-Verlag, Berlin, Heidelberg, New York/NY, 1995.
Image understanding denotes the ability to extract specific,
non-numerical information from images, and it is a key problem
in computer vision and artificial intelligence.
High-level image understanding is accomplished in our system by
reconstructing the 3D scene from uncalibrated stereo images and by
computing qualitative object properties as well as spatial relations.
Non-numerical information is thus derived in different steps of abstraction.
The object identification module reasons on the derived qualitative information
using Bayesian networks.
Publications:
Talking about 3D Scenes: Integration of Image and Speech Understanding
in a Hybrid Distributed System.
G. Socher, G. Sagerer, F. Kummert, and T. Fuhr.
In Proc. International Conference on Image Processing (ICIP-96),
Lausanne, Sept. 16-19, 1996, pp. 18A2.
Semantic Models and Object Recognition in Computer Vision.
G. Sagerer, F. Kummert, and G. Socher.
In K. Kraus & P. Waldhäusel (Eds.), International Archives of Photogrammetry and Remote
Sensing, Volume XXXI, Part B3, Commission 3, Vienna, pp. 710-723, 1996.
We use a qualitative representation of image understanding results
that is suitable for reasoning with Bayesian networks.
The representation is enhanced with probabilistic
information to capture the uncertainties and errors that arise
when interpreting noisy sensory data.
An object is not assigned single values for its properties
(e.g. an object has the color `orange'), but a vector of probabilities
(degrees of membership) for all categories of a property space. For example,
the color space is characterized by the color categories
red, yellow, orange, blue, green, purple, wooden, white.
The color of an object is then represented as, for example,
color(rhomb-nut) = (0.4, 0.3, 0.8, 0.1, 0.09, 0.2, 0.15, 0.05).
This characterizes that the object rhomb-nut is most likely to be orange.
However, the color `orange' is also somewhat red as well as somewhat dark yellow,
and thus the degrees of membership for the categories red and yellow
are higher than for the other color categories.
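The membership-vector representation from the rhomb-nut example can be written down directly. The category names and values are taken from the text; the small helper function is an illustrative addition.

```python
# The color property space and the membership vector for the rhomb-nut,
# as given in the text above.
COLOR_CATEGORIES = ("red", "yellow", "orange", "blue",
                    "green", "purple", "wooden", "white")

color_rhomb_nut = (0.4, 0.3, 0.8, 0.1, 0.09, 0.2, 0.15, 0.05)

def most_likely(categories, memberships):
    """Category with the highest degree of membership."""
    best = max(range(len(memberships)), key=memberships.__getitem__)
    return categories[best]

print(most_likely(COLOR_CATEGORIES, color_rhomb_nut))  # -> orange
```

Note that the vector need not sum to one: the entries are degrees of membership in overlapping color categories, not a probability distribution.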
The probabilistic information is supplied to a
Bayesian network in order to find the most plausible interpretation.
We want to identify the object that is addressed in an instruction by
the human. We therefore search for the object which has the highest probability
of being both named in the instruction and observed in the scene.
We model the identified object as depending on the
instruction
and the scene. An instruction consists of type, color, size, and
shape specifications.
The scene depends on the objects in the scene.
The objects in the scene are described by their type, e.g. cube, bar, and
their color.
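The identification idea can be sketched with a naive scoring rule: score each scene object by the product of its membership degrees for the attributes named in the instruction, then pick the best. This independence assumption is a simplification made for the example; the actual system combines the evidence in a Bayesian network. All membership values and object names below are illustrative, not real system output.

```python
# Sketch: pick the scene object most compatible with the instruction,
# combining per-property membership degrees by a simple product.
def score(obj, instruction):
    """Joint compatibility of one scene object with the instruction."""
    s = 1.0
    for prop, category in instruction.items():
        s *= obj[prop].get(category, 0.0)
    return s

scene = {   # illustrative membership degrees
    "rhomb-nut": {"type": {"rhomb-nut": 0.9, "cube": 0.05},
                  "color": {"orange": 0.8, "red": 0.4}},
    "cube":      {"type": {"rhomb-nut": 0.1, "cube": 0.85},
                  "color": {"orange": 0.2, "red": 0.3}},
}

instruction = {"type": "rhomb-nut", "color": "orange"}   # "the orange rhomb-nut"
best = max(scene, key=lambda name: score(scene[name], instruction))
print(best)  # -> rhomb-nut
```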
Example
Instruction: "I want the small round and white thing above the orange rhomb-nut"
Scene:
Search for the "small round and white thing" :
Search for the "orange rhomb-nut" :
Spatial Relations:
IO               | RO                  | left  | right | above | below | behind | in-front
Socket (205,287) | Rhomb-nut (199,211) | 0.002 | 0.196 | 0.059 | 0.060 | 0.000  | 0.754
Socket (207,178) | Rhomb-nut (199,211) | 0.164 | 0.103 | 0.506 | 0.002 | 0.181  | 0.082
Result:
Publications:
Bayesian Reasoning on Qualitative Descriptions
from Images and Speech.
G. Socher, G. Sagerer, and P. Perona.
In H. Buxton and A. Mukerjee (Eds.), ICCV'98 Workshop on
Conceptual Description of Images, Bombay, India, to appear 1998.
We developed an approach for generating and understanding relative spatial
positions in natural
three-dimensional scenes in terms of six spatial prepositions: left,
right, in-front, behind, above, and below.
The three-dimensional structure of a scene is reconstructed from stereo images
(see 3-D Reconstruction above).
Our spatial model has two layers.
First, a symbolic spatial description of the scene
independent of reference frames is computed.
Then, in the second layer, the meaning of each of
the six prepositions is defined with respect to the current reference frame,
based on the description from the first layer.
The meaning definitions of the prepositions in this model can be used in two ways:
the system can judge the degree of applicability of each of the
six prepositions
between two 3D objects on a graduated scale;
and, given the 3D pose of one object, the admissible 2D image
region of the other object can be inferred.
The spatial model has been extensively tested in psycholinguistic experiments
( Vorwerg et al., 1997).
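The graded judgment in the second layer can be illustrated with a simple directional scoring rule: the score of a preposition is derived from the angle between the vector connecting the two objects and that preposition's reference direction in the current reference frame. The cosine-based rule and the camera-centered axis convention below are assumptions made for this sketch, not the exact computational model from the papers.

```python
# Sketch: graded applicability of the six prepositions for "IO is <prep> RO",
# scored by the cosine between the inter-object direction and each
# preposition's reference direction (negative cosines clipped to zero).
import numpy as np

# Assumed camera-centered frame: x right, y up, z into the scene.
DIRECTIONS = {
    "right":    np.array([ 1.0,  0.0,  0.0]),
    "left":     np.array([-1.0,  0.0,  0.0]),
    "above":    np.array([ 0.0,  1.0,  0.0]),
    "below":    np.array([ 0.0, -1.0,  0.0]),
    "behind":   np.array([ 0.0,  0.0,  1.0]),
    "in-front": np.array([ 0.0,  0.0, -1.0]),
}

def applicability(io_pos, ro_pos):
    """Graded scores of all six prepositions for the object pair (IO, RO)."""
    v = np.asarray(io_pos, float) - np.asarray(ro_pos, float)
    v /= np.linalg.norm(v)
    return {p: max(0.0, float(v @ d)) for p, d in DIRECTIONS.items()}

# An object almost directly above the reference object:
scores = applicability((0.0, 2.0, 0.1), (0.0, 0.0, 0.0))
print(max(scores, key=scores.get))  # -> above
```

As in the table of the previous section, each object pair receives a score for every preposition rather than a single crisp label.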
Publications:
Projective relations for 3D space: Computational model, application,
and psychological evaluation.
C. Vorwerg, G. Socher, T. Fuhr, G. Sagerer, and G. Rickheit.
In AAAI'97, Providence, Rhode Island, pp. 159 - 164, 1997.
A three-dimensional spatial model for the interpretation of image data.
T. Fuhr, G. Socher, C. Scheering, and G. Sagerer.
In IJCAI-95 Workshop on Representation and Processing of Spatial
Expressions, Montreal, 1995.
Gudrun Socher
- gudrun@vision.caltech.edu
Last modified: Wed Dec 10 18:16:38 PST 1997