Gudrun Socher

Research


Research Interests



The research described on this page was conducted for my PhD within the joint research project Situated Artificial Communicators (SFB 360) at the University of Bielefeld, Germany.
I worked in the SFB 360 subproject Interaction of Speech and Image Understanding (Interaktion sprachlicher und visueller Informationsverarbeitung, B1) and was also a visiting graduate student in the Computer Vision Research Group at the California Institute of Technology in Pasadena, USA.

Integration of Speech and Image Understanding

My goal was to build a high-level image understanding component for an integrated speech and image understanding system. The system is designed so that a human can interact with it as naturally as possible, giving spoken instructions as if instructing another human. The scenario is the assembly of toys from toy blocks, screws, etc. (somewhat like Lego). The human gives instructions to the system, and the system should carry them out. A typical instruction is "take the green block, and put it on the red one next to the blue cube". To understand the human better, the system is equipped with a stereo camera that observes the scene (the construction platform).
Publications:

  • Bayesian Reasoning on Qualitative Descriptions from Images and Speech.
    G. Socher, G. Sagerer, and P. Perona. In H. Buxton and A. Mukerjee (Eds.), ICCV'98 Workshop on Conceptual Description of Images, Bombay, India, to appear 1998.

  • Talking about 3D Scenes: Integration of Image and Speech Understanding in a Hybrid Distributed System.
    G. Socher, G. Sagerer, F. Kummert, and T. Fuhr. In Proc. International Conference on Image Processing (ICIP-96), Lausanne, Sept. 16-19, 1996, pp. 18A2.

  • Generation of Language Models Using the Results of Image Analysis.
    U. Naeve, G. Socher, G.A. Fink, F. Kummert, and G. Sagerer. In Proc. of Eurospeech'95, 4th European Conference on Speech Communication and Technology, Madrid, Spain, 18-21 Sep, pp. 1739-1742, 1995.

3D Reconstruction and Camera Calibration

We developed a method for camera calibration and metric reconstruction of the three-dimensional structure of scenes containing several, possibly small and nearly planar, objects observed in one or more images. The projection of object models is formulated explicitly according to the pin-hole camera model so that the pose parameters of all objects, the relative poses, and the focal lengths of the cameras can be estimated. Pose estimation is accomplished by minimizing a multivariate non-linear cost function with the Levenberg-Marquardt method. Necessary prerequisites are simple geometric models describing each object as a set of vertices, edges, and ellipses, together with the correspondences between model and image features. Ellipses are projected in an elegant way using projective invariants.
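
As a rough illustration of this estimation step, the following Python sketch (not the original implementation) recovers the pose of a single object together with the focal length by minimizing the pin-hole reprojection error with a Levenberg-Marquardt solver; scipy's generic solver stands in for the original code, the cube data are invented, and edge and ellipse features are omitted for brevity.

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def project(params, model_pts):
        # Pin-hole projection under a pose (rotation vector, translation)
        # and focal length f; all seven parameters are estimated jointly.
        rvec, t, f = params[:3], params[3:6], params[6]
        R = Rotation.from_rotvec(rvec).as_matrix()
        cam = model_pts @ R.T + t            # model -> camera coordinates
        return f * cam[:, :2] / cam[:, 2:3]  # perspective division

    def residuals(params, model_pts, image_pts):
        # Reprojection error: the multivariate non-linear cost to minimize.
        return (project(params, model_pts) - image_pts).ravel()

    # Invented data: the eight vertices of a cube and noisy observations.
    model_pts = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float)
    true_params = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 6.0, 800.0])
    image_pts = project(true_params, model_pts) + np.random.normal(0.0, 0.5, (8, 2))

    guess = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 700.0])
    fit = least_squares(residuals, guess, args=(model_pts, image_pts), method='lm')
    print("estimated pose and focal length:", fit.x)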

Publications:

  • 3-D Reconstruction and Camera Calibration from Images with known Objects.
    G. Socher, T. Merz, and S. Posch. In D. Pycock (Ed.), Proc. British Machine Vision Conference (BMVC-95), Birmingham, UK, Sept. 11-14, Volume I, pp. 167-176, 1995.

  • Ellipsenbasierte 3-D Rekonstruktion (Ellipse-based 3-D Reconstruction).
    G. Socher, T. Merz, and S. Posch. In G. Sagerer, S. Posch, & F. Kummert (Eds.), 17. DAGM-Symposium Mustererkennung, Bielefeld, Sept. 13-15, pp. 252-259. Springer-Verlag, Berlin, Heidelberg, New York/NY, 1995.


Image Understanding

Image understanding denotes the ability to extract specific, non-numerical information from images; it is a key problem in computer vision and artificial intelligence.

High-level image understanding is accomplished in our system by reconstructing the 3D scene from uncalibrated stereo images and by computing qualitative object properties as well as spatial relations. Non-numerical information is thus derived at several levels of abstraction.

The object identification module reasons on the derived qualitative information using Bayesian networks.

Publications:

  • Talking about 3D Scenes: Integration of Image and Speech Understanding in a Hybrid Distributed System.
    G. Socher, G. Sagerer, F. Kummert, and T. Fuhr. In Proc. International Conference on Image Processing (ICIP-96), Lausanne, Sept. 16-19, 1996, pp. 18A2.

  • Semantic Models and Object Recognition in Computer Vision.
    G. Sagerer, F. Kummert, and G. Socher. In K. Kraus & P. Waldhäusel (Eds.), International Archives of Photogrammetry and Remote Sensing, Volume XXXI, Part B3, Commission 3, Vienna, pp. 710-723, 1996.

Bayesian Networks

We use a qualitative representation of image understanding results that is suitable for reasoning with Bayesian networks. The representation is enhanced with probabilistic information to capture uncertainties and errors in the interpretation of noisy sensory data. An object is not assigned a single value for each of its properties (e.g. "the object's color is orange") but a vector of probabilities (degrees of membership) over all categories of a property space. For example, the color space is characterized by the categories red, yellow, orange, blue, green, purple, wooden, and white. The color of an object is then represented as, for example, color(rhomb-nut) = (0.4, 0.3, 0.8, 0.1, 0.09, 0.2, 0.15, 0.05). This expresses that the object rhomb-nut is most likely orange. However, orange is also somewhat red and somewhat dark yellow, so the degrees of membership for the categories red and yellow are higher than for the remaining categories.
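
As a small Python illustration (an invented helper, not project code), the membership vector above can be stored and queried like this:

    COLOR_CATEGORIES = ("red", "yellow", "orange", "blue",
                        "green", "purple", "wooden", "white")

    # Degrees of membership for the rhomb-nut, in category order.
    color = {"rhomb-nut": (0.4, 0.3, 0.8, 0.1, 0.09, 0.2, 0.15, 0.05)}

    def most_likely_color(memberships):
        # Return the category with the highest degree of membership.
        best = max(range(len(memberships)), key=lambda i: memberships[i])
        return COLOR_CATEGORIES[best], memberships[best]

    print(most_likely_color(color["rhomb-nut"]))   # -> ('orange', 0.8)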
This probabilistic information is fed into a Bayesian network in order to find the most plausible interpretation. We want to identify the object that is addressed in an instruction by the human, and therefore search for the object with the highest probability of being both named in the instruction and observed in the scene.

The identified object is modeled as depending on the instruction and the scene. An instruction consists of type, color, size, and shape specifications. The scene depends on the objects it contains; each object is described by its type (e.g. cube, bar) and its color.
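
The sketch below approximates this identification step with a naively factored score rather than the full Bayesian network of the publications: each scene object is scored by the product of its memberships for the attribute values mentioned in the instruction, and the scores are normalized into a distribution over the objects. All object names and membership values are invented for illustration.

    CATEGORIES = {
        "type":  ("cube", "bar", "rhomb-nut", "socket"),
        "color": ("red", "yellow", "orange", "blue",
                  "green", "purple", "wooden", "white"),
    }

    # Membership vectors per scene object, as delivered by image
    # understanding (values invented for this example).
    scene = {
        "object-1": {"type":  (0.05, 0.05, 0.8, 0.1),
                     "color": (0.4, 0.3, 0.8, 0.1, 0.09, 0.2, 0.15, 0.05)},
        "object-2": {"type":  (0.1, 0.1, 0.1, 0.7),
                     "color": (0.05, 0.1, 0.05, 0.1, 0.1, 0.05, 0.1, 0.8)},
    }

    def identify(instruction, scene):
        # Score each object by the product of its memberships for every
        # attribute value named in the instruction, then normalize.
        scores = {}
        for name, attributes in scene.items():
            score = 1.0
            for attr, value in instruction.items():
                score *= attributes[attr][CATEGORIES[attr].index(value)]
            scores[name] = score
        total = sum(scores.values())
        return {name: score / total for name, score in scores.items()}

    # "the orange rhomb-nut": object-1 comes out as the most plausible referent.
    print(identify({"type": "rhomb-nut", "color": "orange"}, scene))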

Example

Instruction: "I want the small round and white thing above the orange rhomb-nut"

Scene: (figure)

Search for the "small round and white thing": (figure)

Search for the "orange rhomb-nut": (figure)

Spatial Relations:

IO (intended object)   RO (reference object)    left   right  above  below  behind  in-front
Socket (205,287)       Rhomb-nut (199,211)      0.002  0.196  0.059  0.060  0.000   0.754
Socket (207,178)       Rhomb-nut (199,211)      0.164  0.103  0.506  0.002  0.181   0.082
Result: (figure)
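
Reading off the table: for the preposition above, the socket at (207,178) scores 0.506 with respect to the orange rhomb-nut, whereas the socket at (205,287) scores only 0.059 (it in fact lies below the rhomb-nut, 0.754). Combined with the attribute scores for the "small round and white thing", the socket at (207,178) is thus the most plausible referent.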

Publications:

  • Bayesian Reasoning on Qualitative Descriptions from Images and Speech.
    G. Socher, G. Sagerer, and P. Perona. In H. Buxton and A. Mukerjee (Eds.), ICCV'98 Workshop on Conceptual Description of Images, Bombay, India, to appear 1998.

Spatial Relations

We developed an approach for generating and understanding relative spatial positions in a natural three-dimensional scene in terms of six spatial prepositions: left, right, in-front, behind, above, and below. The three-dimensional structure of the scene is reconstructed from stereo images (see 3D Reconstruction and Camera Calibration above).

Our spatial model has two layers. First, a symbolic spatial description of the scene that is independent of any reference frame is computed. In the second layer, the meaning of each of the six prepositions is defined with respect to the current reference frame, based on the description from the first layer. The meaning definitions of the prepositions can be used in two ways: they allow the system to judge the degree of goodness of each of the six prepositions between two 3D objects on a graduated scale, and, given the 3D pose of one object, the admissible 2D image region of the other object can be inferred.
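
As a minimal Python sketch of the second layer (a strong simplification, not the published model): the graded applicability of each preposition is approximated here by the cosine between the normalized direction from the reference object to the intended object and an axis of an assumed camera-aligned reference frame, with negative values clipped to zero.

    import numpy as np

    # Axes of an assumed reference frame: x to the right, y upwards,
    # z pointing away from the viewer.
    PREPOSITION_AXES = {
        "left":     np.array([-1.0, 0.0, 0.0]),
        "right":    np.array([1.0, 0.0, 0.0]),
        "above":    np.array([0.0, 1.0, 0.0]),
        "below":    np.array([0.0, -1.0, 0.0]),
        "behind":   np.array([0.0, 0.0, 1.0]),
        "in-front": np.array([0.0, 0.0, -1.0]),
    }

    def applicability(io_center, ro_center):
        # Graded score of "IO is <preposition> RO" for all six prepositions.
        direction = np.asarray(io_center, float) - np.asarray(ro_center, float)
        direction /= np.linalg.norm(direction)
        return {prep: round(max(0.0, float(axis @ direction)), 3)
                for prep, axis in PREPOSITION_AXES.items()}

    # A socket centred above and slightly to the right of a rhomb-nut:
    print(applicability([0.2, 1.0, 0.0], [0.0, 0.0, 0.0]))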
The spatial model has been extensively tested in psycholinguistic experiments (Vorwerg et al., 1997).

Publications:

  • Projective relations for 3D space: Computational model, application, and psychological evaluation.
    C. Vorwerg, G. Socher, T. Fuhr, G. Sagerer, and G. Rickheit. In AAAI'97, Providence, Rhode Island, pp. 159-164, 1997.

  • A three-dimensional spatial model for the interpretation of image data.
    T. Fuhr, G. Socher, C. Scheering, and G. Sagerer. In IJCAI-95 Workshop on Representation and Processing of Spatial Expressions, Montreal, 1995.

Gudrun Socher - gudrun@vision.caltech.edu
Last modified: Wed Dec 10 18:16:38 PST 1997