The goal of perceptual grouping in computer vision is to
organize image primitives into higher-level primitives,
thus explicitly representing the structure contained in the image data.
This aims at reducing ambiguity in the image data or in
the initial segmentation, and thus at increasing the robustness
and efficiency of subsequent processing steps.
The ideas of perceptual grouping for computer vision
have their roots in the well-known work of the Gestalt psychologists
at the beginning of the twentieth century,
who described, among other things, the ability of the human visual system
to organize parts of the retinal stimulus into "Gestalten",
that is, into organized structures.
They formulated a number of Gestalt laws, some of which are
illustrated in the figure at the side.
The importance of perceptual grouping for computer vision was recognized
in the mid-1980s by Andrew Witkin and Jay Tenenbaum and by David Lowe.
Our work on perceptual grouping is conducted within the research project "Situated Artificial Communicators" (SFB 360), funded by the German Research Foundation (DFG). Project A2 of SFB 360, "Mechanisms of perceptual grouping", is conducted jointly with the Neuroinformatics Group of our faculty, which investigates artificial neural network techniques for the same problem.
For our contour-based approach, we first segment the
image into contour segments, which are approximated by
straight line segments and elliptical arcs.
These are used to define a hierarchy of grouping
hypotheses with growing complexity
using the Gestalt laws of proximity, good continuation,
symmetry, and closure.
The figure at the right shows
the different levels of the hierarchy.
The lowest level contains only one-dimensional primitives
which are grouped according to collinearity, curvilinearity, and proximity.
The middle level consists of symmetric and parallel grouping hypotheses,
while the
highest level contains hypotheses of closed contours.
The first stage of the grouping process is to generate grouping hypotheses, taking only local evidence into account. Hypotheses are constructed bottom-up with respect to the grouping hierarchy (depicted as thick solid lines in the figure), implementing the various Gestalt principles to organize the image data. We take an active view of image primitives and introduce the concept of Areas of Perceptual Attentiveness: each image primitive has a search area that restricts the relative location of potential grouping partners. The shape and size of these areas are derived, for each grouping principle considered, from a hand-labeled training set of our domain. For each type of grouping, additional conditions on local attributes such as orientation have to be met before a grouping hypothesis is generated.
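The hypothesis generation step can be illustrated with a minimal sketch for the collinearity principle only. Everything specific here is an assumption made for illustration and is not taken from the project: the rectangular shape of the search area, the parameters `length`, `half_width` and `max_angle` (in the approach described above such values would come from the hand-labeled training set), and all class and function names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Straight line segment with non-zero length (hypothetical primitive type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def angle(self):
        return math.atan2(self.y2 - self.y1, self.x2 - self.x1)

    def endpoints(self):
        return [(self.x1, self.y1), (self.x2, self.y2)]

    def search_areas(self):
        """Return (origin, outward unit direction) for both end points."""
        dx, dy = self.x2 - self.x1, self.y2 - self.y1
        n = math.hypot(dx, dy)
        ux, uy = dx / n, dy / n
        return [((self.x2, self.y2), (ux, uy)), ((self.x1, self.y1), (-ux, -uy))]


def in_area(origin, direction, point, length, half_width):
    """Assumed area shape: a rectangle extending outward from an end point."""
    px, py = point[0] - origin[0], point[1] - origin[1]
    ux, uy = direction
    along = px * ux + py * uy      # distance along the segment direction
    across = -px * uy + py * ux    # signed distance perpendicular to it
    return 0.0 <= along <= length and abs(across) <= half_width


def attends(a, b, length, half_width):
    """True if an end point of b falls into one of a's search areas."""
    return any(in_area(o, d, p, length, half_width)
               for o, d in a.search_areas() for p in b.endpoints())


def orientation_difference(a, b):
    """Undirected angle between two segments, in [0, pi/2]."""
    d = abs(a.angle() - b.angle()) % math.pi
    return min(d, math.pi - d)


def collinear_hypotheses(segments, length=20.0, half_width=3.0,
                         max_angle=math.radians(10)):
    """Pairs (i, j), i < j, that pass the area test and the orientation test."""
    hypotheses = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            a, b = segments[i], segments[j]
            if orientation_difference(a, b) > max_angle:
                continue  # local attribute condition not met
            if attends(a, b, length, half_width) or attends(b, a, length, half_width):
                hypotheses.append((i, j))
    return hypotheses


segments = [
    Segment(0, 0, 10, 0),      # 0: reference segment
    Segment(12, 0.5, 22, 1),   # 1: close to the end of 0, similar orientation
    Segment(12, 10, 22, 10),   # 2: same orientation as 0, but far off its axis
    Segment(11, 0, 11, 10),    # 3: inside the search area of 0, but perpendicular
]
pairs = collinear_hypotheses(segments)
```

In this toy input only segments 0 and 1 form a hypothesis: segment 2 fails the area test and segment 3 fails the orientation condition.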
To judge these hypotheses, a Markov Random Field (MRF)
is employed to include global constraints.
Each grouping hypothesis corresponds to a node (or site) of
this graph with an associated random variable,
shown as a circle in the hierarchy figure above.
Therefore, in contrast to most other approaches using MRFs,
different sites may interpret a common subset of the image data.
The random variable of each node represents the (discrete) significance
of the hypothesis being a correct interpretation of the image data.
Again in contrast to other approaches,
the neighborhood system of the graph does not represent spatial
neighborhood between grouping hypotheses, but rather
models the dependencies between hypotheses with regard to
a globally consistent interpretation of the image data.
It is constructed from supporting and competing
undirected edges:
the relation of support is equivalent to the part-of relation,
while
competing edges are defined between hypotheses
that model contradictory interpretations of
the image data.
Examples of these relations are shown in the hierarchy figure above
as solid and dashed lines, respectively.
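One possible way to hold such a neighborhood system in memory is sketched below. The class and method names are hypothetical, and the sketch assumes that competing pairs are supplied by the caller, since the text does not spell out the rule by which contradictions are detected.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisGraph:
    """Sites are hypothesis names; both edge types are undirected."""
    parts: dict = field(default_factory=dict)      # site -> tuple of its parts
    support: set = field(default_factory=set)      # edges stored as frozensets
    competing: set = field(default_factory=set)

    def add_hypothesis(self, name, parts=()):
        """Register a site; each part-of relation becomes a supporting edge."""
        self.parts[name] = tuple(parts)
        for part in parts:
            self.support.add(frozenset((name, part)))

    def add_competition(self, a, b):
        """Mark two hypotheses as contradictory interpretations."""
        self.competing.add(frozenset((a, b)))

    def neighbors(self, name):
        """All sites linked to `name` by a supporting or competing edge."""
        return sorted(other
                      for edge in self.support | self.competing if name in edge
                      for other in edge if other != name)


graph = HypothesisGraph()
for line in ("l1", "l2", "l3"):
    graph.add_hypothesis(line)                     # lowest-level primitives
graph.add_hypothesis("col_a", parts=("l1", "l2"))  # two collinear groups
graph.add_hypothesis("col_b", parts=("l2", "l3"))  # sharing the primitive l2
graph.add_competition("col_a", "col_b")            # assumed to contradict
```

Note that the neighbors of a site are defined purely by these relations and not by image distance, which matches the non-spatial neighborhood system described above.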
To define the a posteriori energy of the MRF, we
design appropriate clique potentials reflecting the data dependency
and the supporting and competing relations.
Minimizing this a posteriori energy using the Highest Confidence First (HCF)
algorithm results in a maximum a posteriori estimate of the random field
and gives a globally consistent interpretation of the image data.
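To make the role of the three kinds of clique potentials concrete, here is a small sketch with binary labels (1 = significant, 0 = not significant). It is not the project's model: the potentials, the weights `w_support` and `w_compete`, and the data costs are invented for illustration, and the minimizer is a plain greedy local search that repeatedly flips the single label giving the largest energy decrease. That search is a simpler stand-in and is not an implementation of HCF.

```python
def energy(labels, data_cost, support, competing, w_support=1.0, w_compete=4.0):
    """Assumed a posteriori energy built from three kinds of clique potentials."""
    # Data term: cost of calling a site significant (negative = evidence for it).
    e = sum(data_cost[s] for s, x in labels.items() if x == 1)
    # Supporting edges: penalize a whole and its part that disagree.
    e += sum(w_support for a, b in support if labels[a] != labels[b])
    # Competing edges: penalize two contradictory hypotheses both being significant.
    e += sum(w_compete for a, b in competing if labels[a] == 1 and labels[b] == 1)
    return e


def greedy_minimize(labels, data_cost, support, competing):
    """Flip the best single label until no flip lowers the energy (local minimum)."""
    labels = dict(labels)
    current = energy(labels, data_cost, support, competing)
    while True:
        best_site, best_energy = None, current
        for s in sorted(labels):
            labels[s] = 1 - labels[s]              # try the flip
            e = energy(labels, data_cost, support, competing)
            labels[s] = 1 - labels[s]              # undo it
            if e < best_energy:
                best_site, best_energy = s, e
        if best_site is None:
            return labels, current
        labels[best_site] = 1 - labels[best_site]
        current = best_energy


# Toy problem: three line primitives and two competing collinear groups.
data_cost = {"l1": -1.0, "l2": -1.0, "l3": -1.0, "col_a": -1.5, "col_b": -0.5}
support = [("col_a", "l1"), ("col_a", "l2"), ("col_b", "l2"), ("col_b", "l3")]
competing = [("col_a", "col_b")]
initial = {s: 1 for s in data_cost}                # start with everything significant

result, final_energy = greedy_minimize(initial, data_cost, support, competing)
```

With these made-up numbers the search drops `col_b`, the competing group with the weaker local evidence, and keeps `col_a` together with all three primitives. Like any local search it only guarantees a local minimum of the energy.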
The example of a toy plane to the left shows all
collinear and curvilinear groups judged as significant.
Current work is aimed at cue integration of contour- and region-based segmentation and grouping, and at integrating region-based information into the judgment phase to enhance the clique potentials. Furthermore, we are employing contour-based groups for stereo matching and extending the grouping hierarchy itself with even more abstract primitives. Other work deals with temporal grouping of regions using the principle of common fate.







