The goal of perceptual grouping in computer vision is to organize image primitives into higher-level primitives, thus explicitly representing the structure contained in the image data. This aims at reducing ambiguity in the image data or in an initial segmentation, and thereby at increasing the robustness and efficiency of subsequent processing steps. The ideas of perceptual grouping for computer vision have their roots in the well-known work of the Gestalt psychologists at the beginning of the 20th century, who described, among other things, the ability of the human visual system to organize parts of the retinal stimulus into "Gestalten", that is, into organized structures. They formulated a number of Gestalt laws, some of which are illustrated in the figure at the side. The importance of perceptual grouping for computer vision was recognized in the mid-1980s by Andrew Witkin and Jay Tenenbaum and by David Lowe.
Our work on perceptual grouping is conducted within the research project "Situated Artificial Communicators" (SFB 360), funded by the German Research Foundation (DFG). Project A2 of SFB 360, "Mechanisms of perceptual grouping", is conducted jointly with the Neuroinformatics Group of our faculty, which investigates artificial neural network techniques for the same problem.
For our contour-based approach we first segment the image into contour segments, which are approximated by straight line segments and elliptical arcs. These are used to define a hierarchy of grouping hypotheses of growing complexity using the Gestalt laws of proximity, good continuation, symmetry, and closure. The figure at the right shows the different levels of the hierarchy. The lowest level contains only one-dimensional primitives, which are grouped according to collinearity, curvilinearity, and proximity. The middle level consists of symmetric and parallel grouping hypotheses, while the highest level comprises hypotheses of closed contours.
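The hierarchy described above can be pictured as a small data structure. The following is only a minimal sketch under our own assumptions: the names `Level`, `Hypothesis`, `kind`, and `parts` are hypothetical and chosen for illustration, not taken from the project's implementation.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Three levels of the grouping hierarchy (names are illustrative)."""
    CONTOUR = 1             # segments, arcs, collinear/curvilinear/proximity groups
    SYMMETRY_PARALLEL = 2   # symmetric and parallel groups
    CLOSURE = 3             # closed contours


@dataclass
class Hypothesis:
    name: str
    kind: str               # e.g. "segment", "collinear", "parallel", "closure"
    level: Level
    parts: list = field(default_factory=list)  # part-of relation to lower hypotheses

    def __post_init__(self):
        # A hypothesis may only be built from parts at the same or a lower level.
        for part in self.parts:
            if part.level > self.level:
                raise ValueError(f"{part.name} is above the level of {self.name}")

    def primitives(self):
        """Return the initial contour primitives this hypothesis is built from."""
        if not self.parts:
            return [self]
        found = []
        for part in self.parts:
            found.extend(part.primitives())
        return found


# Hypothetical example: four segments grouped bottom-up into a closed contour.
s1, s2, s3, s4 = (Hypothesis(f"s{i}", "segment", Level.CONTOUR) for i in range(1, 5))
line = Hypothesis("g1", "collinear", Level.CONTOUR, [s1, s2])
pair = Hypothesis("p1", "parallel", Level.SYMMETRY_PARALLEL, [line, s3])
closed = Hypothesis("c1", "closure", Level.CLOSURE, [pair, s4])
```

Keeping the `parts` list explicit makes it easy to trace any higher-level hypothesis back to the contour primitives it interprets.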
The first stage of the grouping process generates grouping hypotheses taking only local evidence into account. Hypotheses are constructed bottom-up with respect to the grouping hierarchy (depicted as thick solid lines in the figure), implementing the various Gestalt principles to organize the image data. We take an active view of image primitives and introduce the concept of Areas of Perceptual Attentiveness: each image primitive is given a search area that restricts the relative location of potential grouping partners. The shape and size of these areas are derived, for each grouping principle considered, from a hand-labeled training set of our domain. For each type of grouping, additional conditions on local attributes such as orientation have to be met before a grouping hypothesis is generated.
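As an illustration of this local stage, the sketch below generates collinearity hypotheses for straight line segments. It rests on several assumptions of ours: the search area is modelled as a simple rectangle extending beyond a segment's end point, only the partner's start point is tested, and the values of `reach`, `half_width`, and `max_angle` are made-up placeholders for parameters that would in practice be learned from the training set.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float

    def direction(self):
        """Unit vector from the start point to the end point (non-degenerate segments only)."""
        dx, dy = self.x2 - self.x1, self.y2 - self.y1
        norm = math.hypot(dx, dy)
        return dx / norm, dy / norm


def in_attention_area(seg, point, reach, half_width):
    """True if `point` lies in a rectangle that extends `reach` beyond the end
    point of `seg` along its direction and `half_width` to either side."""
    ux, uy = seg.direction()
    px, py = point[0] - seg.x2, point[1] - seg.y2
    along = px * ux + py * uy        # distance ahead of the end point
    across = -px * uy + py * ux      # signed lateral offset
    return 0.0 <= along <= reach and abs(across) <= half_width


def orientation_difference(a, b):
    """Smallest angle between the two undirected lines, in radians."""
    ax, ay = a.direction()
    bx, by = b.direction()
    return math.acos(min(1.0, abs(ax * bx + ay * by)))


def collinear_hypotheses(segments, reach=20.0, half_width=3.0,
                         max_angle=math.radians(10)):
    """Pairs (i, j) where segment j starts inside the search area of segment i
    and the orientation condition is met. All thresholds are assumed values."""
    hypotheses = []
    for i, a in enumerate(segments):
        for j, b in enumerate(segments):
            if i == j:
                continue
            if (in_attention_area(a, (b.x1, b.y1), reach, half_width)
                    and orientation_difference(a, b) <= max_angle):
                hypotheses.append((i, j))
    return hypotheses


segments = [
    Segment(0, 0, 10, 0),     # base segment
    Segment(14, 1, 24, 1),    # nearby and almost collinear
    Segment(12, 0, 12, 10),   # nearby but perpendicular
    Segment(50, 50, 60, 50),  # parallel but far away
]
print(collinear_hypotheses(segments))  # [(0, 1)]
```

The perpendicular segment falls inside the search area but fails the orientation condition, while the distant segment passes the orientation condition but lies outside the area, so only one hypothesis is generated.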
To judge these hypotheses, a Markov Random Field (MRF) is employed to include global constraints. Each grouping hypothesis corresponds to a node (or site) of this graph with an associated random variable, shown as a circle in the hierarchy figure above. Consequently, in contrast to most other approaches using MRFs, different sites may interpret a common subset of the image data. The random variable of each node represents the (discrete) significance of the hypothesis being a correct interpretation of the image data. Again in contrast to other approaches, the neighborhood system of the graph does not represent spatial neighborhood between grouping hypotheses, but rather models the dependencies between hypotheses with regard to a globally consistent interpretation of the image data. It is constructed from supporting and competing undirected edges: the relation of support is equivalent to the part-of relation, while competing edges are defined between hypotheses that model contradicting interpretations of the image data. Examples of these relations are shown in the hierarchy figure above as solid and dashed lines, respectively. To define the a posteriori energy of the MRF, we design appropriate clique potentials reflecting the data dependency and the supporting and competing relations. Minimizing this a posteriori energy using HCF (Highest Confidence First) results in a maximum a posteriori estimate of the random field and gives a globally consistent interpretation of the image data. The example of a toy plane to the left shows all collinear and curvilinear groups judged as significant.
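The judgment step can be illustrated with a toy energy function. The sketch below is not the project's model: the binary labels, the form of the clique potentials, and the weights `w_support` and `w_compete` are assumptions made purely for illustration, and the minimizer is plain iterated conditional modes (ICM), a simpler stand-in for HCF.

```python
def total_energy(labels, evidence, support, compete,
                 w_support=0.3, w_compete=1.0):
    """Assumed a posteriori energy of a binary labelling (1 = significant).

    Data term:    accepting hypothesis i costs 0.5 - evidence[i].
    Support term: two accepted hypotheses joined by a part-of edge lower the energy.
    Compete term: two accepted contradicting hypotheses raise the energy.
    """
    energy = sum(x * (0.5 - q) for x, q in zip(labels, evidence))
    energy -= w_support * sum(labels[a] * labels[b] for a, b in support)
    energy += w_compete * sum(labels[a] * labels[b] for a, b in compete)
    return energy


def acceptance_cost(i, labels, evidence, support, compete, w_support, w_compete):
    """Energy change when site i is labelled 1 instead of 0, neighbours fixed."""
    cost = 0.5 - evidence[i]
    for a, b in support:
        if i in (a, b):
            cost -= w_support * labels[b if i == a else a]
    for a, b in compete:
        if i in (a, b):
            cost += w_compete * labels[b if i == a else a]
    return cost


def icm(evidence, support, compete, w_support=0.3, w_compete=1.0, max_sweeps=20):
    """Iterated conditional modes: sweep over the sites and give each one the
    label with the lower local energy until nothing changes."""
    labels = [0] * len(evidence)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(labels)):
            cost = acceptance_cost(i, labels, evidence, support, compete,
                                   w_support, w_compete)
            best = 1 if cost < 0.0 else 0
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical sites: 0, 1, 2 are contour groups, 3 is a closure built from 0 and 1.
evidence = [0.8, 0.6, 0.55, 0.7]   # made-up local evidence in [0, 1]
support = [(0, 3), (1, 3)]         # part-of edges
compete = [(1, 2)]                 # groups 1 and 2 contradict each other
labels = icm(evidence, support, compete)
print(labels)  # [1, 1, 0, 1]
```

In this example hypothesis 2 has positive local evidence on its own, but it is rejected because it competes with hypothesis 1, which is additionally supported by the closure it is part of. ICM only finds a local minimum in general; for this small graph the result coincides with the global one.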
Current work is aimed at cue integration of contour-based and region-based segmentation and grouping, and at integrating region-based information into the judgment phase to enhance the clique potentials. Furthermore, we are employing contour-based groups for stereo matching and extending the grouping hierarchy itself with even more abstract primitives. Other work deals with temporal grouping of regions using the principle of common fate.