Elementary shape recognition using eigenspaces

Most object encoutered in daily life are composed of a number of primitive standard parts. Hence, it is assumed that a large set of man made objects can be quite faithfully approximated by a small set of standard primitives composed of single geometric shapes such as rectangles or circles. The objective of this work is thus to detect generic objects when their surfaces resemble one model primitive.

The original algorithm has been developed by Wang and Ben-Arie [1] and here modified and studied: the goal is to locate the occurrences of a set of primitives contained in an offline-generated database that includes affine transformed appearances of the model shapes. Recognition is achieved by projecting the vectorial map of the image on a vectorial eigenbase and then generating an energy map: a fast search algorithm is applied at each local maxima of the map to find the best matching model. A vectorial correlation is finally implemented for verification and removal of false alarms.

Vectorial Map

In this approach, we are interested in the boundary information of shapes; the texture within a shape is generally ignored. The boundary information used includes a vector normal to the boundary at each pixel of the edge: the augmentation of boundary location with boundary direction eliminates many false positive detections mainly caused due to severe background clutter.

Normals are identified by Wang and Ben-Arie through direction and verse, while we use only direction (see figure 1): thus, we don't need to know which is the interior of the shape.


Figure 1. (a) Normals in Wang and Ben-Arie implementation. (b) Normals in this implementation.

Given T(x,y) the angle of the normal to the edge in (x,y), a model is represented by its 2D vectorial map:

The Karhunen-Loeve transform (K-L) is performed on the set P of the vectorial boundary maps generated from different views of the model: a vectorial base is then extracted from the eigenvectors generated by the K-L.

Energy Map

Given the set P and the corresponding vectorial bases B, we define the projection energy Ei of the view pi as:
where aijh and aijv are the projection coefficients of pi on the eigenbases. When we have the vectorial edge map of the image, a corresponding energy map is defined by:
where bjh(x,y) and bjv(x,y) are the projection coefficients of the edge map at the location (x,y).

The energy map can be considered as a measure of similarity of a windowed vectorial image map with the reference appearances in the vectorial subspace. By detecting the local maxima in the energy map, a set of candidate locations can be found for possible matching between appearance of the model shape with the image shape.

Wang and Ben-Arie used a unique energy map where to add the energy values obtained from all the searched primitives (see figure 2a); in this work, every primitive generates a different map, achieving more accurate results in local maxima detection (see figure 2b).



Figure 2a. Unique energy map. Figure 2b. An energy map for every primitive.

Efficient Nearest Neighbour Search

The local maxima detected in the energy map indicate the potential locations of appearances of the model shape in the set P. Thus, it is necessary to accurately dtermine which appearance of the model shape best matches the image at each local maxima. Here, the closest appearance is the one which has maximum inner product with the underlying image edge map.

Given the image projection coefficient J-length array b and the model ones ai, it can be seen that the one yielding the maximum inner product is the closest one to b in orientation in the 2(J+1) dimensional space. Thus, we can map the points in the vectorial space to points in a 2(J+1) dimensional angle space:

The mapped points from the local maxima of the energy map are considered as query points and many nearest neighbour search approaches can be used to find the best match in the database generated from P. Here, the multidimensional nearest neighbour search scheme proposed by Nene and Nayar [2] is used to find a small set of points which are closest to the query point. One advantage of thhis search approach is that the computation is almost indipendent of the dimension of the points in the database and linear with the total number of elements in the dataset.

After we find out the best matching pose for each candidate locations, we correlate the sub-image with the corresponfing primitive itself; if the correlation is less than a predefined threshold, we consider the candidate as a false alarm, otherwise we keep it as a detection.

Results

We test the algorithm with images in various distortion and noise conditions to evaluate its performance.


(a)

(b)

(c)

(d)

(e)

Figure 3. (a) An example of test image. (b) G1 noise added. (c) G2 noise added.
(d) S&P1 noise added. (e) S&P2 noise added.

In all the experiments, the model shapes (a square, a circle and a triangle) and all their appereances are represented by 50x50 images. The set P is generated by rotating the model shape in steps of 36 degrees to all the directions within a range of 180 degrees, obtaining a set of 40 different views. The test images are 100x100 and their vectorial edge maps are obtained using Canny edge detector. Two kind of noise are then added to the images: gaussian noise with variance of 0.01 and mean of 0.2 (G1) and 0.4 (G2), and salt&pepper noise with 10% (S&P1) and 20% (S&P2) of the total number of pixels affected (see an example in figure 3). The results are shown in table 1.

Noise Correct detections False alarms
Absent 0.00 0.05
G1 0.00 0.40
G2 1.60 0.46
S&P1 0.78 0.75
S&P2 2.00 0.80

Table 1. Mean numbers of detections and false alarms on a set of 40 test images.

References

  1. Z. Wang and J. Ben-Arie, "Detection and segmentation of generic shapes based on affine modeling of energy in eigenspaces", IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1621-1629, 2001.
  2. S.A. Nene and S.K. Nayar, "A simple algorithm for nearest neighbor search in high dimension spaces", IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 19, no. 9, pp. 989-1003, 1997.