Most object encoutered in daily life are composed of a number of primitive standard parts. Hence, it is assumed that a large set of man made objects can be quite faithfully approximated by a small set of standard primitives composed of single geometric shapes such as rectangles or circles. The objective of this work is thus to detect generic objects when their surfaces resemble one model primitive.
The original algorithm has been developed by Wang and Ben-Arie [1] and here modified and studied: the goal is to locate the occurrences of a set of primitives contained in an offline-generated database that includes affine transformed appearances of the model shapes. Recognition is achieved by projecting the vectorial map of the image on a vectorial eigenbase and then generating an energy map: a fast search algorithm is applied at each local maxima of the map to find the best matching model. A vectorial correlation is finally implemented for verification and removal of false alarms.
In this approach, we are interested in the boundary information of shapes; the texture within a shape is generally ignored. The boundary information used includes a vector normal to the boundary at each pixel of the edge: the augmentation of boundary location with boundary direction eliminates many false positive detections mainly caused due to severe background clutter.
Normals are identified by Wang and Ben-Arie through direction and verse, while we use only direction (see figure 1): thus, we don't need to know which is the interior of the shape.
Given T(x,y) the angle of the normal to the edge in (x,y), a model is represented by its 2D vectorial map:
The Karhunen-Loeve transform (K-L) is performed on the set P of the vectorial boundary maps generated from different views of the model: a vectorial base is then extracted from the eigenvectors generated by the K-L.
The energy map can be considered as a measure of similarity of a windowed vectorial image map with the reference appearances in the vectorial subspace. By detecting the local maxima in the energy map, a set of candidate locations can be found for possible matching between appearance of the model shape with the image shape.
Wang and Ben-Arie used a unique energy map where to add the energy values
obtained from all the searched primitives (see figure 2a); in this work, every
primitive generates a different map, achieving more accurate results in local
maxima detection (see figure 2b).
|
|
Figure 2a. Unique energy map. | Figure 2b. An energy map for every primitive. |
Given the image projection coefficient J-length array b and the model ones ai, it can be seen that the one yielding the maximum inner product is the closest one to b in orientation in the 2(J+1) dimensional space. Thus, we can map the points in the vectorial space to points in a 2(J+1) dimensional angle space:
The mapped points from the local maxima of the energy map are considered as query points and many nearest neighbour search approaches can be used to find the best match in the database generated from P. Here, the multidimensional nearest neighbour search scheme proposed by Nene and Nayar [2] is used to find a small set of points which are closest to the query point. One advantage of thhis search approach is that the computation is almost indipendent of the dimension of the points in the database and linear with the total number of elements in the dataset.
After we find out the best matching pose for each candidate locations, we correlate the sub-image with the corresponfing primitive itself; if the correlation is less than a predefined threshold, we consider the candidate as a false alarm, otherwise we keep it as a detection.
We test the algorithm with images in various distortion and noise conditions
to evaluate its performance.
(a) |
(b) |
(c) |
(d) |
(e) |
In all the experiments, the model shapes (a square, a circle and a triangle)
and all their appereances are represented by 50x50 images. The set P is
generated by rotating the model shape in steps of 36 degrees to all the
directions within a range of 180 degrees, obtaining a set of 40 different
views. The test images are 100x100 and their vectorial edge maps are obtained
using Canny edge detector. Two kind of noise are then added to the images:
gaussian noise with variance of 0.01 and mean of 0.2 (G1) and 0.4
(G2), and salt&pepper noise with 10% (S&P1) and
20% (S&P2) of the total number of pixels affected (see an
example in figure 3). The results are shown in table 1.
Noise | Correct detections | False alarms |
Absent | 0.00 | 0.05 |
G1 | 0.00 | 0.40 |
G2 | 1.60 | 0.46 |
S&P1 | 0.78 | 0.75 |
S&P2 | 2.00 | 0.80 |