Detecting groups of interacting people is a useful task for many modern technologies, with applications ranging from video surveillance to social robotics. This page collects all the work on this topic carried out at the University of Verona by members of the VIPS Lab. Two alternative strategies have been proposed in recent years: one based on Hough voting and one on graph cuts. All the code and data on this page may be used freely for scientific research purposes; if you use them, please cite the related papers.
We strongly recommend reading the paper "F-formation Detection: Individuating Free-standing Conversational Groups in Images" for a rigorous definition of group grounded in the social sciences, which lets us specify many kinds of group so far neglected in the Computer Vision literature. Building on this taxonomy, the paper also presents a detailed review of the state of the art in group detection algorithms.
In practice, an F-formation is the proper organization of three social spaces: o-space, p-space and r-space. The o-space is a convex empty space surrounded by the people involved in a social interaction; every participant looks inward into it, and no external person is allowed in this region. The p-space is a narrow strip that surrounds the o-space and contains the bodies of the participants, while the r-space is the area beyond the p-space. F-formations can take different configurations: with two participants, typical arrangements are vis-a-vis, L-shape, and side-by-side. When there are more than three participants, a circular formation is typically formed.
Suppose we know the position and orientation of each individual i in the scene. The goal is to find each o-space together with the indices j of the individuals belonging to it (see figure). Sociologists introduced the concept of the transactional segment of an individual as "the area in front of him/her that can be reached easily, and where hearing and sight are most effective" [Ciolek, 1983]. We model the transactional segment of individual i with a Gaussian distribution whose parameters are governed by two input variables shared by the HVFF and GCFF algorithms. The formulation must also handle the visibility constraint, which prevents a person from being assigned to a group if another individual occludes the o-space from him or her.
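One parameterization consistent with this description is given below. It is an assumption made for illustration, not necessarily the exact notation of the papers: the mean is placed at a fixed distance in front of the individual and the covariance is isotropic. The symbols $(x_i, y_i)$ for position, $\theta_i$ for orientation, and $D$ and $\sigma$ for the two input parameters are illustrative names.

```latex
\[
\mathcal{N}(\mu_i, \Sigma_i), \qquad
\mu_i = \begin{bmatrix} x_i + D\cos\theta_i \\ y_i + D\sin\theta_i \end{bmatrix}, \qquad
\Sigma_i = \sigma^2 I
\]
```

Under this assumption, two people standing $2D$ apart and facing each other exactly have the same mean, located at the midpoint between them.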
To cluster the transactional segments of the individuals in the scene, we have developed over the years several versions of the two main methodologies described below. For more details on the algorithms, please refer to the papers.
Under this heading, we consider a set of methods that use a Hough-voting strategy to build an accumulation space and find its local maxima. The general idea is that each Gaussian probability density function representing the transactional segment of an individual can be approximated by a set of samples, each of which votes for a given o-space centre location. The voting space is then quantized and the votes are aggregated on square cells, so as to form a discrete accumulation space. Local maxima in this space identify o-space centres and, consequently, F-formations. In all these methods the visibility constraint is applied afterwards by checking the composition of the group and its geometry.
(a-c) Two subjects facing each other exactly at a fixed distance vote for the same centre of the circle representing the o-space. (d) In real cases the two subjects do not face each other exactly. (e-f) Several positions and head orientations are drawn from Gaussian distributions associated with the subjects to deal with the uncertainty of real scenarios, making the proposed approach more robust.
The first work in this field was presented at BMVC 2011; in that paper the votes are accumulated linearly, by simply summing the weights of all votes that fall in the same cell. A first improvement of this approach was presented at WIAMIS 2013, where the votes are aggregated using the weighted Boltzmann entropy function. In the same year, at ICIP 2013, a multi-scale approach was built on top of the entropic version. The idea is that groups with higher cardinality tend to arrange themselves around a larger o-space, so the entropic group search is run for different o-space dimensions, filtering groups by cardinality; a fusion step based on a majority criterion then combines the results.
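The voting scheme with linear accumulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the released HVFF code: each vote has unit weight, the Gaussian is assumed to be centred `stride` units in front of the person, only the single strongest cell is returned rather than every local maximum, and the visibility check is omitted. All function and parameter names (`cast_votes`, `accumulate`, `strongest_group`, `stride`, `min_votes`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def cast_votes(people, stride, sigma, n_samples, rng):
    """Draw o-space centre votes for each person given as (x, y, theta)."""
    votes = []
    for idx, (x, y, theta) in enumerate(people):
        # Assumed centre of the transactional segment: `stride` ahead of the person.
        cx = x + stride * math.cos(theta)
        cy = y + stride * math.sin(theta)
        for _ in range(n_samples):
            votes.append((idx, rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return votes


def accumulate(votes, cell_size):
    """Quantize votes onto square cells; count votes per cell and per voter."""
    grid = defaultdict(lambda: defaultdict(int))
    for idx, vx, vy in votes:
        cell = (math.floor(vx / cell_size), math.floor(vy / cell_size))
        grid[cell][idx] += 1
    return grid


def strongest_group(people, stride=1.0, sigma=0.2, cell_size=1.0,
                    n_samples=200, min_votes=50, seed=0):
    """Return (cell, members) for the cell with the largest vote total."""
    rng = random.Random(seed)
    votes = cast_votes(people, stride, sigma, n_samples, rng)
    grid = accumulate(votes, cell_size)
    # Linear accumulation: a cell's score is the plain sum of its votes.
    cell = max(grid, key=lambda c: sum(grid[c].values()))
    # A person belongs to the group if enough of their votes landed in the cell.
    members = sorted(i for i, n in grid[cell].items() if n >= min_votes)
    return cell, members


people = [
    (-0.5, 0.5, 0.0),      # faces +x
    (1.5, 0.5, math.pi),   # faces -x, towards the first person
    (10.5, 10.5, 0.0),     # far away and alone
]
cell, members = strongest_group(people)
print(cell, members)
```

In this toy scene the first two people face each other two strides apart, so their votes pile up in the same cell, while the third person's votes land far away and cannot outweigh them.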
This method exploits the power of graph-cut algorithms for clustering graphs, which first requires a good description of the scene in the form of a graph. In our formulation the nodes are the individuals (i.e. their transactional segments) and the candidate o-space centres, while edges are defined between each pair of nodes of different type (i.e. between a transactional segment and a candidate o-space centre). We model the probability that each individual belongs to a specific o-space, then build the cost function by taking the logarithm of this probability and adding a Minimum Description Length (MDL) prior. Moreover, we introduce an additive term that acts as the visibility constraint on individual i regardless of the group person j is assigned to. The final objective function is the sum of these three terms.
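To make the role of the MDL prior concrete, the sketch below evaluates a simplified cost for a given assignment of individuals to groups. It only scores assignments; it does not perform the graph-cut optimisation and it leaves out the visibility term. It assumes a Gaussian-like probability, so that its negative logarithm reduces to a squared distance between a transactional-segment centre and the o-space centre, and it takes each o-space centre to be the mean of its members' centres. The names `assignment_cost` and `mdl` are hypothetical.

```python
def assignment_cost(centres, labels, mdl):
    """Squared distance of each centre to its group's mean, plus `mdl` per group."""
    groups = {}
    for centre, label in zip(centres, labels):
        groups.setdefault(label, []).append(centre)
    cost = mdl * len(groups)  # MDL prior: every extra group must pay for itself
    for members in groups.values():
        # Assumed o-space centre: the mean of the members' segment centres.
        ox = sum(u for u, _ in members) / len(members)
        oy = sum(v for _, v in members) / len(members)
        cost += sum((u - ox) ** 2 + (v - oy) ** 2 for u, v in members)
    return cost


# Transactional-segment centres: two nearly coincident, one far away.
centres = [(1.0, 0.0), (1.2, 0.1), (8.0, 8.0)]
pair_plus_single = assignment_cost(centres, [0, 0, 1], mdl=1.0)
all_separate = assignment_cost(centres, [0, 1, 2], mdl=1.0)
one_big_group = assignment_cost(centres, [0, 0, 0], mdl=1.0)
print(pair_plus_single, all_separate, one_big_group)
```

The MDL term penalises explaining the scene with many groups, while the distance term penalises merging people whose transactional segments do not overlap, so here the cheapest labelling pairs the first two individuals and leaves the third alone.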
The code for both HVFF and GCFF is publicly available under the GPL license. Anyone may use this code for research purposes. If you publish results obtained with it, please cite the related papers as listed in the README.txt file inside the zip archive.
Several datasets for group detection in still images have been released recently. We tested the code on this page on the following datasets, which can be downloaded from the linked webpages. A compact version of the datasets, sufficient for running the code, is also provided here: download.
IDIAP Poster Data
Cocktail Party Data
Coffee Break Data