Large Scale Image Classification

Finding the optimal image descriptor is hard: both computational efficiency and descriptive power are vital to the final performance. In an effort to extend decades of experience with low-level (pixel) feature extraction to the mid level (superpixel), we have proposed SPAD (Superpixel Based Angular Differences) as a mid-level image descriptor. The main motivation is to provide SIFT-like description performance on a non-regular superpixel grid.

In this study we hypothesize that the commonly used pixel-based low-level descriptions are useful but can be improved by introducing mid-level region information. Hence, we investigate a superpixel (SP) based image representation that captures such mid-level information in order to improve classification accuracy. Detailed experimental evaluations on classification and retrieval tasks are performed to validate this hypothesis. A consistent increase in mean average precision (MAP) is observed across different experimental scenarios and image categories.

[Figure: cactus]

An SP representation of an image, even with different initial SP sizes, still provides some visual understanding of the image. The question is whether this understanding adds to the low-level information gained from the pixels. The proposed descriptors use first-, second-, and third-level neighborhood information to address this question.

[Figure: SPNeigh]

Angular difference computation: the red, green, and blue regions correspond to the 1st-, 2nd-, and 3rd-order neighborhoods of the central SP. Angular differences are combined across different neighborhood orders and SP sizes.

[Figure: bicycle3]
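The neighborhood structure described above can be illustrated with a minimal sketch. It assumes a 2-D integer label image from any superpixel algorithm and one orientation value per SP; the function names (`sp_adjacency`, `neighborhood_rings`, `ring_histograms`), the per-SP orientation input `theta`, and the histogram binning are illustrative assumptions, not the published SPAD definition.

```python
import numpy as np
from collections import deque

def sp_adjacency(labels):
    """Build a superpixel adjacency map from a 2-D integer label image."""
    adj = {int(l): set() for l in np.unique(labels)}
    # Compare every pixel with its right and bottom neighbor (4-connectivity).
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for a, b in pairs:
        diff = a != b
        for u, v in zip(a[diff].tolist(), b[diff].tolist()):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def neighborhood_rings(adj, center, max_order=3):
    """Return {order: SPs at exactly that graph distance from center}."""
    dist = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search, cut off at max_order
        u = queue.popleft()
        if dist[u] == max_order:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    rings = {k: set() for k in range(1, max_order + 1)}
    for sp, d in dist.items():
        if d >= 1:
            rings[d].add(sp)
    return rings

def angular_difference(theta_a, theta_b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = np.abs(theta_a - theta_b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def ring_histograms(adj, theta, center, max_order=3, bins=8):
    """Concatenate one angular-difference histogram per neighborhood order.

    theta maps SP id -> orientation in radians (an assumed input; how the
    orientation is obtained is not specified here).
    """
    rings = neighborhood_rings(adj, center, max_order)
    parts = []
    for k in range(1, max_order + 1):
        diffs = [angular_difference(theta[center], theta[sp]) for sp in rings[k]]
        hist, _ = np.histogram(diffs, bins=bins, range=(0.0, np.pi))
        parts.append(hist)
    return np.concatenate(parts)
```

Repeating `ring_histograms` for label images produced with different initial SP sizes and concatenating the results would be one way to combine the information across SP sizes, again as an assumption about the pooling rather than a statement of the published method.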

The descriptive performance of SPAD is evaluated on the image classification task. This task aims at predicting the predefined class of each image in a test set based on training samples. For this purpose, the Pascal VOC 2007 Classification Dataset, which consists of 9,963 images (5,011 for training and 4,952 for testing), is used. Some examples of the 20 classes in the dataset are: person, motorbike, airplane, cat, cow, bottle, and sofa. The measure used to evaluate the performance of a given system is the Average Precision (AP) metric. The increase in classification accuracy per class is presented below.

[Figure: classComparison]
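For reference, the AP metric can be sketched as follows. The Pascal VOC 2007 protocol uses the 11-point interpolated variant of AP; the function name `average_precision_11pt` and its inputs are illustrative, and this is not the official evaluation code.

```python
import numpy as np

def average_precision_11pt(scores, labels):
    """11-point interpolated AP for one class.

    scores: classifier confidence per test image (higher = more confident).
    labels: 1/True if the image contains the class, else 0/False.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = labels.sum()
    if n_pos == 0:
        return 0.0
    # Rank images by decreasing confidence.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    recall = tp / n_pos
    precision = tp / (tp + fp)
    # Average the maximum precision reached at recall >= t, t = 0, 0.1, ..., 1.
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 11.0

def mean_average_precision(per_class_ap):
    """MAP is the mean of the per-class AP values."""
    return float(np.mean(per_class_ap))
```

MAP over the 20 classes is then the mean of the 20 per-class AP values.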

As a further effort to improve the spatial pyramid idea as a pooling step in the image classification pipeline, a geometry-constrained region adaptation is proposed. The region adaptation is performed in accordance with predefined geometric guidelines and the underlying image characteristics. Using an approximate global geometric correspondence, the method exploits the idea that images of the same category share a spatial similarity. This assumption is evaluated and justified in an object classification framework, in which the generated region segments are used as an enhancement to the widely utilized “spatial pyramid” method. Fixed region pyramids are replaced by the proposed locally coherent, geometrically consistent region segments. The performance of the proposed method is evaluated on the 20-class Pascal VOC 2007 dataset. The proposed method shows a consistent increase in the mean average precision (MAP) score for different experimental scenarios.

[Figure: trainAll]

Region segment generation for the 3 × 1 geometry. a) Initial SPs are generated. b) The 3 × 1 geometry is imposed on the SP structure. c) Region adaptation is performed on the boundary SPs. d) After a number of iterations, the final spatial pyramid regions are obtained.
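Step b) and the selection of boundary SPs for step c) can be sketched as below. This is only an illustration under stated assumptions: the 3 × 1 geometry is taken to mean three horizontal bands, each SP is assigned to the grid cell containing its centroid, and an SP counts as a boundary SP if its pixels fall into more than one fixed grid cell. The function name `impose_geometry` is hypothetical, and the adaptation criterion used in the iterations of steps c) and d) is not reproduced here.

```python
import numpy as np

def impose_geometry(labels, rows=3, cols=1):
    """Snap a fixed rows x cols grid to superpixel boundaries.

    labels: 2-D array of non-negative integer SP ids.
    Returns (region_map, sp_cell, boundary):
      region_map - per-pixel region index after snapping to SPs,
      sp_cell    - dict SP id -> region index (by centroid, an assumption),
      boundary   - SP ids whose pixels straddle more than one fixed cell.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Region index of every pixel under the fixed (non-adapted) grid.
    pixel_cell = (ys * rows // h) * cols + (xs * cols // w)

    sp_cell = {}
    boundary = set()
    lut = np.zeros(int(labels.max()) + 1, dtype=int)
    for sp in np.unique(labels):
        mask = labels == sp
        cy, cx = ys[mask].mean(), xs[mask].mean()
        cell = int(cy * rows // h) * cols + int(cx * cols // w)
        sp_cell[int(sp)] = cell
        lut[sp] = cell
        if np.unique(pixel_cell[mask]).size > 1:
            boundary.add(int(sp))

    # Every pixel inherits the region of the SP it belongs to.
    region_map = lut[labels]
    return region_map, sp_cell, boundary
```

Pooling features over `region_map` instead of the fixed grid keeps each SP inside a single pyramid region, which is the property the adapted regions are meant to have.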

The proposed method is designed to be generic; hence, it can be integrated into any image classification pipeline to improve the accuracy of the conventional spatial pyramid method with minimal extra effort.

Individual class precision scores show that the 2 × 2 geometry benefits more from the proposed segmentation. This may be related to inadequate region coherency in the 2 × 2 geometry, and it points to the exploration of scene geometry for region segmentation as a direction for future work.

Related publications:

[2015] H. Emrah Tasli, Ronan Sicre, Theo Gevers; “SuperPixel based mid-level Image Description for Image Recognition”; Journal of Visual Communication and Image Representation.

[2014] H. Emrah Tasli, Ronan Sicre, Theo Gevers, A. Aydin Alatan; “Geometry-Constrained Spatial Pyramid Adaptation for Image Classification”; International Conference on Image Processing (ICIP) 2014.

[2014] Ronan Sicre, H. Emrah Tasli, Theo Gevers; “SuperPixel based Angular Differences as a mid-level Image Descriptor”; International Conference on Pattern Recognition (ICPR) 2014.
