PhD Thesis

I was a PhD candidate from 2006 to 2010, under the supervision of Prof. Nikos Paragios (Ecole Centrale Paris, France) and Assoc. Prof. Véronique Prinet (Institute of Automation, National Academy of Sciences, Beijing, China). My field of research was automated image recognition and object detection. In other words, I was trying to teach computers to answer the question: “what are the objects contained in this image?”

Abstract

Our work addresses the problem of automated 2D image classification and general object detection. Advances in this field of research contribute to the elaboration of intelligent systems such as, but not limited to, autonomous robots and the semantic web. In this context, designing adequate image representations and designing classifiers for these representations are both challenging problems. Our work provides innovative solutions to both of these problems: image representation and classification.

In order to generate our image representation, we extract visual features from the image and build a graphical structure based on properties of spatial proximity between the feature points. We show that certain spectral properties of this graph constitute good invariants to rigid geometric transforms. Our representation is based on these invariant properties. Experiments show that this representation constitutes an improvement over other similar representations that do not integrate the spatial layout of visual features. However, a drawback of this method is that it requires a lossy quantisation of the visual feature space in order to be combined with a state-of-the-art support vector machine (SVM) classifier. We address this issue by designing a new classifier. This generic classifier relies on a nearest-neighbour distance to classify objects that can be treated as feature sets, i.e. point clouds. The linearity of this classifier allows us to perform object detection, in addition to image classification. Another interesting property is its ability to combine different types of visual features in an optimal manner. We take advantage of this property to produce a new formulation for the classification of visual feature graphs. Experiments are conducted on a wide variety of publicly available datasets to justify the benefits of our approach.
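To make the idea of spectral invariants concrete, here is a minimal Python sketch. It is not the construction used in the thesis: the fixed-radius proximity graph, the choice of traces of adjacency-matrix powers as the "spectral property", and all function names are assumptions made purely for illustration. The traces of A, A^2, A^3, ... equal the power sums of the eigenvalues of A, so they depend only on the graph's spectrum. A rotation followed by a translation preserves pairwise distances, hence the graph and these values.

```python
import math


def proximity_graph(points, radius):
    """Adjacency matrix with a 1 wherever two distinct points lie within `radius`."""
    n = len(points)
    return [
        [1 if i != j and math.dist(points[i], points[j]) <= radius else 0
         for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain square-matrix product, to keep the sketch dependency-free."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adj, order=4):
    """Traces of A^1 .. A^order, i.e. power sums of the eigenvalues of A."""
    moments, power = [], adj
    for _ in range(order):
        moments.append(sum(power[i][i] for i in range(len(adj))))
        power = matmul(power, adj)
    return moments


def rigid_transform(points, angle, shift):
    """Rotate every point by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


# Hypothetical feature-point locations (illustrative values only).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (3.0, 1.0), (3.0, 2.5)]
moved = rigid_transform(points, angle=0.7, shift=(5.0, -2.0))

original_moments = spectral_moments(proximity_graph(points, radius=1.6))
moved_moments = spectral_moments(proximity_graph(moved, radius=1.6))
print(original_moments, moved_moments)
```

Both lists are identical here because the rigid transform leaves every pairwise distance, and therefore every edge of the graph, unchanged. Since the values depend only on the spectrum, they also do not change if the feature points are listed in a different order.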

Manuscript (final version 2010-10-10)

Defence presentation (2010-09-15)

Publications

Code

Optimal Naive Bayes Nearest Neighbours (ONBNN) is a C++ implementation of my work on image classification. It is very slow, but it is really simple and an extremely powerful classifier. At the time, it outperformed kernel SVMs on image classification in terms of classification performance.
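For readers unfamiliar with nearest-neighbour classification of feature sets, the following Python sketch shows the basic idea in its plainest form. It is not the ONBNN code and does not include the "optimal" part of the method: it only sums, for each class, the squared distance from every query feature to its nearest training feature of that class, and picks the class with the smallest total. The function names, the two-dimensional toy features and the class labels are all hypothetical.

```python
def nearest_sq_distance(feature, class_features):
    """Squared Euclidean distance from `feature` to its nearest neighbour in a class."""
    return min(
        sum((a - b) ** 2 for a, b in zip(feature, other))
        for other in class_features
    )


def classify(query_features, features_by_class):
    """Return the class with the smallest summed nearest-neighbour distance, plus all scores."""
    scores = {
        label: sum(nearest_sq_distance(f, feats) for f in query_features)
        for label, feats in features_by_class.items()
    }
    return min(scores, key=scores.get), scores


# Toy training data: each class is a small cloud of 2-D feature vectors.
training = {
    "cat": [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3)],
    "car": [(1.0, 1.1), (0.9, 1.2), (1.2, 0.8)],
}

# Features extracted from one hypothetical query image.
query = [(0.1, 0.1), (0.0, 0.2), (0.3, 0.2)]

label, scores = classify(query, training)
print(label, scores)
```

Each class score is a plain sum of per-feature terms, which is the kind of additive structure that lets a score be evaluated on any subset of an image's features. The brute-force nearest-neighbour search in this sketch compares every query feature with every training feature, so its cost grows quickly with the size of the training set.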