Published on: 15-Jan-2016
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the most important competitions in the computer vision community because it serves as a benchmark for several fundamental problems in the field, such as object category classification and detection across hundreds of categories and millions of images. The challenge has run annually since 2010, attracting participation from more than sixty leading institutions worldwide. Past winners include leading technology companies (e.g., Google, NEC) and top universities (e.g., Oxford University, University of Toronto, New York University).
The 2015 challenge was organized by Stanford University and UNC Chapel Hill. A total of 67 teams from universities and companies around the world took part in the competition's four tasks: object detection, object localization, object detection from video, and scene classification.
The ROSE Lab team focused on the scene classification task, which aims to classify images into 401 scene categories. Humans are extremely proficient at perceiving natural scenes and understanding their contents. However, we know surprisingly little about how, or even where in the brain, we process such scenes. Work on this project requires us to analyze the statistical properties of natural scenes. This type of analysis allows for a deeper understanding of how to process the kinds of images we encounter in everyday life, and for designing the next generation of algorithms that approach human-level vision. Potential applications of scene classification include content-based indexing and organization of images and content-sensitive image enhancement.
The task is very difficult due to the large scale of the training data (8.1 million images in total), noisy class labels, large intra-class variance, and ambiguity between classes. To deal with these problems, we built a distributed system that trains simultaneously on multiple GPUs across multiple servers. We also proposed a CNN (convolutional neural network) tree that progressively learns fine-grained features to distinguish between categories.
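To illustrate the general idea behind training on several GPUs at once, here is a minimal sketch of synchronous data-parallel gradient descent. It is not the ROSE Lab system: the toy one-dimensional linear model, the function names (`shard`, `local_gradient`, `all_reduce_mean`, `train`), and the hyperparameters are all assumptions chosen for illustration, and the "workers" run one after another in plain Python rather than on real GPUs.

```python
# Hypothetical sketch of synchronous data-parallel training.
# Each "worker" stands in for one GPU; a toy linear model replaces the CNN.

def shard(data, num_workers):
    """Split the batch into contiguous, roughly equal shards, one per worker."""
    size = (len(data) + num_workers - 1) // num_workers
    return [data[i:i + size] for i in range(0, len(data), size)]

def local_gradient(w, b, examples):
    """Summed squared-error gradient over one worker's shard, plus its size."""
    gw = gb = 0.0
    for x, y in examples:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(examples)

def all_reduce_mean(partials):
    """Combine per-worker gradient sums into the mean over the full batch."""
    total = sum(n for _, _, n in partials)
    gw = sum(g for g, _, _ in partials) / total
    gb = sum(g for _, g, _ in partials) / total
    return gw, gb

def train(data, num_workers=4, lr=0.1, steps=2000):
    """Run gradient descent where every step averages the workers' gradients."""
    w, b = 0.0, 0.0
    shards = shard(data, num_workers)
    for _ in range(steps):
        # In a real system these calls would run concurrently, one per GPU.
        partials = [local_gradient(w, b, s) for s in shards]
        gw, gb = all_reduce_mean(partials)
        # Every worker applies the same update, so model copies stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data drawn from y = 3x + 1 (an assumption for the demo).
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(20)]
w, b = train(data)
```

Because the per-worker sums are combined before the update, the averaged gradient matches what a single machine would compute on the whole batch (up to floating-point rounding), which is why adding workers changes throughput rather than the result.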
At the end of the challenge, the ROSE team ranked fifth worldwide in the scene classification task. The result demonstrates that we are a top-tier university research team in the computer vision community. As this was our first time participating in the challenge, we believe we can achieve better results next year with more hardware support and closer collaboration with our partners.
Figure 1. Some samples of images from different categories of scenes.
Figure 2. Competition results of the scene classification task.