Structured Object Database



We are constructing the largest structured domain object database in Asia. The database has four major domains: e-commerce, tourism, human, and lifestyle & hobbies. The project is mainly directed by Prof. Cong Gao.


The technical details are as follows:


Publicly Shared Datasets
While some parts of our database are confidential, the following datasets are available for download upon request.

SIR2 Benchmark Dataset - We propose the Single Image Reflection Removal (SIR2) Benchmark Dataset, which offers a large number and great diversity of mixture images together with ground truth for both background and reflection. Our dataset includes controlled scenes taken indoors and wild scenes taken outdoors. One part of the controlled scenes is composed of solid objects, using commonly available daily-life objects (e.g. ceramic mugs, plush toys, fruits, etc.) for both the background and the reflected scenes. The other part uses five different postcards and combines them in a pair-wise manner, using each card as background and as reflection in turn. The wild scenes contain real-world objects of complex reflectance (cars, tree leaves, glass windows, etc.), various distances and scales (residential halls, gardens, lecture rooms, etc.), and different illuminations (direct sunlight, cloudy skylight, twilight, etc.).
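The pair-wise postcard combination described above can be sketched as a small enumeration. This is only an illustration: the card labels are placeholders, and it assumes a card is never paired with itself, so each ordered (background, reflection) pair of distinct cards counts as one scene.

```python
from itertools import permutations

# Hypothetical labels; the dataset's actual naming is not given here.
postcards = ["card1", "card2", "card3", "card4", "card5"]

# Ordered pairs of distinct cards: first acts as background, second as reflection.
# Assumption: a card is not combined with itself.
scenes = list(permutations(postcards, 2))

for background, reflection in scenes[:3]:
    print(f"background={background}, reflection={reflection}")

print(len(scenes))  # 5 * 4 = 20 ordered pairs
```

Under that assumption, five postcards yield 20 background/reflection combinations, since every pair of cards is used twice with the roles swapped.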

For details, please visit

ROSE-Youtu Face Liveness Detection Dataset - We introduce a new and comprehensive face anti-spoofing database, the ROSE-Youtu Face Liveness Detection Database (ROSE-Youtu), which covers a large variety of illumination conditions, camera models, and attack types. It consists of 4225 videos with 25 subjects in total, of which 3350 videos with 20 subjects (5.45 GB in size) are publicly available.

For details, please visit

Action Recognition Dataset - The NTU RGB+D action recognition dataset consists of 56,880 action samples, each containing an RGB video, a depth map sequence, 3D skeletal data, and an infrared video. The dataset was captured by 3 Microsoft Kinect v.2 cameras concurrently. The resolution of the RGB videos is 1920×1080, the depth maps and IR videos are 512×424, and the 3D skeletal data contains the three-dimensional locations of 25 major body joints at each frame. The total size of the dataset is 1.3 TB.
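As a rough picture of what one frame of each modality holds, the figures above can be turned into per-frame element counts. This is a sketch only: the constant and function names are illustrative, it assumes a single tracked body per frame, and it says nothing about the dataset's actual file format.

```python
# Per-frame dimensions taken from the description above.
RGB_WIDTH, RGB_HEIGHT = 1920, 1080
DEPTH_WIDTH, DEPTH_HEIGHT = 512, 424  # IR frames share this resolution
NUM_JOINTS = 25
COORDS_PER_JOINT = 3  # x, y, z

rgb_pixels = RGB_WIDTH * RGB_HEIGHT
depth_pixels = DEPTH_WIDTH * DEPTH_HEIGHT
# Assumption: one tracked body per frame.
skeleton_values = NUM_JOINTS * COORDS_PER_JOINT


def empty_skeleton_frame():
    """Return a placeholder frame: 25 joints, each an (x, y, z) tuple."""
    return [(0.0, 0.0, 0.0) for _ in range(NUM_JOINTS)]


print(rgb_pixels)       # 2073600 pixels per RGB frame
print(depth_pixels)     # 217088 pixels per depth or IR frame
print(skeleton_values)  # 75 coordinates per skeleton frame
```

The skeletal stream is tiny compared with the image streams, which is one reason skeleton-only experiments on data of this kind are comparatively light to run.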

For details, please visit

Video Object Instance Dataset - The Video-Object-Instance (NTU-VOI) dataset from NTU’s ROSE Lab is provided for the evaluation of object instance search and localization in large scale videos. It consists of 146 ground truth video clips with bounding box annotations of object instances in each frame. The total download size of the videos is ~222MB.

For details, please visit

Recaptured Images Dataset - The images in the database were captured using cameras of 5 different brands (Canon, Casio, Lumix, Nikon and Sony) and consist of 2000 natural images and 2700 finely recaptured images. The resolutions range from 2272 by 1704 to 4256 by 2832.

For details, please visit