Structured Object Database



We are constructing the largest structured domain object database in Asia. The database covers four major domains: e-commerce, tourism, human, and lifestyle & hobbies. The project is mainly directed by Prof. Wang Gang and Prof. Cong Gao.


The technical details are as follows:



Action Recognition Dataset - The NTU RGB+D action recognition dataset consists of 56,880 action samples, each containing an RGB video, a depth map sequence, 3D skeletal data, and an infrared (IR) video. The dataset was captured by 3 Microsoft Kinect v2 cameras concurrently. The RGB videos have a resolution of 1920×1080, the depth maps and IR videos are 512×424, and the 3D skeletal data contains the three-dimensional locations of 25 major body joints at each frame. The total size of the dataset is 1.3 TB.

For details, please visit
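
As a rough illustration of the per-frame data described for the NTU RGB+D dataset, the sketch below encodes the stated resolutions and joint count as constants and checks a skeleton frame against them. The names SkeletonFrame and values_per_frame are hypothetical and are not part of the dataset's own tools or file format; only the numbers come from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Figures taken from the dataset description; everything else is illustrative.
RGB_RESOLUTION = (1920, 1080)      # width, height of each RGB frame
DEPTH_IR_RESOLUTION = (512, 424)   # width, height of each depth map / IR frame
NUM_JOINTS = 25                    # major body joints tracked per skeleton
COORDS_PER_JOINT = 3               # x, y, z location of each joint


@dataclass
class SkeletonFrame:
    """Hypothetical container for one skeleton at one frame."""
    joints: List[Tuple[float, float, float]]

    def __post_init__(self) -> None:
        # Reject frames that do not match the 25-joint, 3D layout.
        if len(self.joints) != NUM_JOINTS:
            raise ValueError(f"expected {NUM_JOINTS} joints, got {len(self.joints)}")
        for joint in self.joints:
            if len(joint) != COORDS_PER_JOINT:
                raise ValueError("each joint needs exactly 3 coordinates")


def values_per_frame() -> dict:
    """Count the values one frame holds in each modality."""
    rgb_w, rgb_h = RGB_RESOLUTION
    d_w, d_h = DEPTH_IR_RESOLUTION
    return {
        "rgb_pixels": rgb_w * rgb_h,
        "depth_pixels": d_w * d_h,
        "ir_pixels": d_w * d_h,
        "skeleton_values": NUM_JOINTS * COORDS_PER_JOINT,
    }
```

Under these assumptions, a single skeleton contributes only 75 values per frame, while one RGB frame holds about two million pixels, which is why skeleton-only experiments are far lighter on storage than ones using the video modalities.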

Video Object Instance Dataset - The Video-Object-Instance (NTU-VOI) dataset from NTU’s ROSE Lab is provided for evaluating object instance search and localization in large-scale videos. It consists of 146 ground-truth video clips with bounding box annotations of object instances in each frame. The total download size of the videos is approximately 222 MB.

For details, please visit


Recaptured Images Dataset - The images in the database were captured using cameras of 5 different brands (Canon, Casio, Lumix, Nikon, and Sony). The database consists of 2,000 natural images and 2,700 finely recaptured images, with resolutions ranging from 2272×1704 to 4256×2832.

For details, please visit