A Dataset of Human Manipulation Actions

Updated 09/12/2015

Video: Calibrated RGB-D video recorded with a Kinect device at a frame rate of 30 Hz and a resolution of 640×480.

Audio: 4 separate audio tracks recorded with the Kinect microphone array, sampled at 16 kHz with 32-bit depth and saved as standard waveform audio (WAV) files.

Object Models: 25 3D object models, built from real images, saved as Wavefront OBJ files.

Object Pose: 6 degree-of-freedom (DOF) object pose estimation.

Labels: manual labels for each sequence, covering 6 different sub-actions: Open Milk Box, Pour Milk, Close Milk Box, Open Cereal Box, Pour Cereals, Close Cereal Box.

Scripts: Python scripts to work with the dataset.

Related publications:

Audio-visual classification and detection of human manipulation actions
A. Pieropan, G. Salvi, K. Pauwels, and H. Kjellström. In IROS 14.


The dataset is divided into 20 compressed files. Please download all of them to extract the dataset. A preview of the dataset can be downloaded here.

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20

Set of 3D object models used in the experiments: download here

Dataset organization

Each recorded video is stored in a separate folder named according to the date on which it was recorded (e.g. 2013-12-10-001).
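To sort or filter sequences by recording date, the folder name can be split into its parts. This is a minimal sketch, not one of the dataset scripts: it assumes that every folder follows the pattern of the example above and that the trailing number is a running index of the recordings made on that day. The function name is hypothetical.

```python
from datetime import date, datetime


def parse_sequence_name(name):
    """Split a folder name such as '2013-12-10-001' into (date, index).

    Assumption: the part before the last dash is a YYYY-MM-DD date and
    the part after it is a running index for that day.
    """
    date_part, _, index_part = name.rpartition("-")
    recorded_on = datetime.strptime(date_part, "%Y-%m-%d").date()
    return recorded_on, int(index_part)


recorded_on, index = parse_sequence_name("2013-12-10-001")
print(recorded_on.isoformat(), index)
```

Sorting a list of folder names with `key=parse_sequence_name` then orders the sequences chronologically.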

Each folder contains the following elements:

  • kinect_audio_session_0
  • mat
  • kinect_rgb.ass
  • kinect_rgb.avi
  • models_used.txt


kinect_audio_session_0: Folder containing 4 separate audio tracks recorded with the Kinect microphone array. The file start_end_time.txt stores the timestamps of the recorded data.
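A quick way to check a track before processing it is to read its header with Python's standard `wave` module. The sketch below is only an illustration: the names of the individual track files are not listed on this page, so the path is left as a parameter, and the function name is hypothetical. The `wave` module reads uncompressed PCM only, so if a track turns out to be stored as 32-bit floating point it will raise `wave.Error` and another reader is needed.

```python
import wave


def describe_track(path):
    """Return basic properties of one PCM WAV track."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": 8 * wav.getsampwidth(),
            "duration_s": frames / rate,
        }
```

For a track that matches the description above, `sample_rate_hz` should be 16000 and `bits_per_sample` should be 32.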


mat: Folder containing a list of .mat files. Each file corresponds to one frame of the video sequence. The structure of each MATLAB file is the following:

  • R: N×3 rotation matrix
  • T: N×3 translation matrix
  • disparity: 640×480 disparity map
  • segment_mask: 640×480 image with values from 1 to N
  • timestamp: time stamp of the recorded frame

N indicates the number of objects tracked during the video. The order of the tracked objects is stored in the models_used.txt file.
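The per-frame files can also be read from Python. The following is a sketch under stated assumptions rather than the code shipped with the dataset: it assumes SciPy is installed, that the files are in a MATLAB format that `scipy.io.loadmat` can read (it does not read v7.3, HDF5-based files), and that row i of R and row i of T belong to the i-th tracked object. Both function names are hypothetical.

```python
def load_frame(path):
    """Load one per-frame .mat file into a dict of arrays (needs SciPy)."""
    # Imported lazily so that object_poses() below has no dependencies.
    from scipy.io import loadmat
    return loadmat(str(path))


def object_poses(frame):
    """Pair the rows of R and T: one (rotation, translation) per object.

    Assumption: row i of both arrays describes the i-th tracked object.
    """
    rotations, translations = frame["R"], frame["T"]
    if len(rotations) != len(translations):
        raise ValueError("R and T must have one row per tracked object")
    return [
        (tuple(float(v) for v in r), tuple(float(v) for v in t))
        for r, t in zip(rotations, translations)
    ]
```

Calling `object_poses(load_frame(path))` on one frame file returns a list of N pairs, in the same order as the tracked objects.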


kinect_rgb.ass: Standard subtitle file for the video. Each label is stored as a separate subtitle entry with a start time and an end time.
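Because the labels are ordinary subtitle entries, they can be extracted with a few lines of Python. This sketch assumes the file uses the usual Advanced SubStation Alpha layout, in which every entry is a `Dialogue:` line with the fields Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text and times written as H:MM:SS.cc. If the `Format:` line of the file declares a different field order, the indices below need to be adjusted. The function names are hypothetical.

```python
import re

# ASS timestamps look like 0:01:23.45 (hours:minutes:seconds.centiseconds).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def ass_time_to_seconds(text):
    """Convert an ASS timestamp to seconds."""
    hours, minutes, seconds = TIME_RE.fullmatch(text.strip()).groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_ass_labels(lines):
    """Return (start_s, end_s, label) for every Dialogue line."""
    labels = []
    for line in lines:
        if not line.startswith("Dialogue:"):
            continue
        # The text field comes last and may contain commas, so split
        # at most 9 times to keep it in one piece.
        fields = line[len("Dialogue:"):].strip().split(",", 9)
        if len(fields) < 10:
            continue
        start, end, text = fields[1], fields[2], fields[9]
        labels.append(
            (ass_time_to_seconds(start), ass_time_to_seconds(end), text.strip())
        )
    return labels
```

The function accepts any iterable of lines, so an open file object for kinect_rgb.ass can be passed directly. Multiplying the returned times by the 30 Hz frame rate gives approximate frame indices; the timestamp stored with each frame is the more reliable reference.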


kinect_rgb.avi: RGB video encoded with H.264 at high quality. If you need the raw RGB data, please contact me.


models_used.txt: List of the objects used in the video. Each entry name corresponds to a folder in the 3D model data set, which can be downloaded separately from this page.
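The object list is what connects the numeric values in segment_mask to the 3D models. The sketch below shows one way to do the lookup. It assumes that models_used.txt holds one entry per line and that mask value k refers to the k-th entry, which follows from the 1 to N numbering described above but is not confirmed by a format specification. The function names and the example object names are hypothetical.

```python
def read_model_names(lines):
    """Return the non-empty entries of models_used.txt, in file order."""
    return [line.strip() for line in lines if line.strip()]


def name_for_mask_value(model_names, value):
    """Map a segment_mask value (1..N) to the corresponding model name.

    Assumption: value k refers to the k-th entry of the list. Values
    outside 1..N (for example background pixels) yield None.
    """
    if not 1 <= value <= len(model_names):
        return None
    return model_names[value - 1]


# Made-up entries, only to show the lookup.
names = read_model_names(["milk_box\n", "cereal_box\n", "\n"])
print(name_for_mask_value(names, 2))
```

The returned name can then be used as the folder name when loading the matching OBJ model from the 3D model data set.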