Tensor Representations for Summarizing Fine-Grained Actions

Event Date: Tuesday, July 5, 2016, 11:15am to 12:15pm

Location: Walter Library, Room 402

The recent resurgence of efficient deep learning platforms has enabled significant advances in several fundamental problems in computer vision, including human action recognition. Despite these breakthroughs, the problem remains challenging in its general setting. This talk focuses on a difficult subset of this problem class: fine-grained action recognition. Here, actions are characterized by subtle inter-class differences (e.g., washing plates versus washing hands) and strong intra-class appearance variations (e.g., slicing cucumbers versus slicing tomatoes). The problem is often further complicated by occlusions of objects and human body parts, and by the presence of hard-to-detect tools (such as knives and peelers). Recognizing such actions is important in several applications, including visual surveillance, human-robot interaction, and elderly health monitoring.

In this talk, Cherian will describe a representational approach to this problem. The main idea is to use co-occurrences of visual features to build compact tensor representations that can discriminate between such actions. Two such representations will be introduced: (i) second-order co-occurrences, captured by the correlations between the predictions of a deep convolutional neural network, and (ii) third-order co-occurrences of features extracted from 3D pose-skeleton action sequences. Both schemes are simple and easy to implement, and experimental results demonstrate that they achieve state-of-the-art accuracy on several benchmark datasets.
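As a rough illustration of what second- and third-order co-occurrence statistics look like, the following minimal Python sketch summarizes a sequence of per-frame feature vectors (for example, per-frame CNN prediction scores or flattened 3D joint coordinates) by averaging outer products of mean-centered features over time. This is not the speaker's method. The mean-centering, the plain temporal averaging, the use of a covariance in place of a normalized correlation, and the helper names (`second_order`, `third_order`, `upper_triangle`) are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_center(frames: Sequence[Vector]) -> List[List[float]]:
    """Subtract the per-dimension temporal mean from every frame."""
    n, d = len(frames), len(frames[0])
    mu = [sum(f[i] for f in frames) / n for i in range(d)]
    return [[f[i] - mu[i] for i in range(d)] for f in frames]


def second_order(frames: Sequence[Vector]) -> List[List[float]]:
    """Hypothetical helper: d x d matrix of averaged pairwise co-occurrences.

    Entry (i, j) is the temporal average of x_t[i] * x_t[j] over centered
    frames, i.e. a covariance. Dividing by per-dimension standard deviations
    would turn it into a correlation matrix (not done here).
    """
    x = mean_center(frames)
    n, d = len(x), len(x[0])
    return [[sum(f[i] * f[j] for f in x) / n for j in range(d)]
            for i in range(d)]


def third_order(frames: Sequence[Vector]) -> List[List[List[float]]]:
    """Hypothetical helper: d x d x d tensor of averaged triple co-occurrences.

    Entry (i, j, k) is the temporal average of x_t[i] * x_t[j] * x_t[k]
    over centered frames (a third central moment tensor).
    """
    x = mean_center(frames)
    n, d = len(x), len(x[0])
    return [[[sum(f[i] * f[j] * f[k] for f in x) / n for k in range(d)]
             for j in range(d)]
            for i in range(d)]


def upper_triangle(matrix: List[List[float]]) -> List[float]:
    """Flatten a symmetric matrix into its d(d+1)/2 unique entries."""
    d = len(matrix)
    return [matrix[i][j] for i in range(d) for j in range(i, d)]


if __name__ == "__main__":
    # Three frames of a toy 2-dimensional feature.
    clip = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]
    cov = second_order(clip)
    print(cov)                  # symmetric 2 x 2 matrix
    print(upper_triangle(cov))  # 3 numbers describe the whole clip
    print(third_order(clip))    # 2 x 2 x 2 tensor
```

Because the second-order matrix is symmetric, keeping only its d(d+1)/2 upper-triangular entries gives a fixed-length descriptor regardless of how many frames the clip has. The third-order tensor, by contrast, has d³ entries, so in this naive form it would likely need a small d or some form of compression to stay compact.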