Modeling Sub-Event Dynamics in First-Person Action Recognition

Hasan Firdaus Bin Mohd Zaki, Faisal Shafait, Ajmal Saeed Mian

Research output: Chapter in Book/Conference paper › Conference paper

11 Citations (Scopus)

Abstract

First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to make a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure; the Fourier coefficients of the segments define the overall video representation. We perform experiments on two existing benchmark first-person video datasets, both of which were captured in controlled environments and therefore do not reflect real-world capture conditions. Addressing this gap, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7% on the three datasets, respectively.
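
To make the pipeline described in the abstract concrete, here is a minimal NumPy sketch of its two main steps: pooling per-frame features over sub-intervals to obtain sub-event descriptors, then recursively applying the FFT over a temporal pyramid and keeping low-frequency coefficient magnitudes as the video representation. The choice of max pooling, the pyramid depth, and the number of retained coefficients are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def pool_sub_events(features, num_intervals=8):
    # features: (T, D) per-frame descriptors. Split the timeline into
    # equal sub-intervals and max-pool each one (max pooling is one
    # possible temporal feature pooling function -- an assumption here).
    chunks = np.array_split(features, num_intervals, axis=0)
    return np.stack([c.max(axis=0) for c in chunks])  # (num_intervals, D)

def pyramid_fft_descriptor(series, levels=3, keep=4):
    # series: (N, D) temporally aligned sub-event series. At pyramid
    # level l the series is split into 2**l segments; each segment is
    # transformed with the FFT along time and the magnitudes of its
    # `keep` lowest-frequency coefficients are retained.
    parts = []
    for level in range(levels):
        for seg in np.array_split(series, 2 ** level, axis=0):
            mags = np.abs(np.fft.fft(seg, axis=0))[:keep]
            if mags.shape[0] < keep:  # zero-pad segments shorter than `keep`
                pad = np.zeros((keep - mags.shape[0], series.shape[1]))
                mags = np.vstack([mags, pad])
            parts.append(mags.ravel())
    return np.concatenate(parts)  # final video descriptor

# Toy usage: 120 frames of 64-dimensional features.
frames = np.random.rand(120, 64)
descriptor = pyramid_fft_descriptor(pool_sub_events(frames))
print(descriptor.shape)  # (1792,) with these illustrative settings
```

The intuition is that low-frequency Fourier magnitudes summarize how the pooled sub-event series evolves within each segment, while the pyramid captures that evolution at multiple temporal scales.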
Original language: English
Title of host publication: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Place of publication: United States
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1619-1628
ISBN (Print): 9781538604571
Publication status: Published - 2017
Event: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States
Duration: 21 Jul 2017 - 26 Jul 2017

Conference

Conference: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country: United States
City: Honolulu
Period: 21/07/17 - 26/07/17


Cite this

Mohd Zaki, H. F. B., Shafait, F., & Mian, A. S. (2017). Modeling Sub-Event Dynamics in First-Person Action Recognition. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (pp. 1619-1628). United States: IEEE, Institute of Electrical and Electronics Engineers.