SkeletonNet: Mining Deep Part Features for 3-D Action Recognition

    Research output: Contribution to journal › Article

    38 Citations (Scopus)
    215 Downloads (Pure)

    Abstract

    This letter presents SkeletonNet, a deep learning framework for skeleton-based 3-D action recognition. Given a skeleton sequence, two factors are important for action recognition: the spatial structure of the skeleton joints in each frame and the temporal information across frames. We first extract body-part-based features from each frame of the skeleton sequence. Unlike the original coordinates of the skeleton joints, the proposed features are translation, rotation, and scale invariant. To learn robust temporal information, instead of treating the features of all frames as a time series, we transform the features into images and feed them to the proposed deep learning network, which consists of two parts: one extracts general features from the input images, while the other generates a discriminative and compact representation for action recognition. The proposed method is tested on the SBU Kinect Interaction dataset, the CMU dataset, and the large-scale NTU RGB+D dataset, and it achieves state-of-the-art performance.
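
    To make the two preprocessing ideas in the abstract concrete, here is a minimal sketch: invariant per-frame features computed from raw joint coordinates, stacked into a fixed-size feature image. This is an illustration only, not the paper's exact feature definition: it assumes pairwise joint distances normalized by a reference bone length as the invariant feature and nearest-neighbor resampling along the time axis, and the function names (frame_features, sequence_to_image) are hypothetical.

    import numpy as np

    def frame_features(joints, ref=(0, 1)):
        """Invariant features for one skeleton frame.

        joints: (J, 3) array of 3-D joint coordinates.
        Pairwise joint distances are unaffected by translation and
        rotation; dividing by a reference bone length (here the distance
        between joints ref[0] and ref[1], a hypothetical choice)
        additionally removes the dependence on subject scale.
        """
        diff = joints[:, None, :] - joints[None, :, :]   # (J, J, 3)
        dist = np.linalg.norm(diff, axis=-1)             # (J, J)
        scale = dist[ref] + 1e-8                         # reference bone length
        iu = np.triu_indices(len(joints), k=1)           # each joint pair once
        return dist[iu] / scale                          # (J*(J-1)/2,)

    def sequence_to_image(seq, height=224):
        """Stack per-frame features into a 2-D array with a fixed number
        of rows by nearest-neighbor resampling along the time axis."""
        feats = np.stack([frame_features(f) for f in seq])  # (T, D)
        idx = np.linspace(0, len(feats) - 1, height).round().astype(int)
        return feats[idx]                                   # (height, D)

    # Example: a random 40-frame sequence with 25 joints (the NTU RGB+D layout).
    seq = np.random.rand(40, 25, 3)
    img = sequence_to_image(seq)
    print(img.shape)  # (224, 300)

    The network side can be sketched in the same spirit: a generic convolutional feature extractor (the first part) followed by a small head that maps to a compact, discriminative representation (the second part). All layer sizes below are illustrative assumptions, not the architecture from the letter.

    import torch
    import torch.nn as nn

    class TwoPartNet(nn.Module):
        """Two-stage classifier: general feature extraction, then a
        compact discriminative representation. Sizes are illustrative."""
        def __init__(self, num_classes=60):
            super().__init__()
            # Part 1: general features from the input feature image.
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),
            )
            # Part 2: compact representation and classification.
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
                nn.Linear(128, num_classes),
            )

        def forward(self, x):  # x: (N, 1, frames, feature_dim)
            return self.head(self.features(x))

    net = TwoPartNet(num_classes=60)            # NTU RGB+D has 60 action classes
    logits = net(torch.rand(1, 1, 224, 300))    # one feature image
    print(logits.shape)                         # torch.Size([1, 60])
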
    Original language: English
    Pages (from-to): 731-735
    Journal: IEEE Signal Processing Letters
    DOI: 10.1109/LSP.2017.2690339
    Publication status: Published - 2017

    Cite this

    @article{40d02f557d0247cd8021bdc09de25f10,
    title = "SkeletonNet: Mining Deep Part Features for 3-D Action Recognition",
    author = "Qiuhong Ke and Senjian An and Mohammed Bennamoun and Ferdous Sohel and Farid Boussaid",
    year = "2017",
    doi = "10.1109/LSP.2017.2690339",
    language = "English",
    pages = "731--735",
    journal = "IEEE Signal Processing Letters",
    issn = "1070-9908",
    publisher = "IEEE, Institute of Electrical and Electronics Engineers",
    }

    SkeletonNet: Mining Deep Part Features for 3-D Action Recognition. / Ke, Qiuhong; An, Senjian; Bennamoun, Mohammed; Sohel, Ferdous; Boussaid, Farid.

    In: IEEE Signal Processing Letters, 2017, p. 731-735.

    Research output: Contribution to journal › Article

    TY - JOUR

    T1 - SkeletonNet: Mining Deep Part Features for 3-D Action Recognition

    AU - Ke, Qiuhong

    AU - An, Senjian

    AU - Bennamoun, Mohammed

    AU - Sohel, Ferdous

    AU - Boussaid, Farid

    PY - 2017

    Y1 - 2017

    U2 - 10.1109/LSP.2017.2690339

    DO - 10.1109/LSP.2017.2690339

    M3 - Article

    SP - 731

    EP - 735

    JO - IEEE Signal Processing Letters

    JF - IEEE Signal Processing Letters

    SN - 1070-9908

    ER -