Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification

    Research output: Chapter in Book/Conference paper > Conference paper

    3 Citations (Scopus)

    Abstract

    We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines, DBM_speech and DBM_face, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNN_speech and DBM-DNN_face. The DBM-DNNs are discriminatively fine-tuned using backpropagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with cosine distance (cosDist) scoring and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance with 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset. © Springer International Publishing Switzerland 2016.
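    The two classifiers compared in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 400-dimensional input and the two hidden layers of 800 units come from the abstract, while the number of target subjects, the sigmoid hidden units, the softmax output, and the random weights (standing in for DBM-pretrained, fine-tuned weights) are assumptions made only for illustration. All function names are hypothetical.

```python
import numpy as np

IVECTOR_DIM = 400   # i-vector dimensionality stated in the abstract
HIDDEN_UNITS = 800  # units per hidden layer in the best reported configuration
NUM_SUBJECTS = 10   # hypothetical number of target subjects (not from the source)


def cosine_identify(test_ivector, enrolled_ivectors):
    """Baseline scoring: pick the enrolled subject with the highest cosine
    similarity to the test i-vector (equivalently, the smallest cosine distance)."""
    enrolled = enrolled_ivectors / np.linalg.norm(enrolled_ivectors, axis=1, keepdims=True)
    probe = test_ivector / np.linalg.norm(test_ivector)
    scores = enrolled @ probe
    return int(np.argmax(scores)), scores


def init_network(rng):
    """Random weights for a 400-800-800-N network. In the paper these would be
    initialized from a pretrained DBM; random values are a placeholder here."""
    sizes = [IVECTOR_DIM, HIDDEN_UNITS, HIDDEN_UNITS, NUM_SUBJECTS]
    weights = [0.01 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases


def dnn_posteriors(ivector, weights, biases):
    """Forward pass: sigmoid hidden layers (assumed), softmax over subjects (assumed)."""
    h = ivector
    for w, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    logits = h @ weights[-1] + biases[-1]
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


rng = np.random.default_rng(0)
enrolled = rng.standard_normal((NUM_SUBJECTS, IVECTOR_DIM))   # synthetic enrollment i-vectors
probe = enrolled[3] + 0.1 * rng.standard_normal(IVECTOR_DIM)  # noisy copy of subject 3
best, _ = cosine_identify(probe, enrolled)
weights, biases = init_network(rng)
posteriors = dnn_posteriors(probe, weights, biases)
print(best, posteriors.shape)
```

    With untrained weights the posteriors are close to uniform; the sketch only shows the shapes involved and how the cosine baseline and the network produce a decision over the same set of subjects.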
    Original language: English
    Title of host publication: Image and Video Technology, PSIVT 2015
    Editors: Thomas Bräunl, Brendan McCane, Mariano Rivera, Xinguo Yu
    Place of Publication: USA
    Publisher: Springer-Verlag London Ltd.
    Pages: 631-641
    Number of pages: 11
    Volume: 9431
    ISBN (Electronic): 9783319294513
    ISBN (Print): 9783319294506
    Publication status: Published - 2016
    Event: 7th Pacific-Rim Symposium on Image and Video Technology: PSIVT 2015 - Auckland, New Zealand
    Duration: 23 Nov 2015 to 27 Nov 2015

    Publication series

    Name: Lecture Notes in Computer Science

    Conference

    Conference: 7th Pacific-Rim Symposium on Image and Video Technology
    Abbreviated title: PSIVT 2015
    Country: New Zealand
    City: Auckland
    Period: 23/11/15 to 27/11/15
