Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification

    Research output: Chapter in Book/Conference paper > Conference paper

    3 Citations (Scopus)

    Abstract

    We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines (DBMs), DBMspeech and DBMface, are trained in an unsupervised manner using unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNNspeech and DBM-DNNface. The DBM-DNNs are discriminatively fine-tuned using backpropagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance for 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset. © Springer International Publishing Switzerland 2016.
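    The abstract contrasts two ways of labelling a 400-dimensional i-vector: a cosine-distance (cosDist) baseline and a deep network with two 800-unit hidden layers fine-tuned by backpropagation. The NumPy sketch below illustrates both scoring paths under stated assumptions; it is not the authors' implementation. Assumptions: synthetic Gaussian vectors stand in for real i-vectors; the cosine baseline scores against the mean enrolment vector of each subject (the abstract does not say how subject models are formed); the hypothetical `MLP` class uses sigmoid hidden units, a softmax output and random initialisation in place of DBM pretraining, which is not shown; `cosine_identify`, `MLP`, the learning rate and step count are illustrative choices.

    ```python
    import numpy as np


    def cosine_identify(enroll, enroll_labels, test):
        """cosDist-style baseline: label each test vector with the enrolled
        subject whose mean vector has the highest cosine similarity."""
        labels = np.unique(enroll_labels)
        # Assumption: one model per subject = mean of its enrolment vectors.
        models = np.stack([enroll[enroll_labels == c].mean(axis=0) for c in labels])
        models /= np.linalg.norm(models, axis=1, keepdims=True)
        test_n = test / np.linalg.norm(test, axis=1, keepdims=True)
        scores = test_n @ models.T  # (n_test, n_subjects) cosine similarities
        return labels[np.argmax(scores, axis=1)]


    class MLP:
        """Sigmoid hidden layers + softmax output, fine-tuned by backpropagation.
        Weights are randomly initialised here; DBM pretraining is NOT shown."""

        def __init__(self, sizes, rng):
            self.W = [rng.normal(0.0, 1.0 / np.sqrt(a), (a, b))
                      for a, b in zip(sizes[:-1], sizes[1:])]
            self.b = [np.zeros(b) for b in sizes[1:]]

        def forward(self, x):
            acts = [x]
            for W, b in zip(self.W[:-1], self.b[:-1]):
                acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W + b))))
            logits = acts[-1] @ self.W[-1] + self.b[-1]
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(logits)
            return acts, p / p.sum(axis=1, keepdims=True)

        def step(self, x, y, lr=0.01):
            """One full-batch gradient step on mean cross-entropy; returns the loss."""
            acts, p = self.forward(x)
            n = x.shape[0]
            loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
            delta = p.copy()
            delta[np.arange(n), y] -= 1.0
            delta /= n  # gradient of mean cross-entropy w.r.t. the logits
            for i in reversed(range(len(self.W))):
                grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
                if i > 0:  # propagate through sigmoid layer i using the old weights
                    delta = (delta @ self.W[i].T) * acts[i] * (1.0 - acts[i])
                self.W[i] -= lr * grad_W
                self.b[i] -= lr * grad_b
            return loss


    # Toy usage with synthetic data standing in for 400-dim i-vectors.
    rng = np.random.default_rng(0)
    n_subjects, per_subject, dim = 5, 20, 400
    centers = rng.normal(size=(n_subjects, dim))
    y = np.repeat(np.arange(n_subjects), per_subject)
    X = centers[y] + 0.3 * rng.normal(size=(y.size, dim))

    # Even-indexed samples enrol, odd-indexed samples are tested.
    enroll_mask = np.arange(y.size) % 2 == 0
    pred = cosine_identify(X[enroll_mask], y[enroll_mask], X[~enroll_mask])
    print("cosDist accuracy:", (pred == y[~enroll_mask]).mean())

    net = MLP([dim, 800, 800, n_subjects], rng)  # 400-800-800-N topology
    losses = [net.step(X, y) for _ in range(50)]
    print("loss:", round(losses[0], 3), "->", round(losses[-1], 3))
    ```

    In the setup the abstract describes, the random weights in `MLP.__init__` would instead be copied from a DBM trained on background-subject data before fine-tuning; only the discriminative stage is sketched here.
    
    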
    Original language: English
    Title of host publication: Image and Video Technology, PSIVT 2015
    Editors: Thomas Bräunl, Brendan McCane, Mariano Rivera, Xinguo Yu
    Place of Publication: USA
    Publisher: Springer-Verlag London Ltd.
    Pages: 631-641
    Number of pages: 11
    Volume: 9431
    ISBN (Electronic): 9783319294513
    ISBN (Print): 9783319294506
    DOIs: https://doi.org/10.1007/978-3-319-29451-3_50
    Publication status: Published - 2016
    Event: 7th Pacific-Rim Symposium on Image and Video Technology: PSIVT 2015 - Auckland, New Zealand
    Duration: 23 Nov 2015 to 27 Nov 2015

    Publication series

    Name: Lecture Notes in Computer Science

    Conference

    Conference: 7th Pacific-Rim Symposium on Image and Video Technology
    Abbreviated title: PSIVT 2015
    Country: New Zealand
    City: Auckland
    Period: 23/11/15 to 27/11/15

    Fingerprint

    Backpropagation
    Classifiers
    Experiments

    Cite this

    Alam, M. R., Bennamoun, M., Togneri, R., & Sohel, F. (2016). Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification. In T. Bräunl, B. McCane, M. Rivera, & X. Yu (Eds.), Image and Video Technology, PSIVT 2015 (Vol. 9431, pp. 631-641). (Lecture Notes in Computer Science). USA: Springer-Verlag London Ltd. https://doi.org/10.1007/978-3-319-29451-3_50
    Alam, M.R.; Bennamoun, Mohammed; Togneri, Roberto; Sohel, Ferdous. / Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification. Image and Video Technology, PSIVT 2015. editor / Thomas Bräunl; Brendan McCane; Mariano Rivera; Xinguo Yu. Vol. 9431 USA : Springer-Verlag London Ltd., 2016. pp. 631-641 (Lecture Notes in Computer Science).
    @inproceedings{098b2a897a5a4d7ba6cbdd631fb77103,
    title = "Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification",
    abstract = "We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines (DBMs), DBMspeech and DBMface, are trained in an unsupervised manner using unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNNspeech and DBM-DNNface. The DBM-DNNs are discriminatively fine-tuned using backpropagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance for 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset. {\circledC} Springer International Publishing Switzerland 2016.",
    author = "M.R. Alam and Mohammed Bennamoun and Roberto Togneri and Ferdous Sohel",
    year = "2016",
    doi = "10.1007/978-3-319-29451-3_50",
    language = "English",
    isbn = "9783319294506",
    volume = "9431",
    series = "Lecture Notes in Computer Science",
    publisher = "Springer-Verlag London Ltd.",
    pages = "631--641",
    editor = "Br{\"a}unl, Thomas and McCane, Brendan and Rivera, Mariano and Yu, Xinguo",
    booktitle = "Image and Video Technology, PSIVT 2015",
    address = "Germany",

    }

    Alam, MR, Bennamoun, M, Togneri, R & Sohel, F 2016, Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification. in T Bräunl, B McCane, M Rivera & X Yu (eds), Image and Video Technology, PSIVT 2015. vol. 9431, Lecture Notes in Computer Science, Springer-Verlag London Ltd., USA, pp. 631-641, 7th Pacific-Rim Symposium on Image and Video Technology, Auckland, New Zealand, 23/11/15. https://doi.org/10.1007/978-3-319-29451-3_50

    Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification. / Alam, M.R.; Bennamoun, Mohammed; Togneri, Roberto; Sohel, Ferdous.

    Image and Video Technology, PSIVT 2015. ed. / Thomas Bräunl; Brendan McCane; Mariano Rivera; Xinguo Yu. Vol. 9431 USA : Springer-Verlag London Ltd., 2016. p. 631-641 (Lecture Notes in Computer Science).

    TY - GEN

    T1 - Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification

    AU - Alam, M.R.

    AU - Bennamoun, Mohammed

    AU - Togneri, Roberto

    AU - Sohel, Ferdous

    PY - 2016

    Y1 - 2016

    N2 - We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines (DBMs), DBMspeech and DBMface, are trained in an unsupervised manner using unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNNspeech and DBM-DNNface. The DBM-DNNs are discriminatively fine-tuned using backpropagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance for 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset. © Springer International Publishing Switzerland 2016.

    AB - We propose an approach using DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines (DBMs), DBMspeech and DBMface, are trained in an unsupervised manner using unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNNspeech and DBM-DNNface. The DBM-DNNs are discriminatively fine-tuned using backpropagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance for 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset. © Springer International Publishing Switzerland 2016.

    U2 - 10.1007/978-3-319-29451-3_50

    DO - 10.1007/978-3-319-29451-3_50

    M3 - Conference paper

    SN - 9783319294506

    VL - 9431

    T3 - Lecture Notes in Computer Science

    SP - 631

    EP - 641

    BT - Image and Video Technology, PSIVT 2015

    A2 - Bräunl, Thomas

    A2 - McCane, Brendan

    A2 - Rivera, Mariano

    A2 - Yu, Xinguo

    PB - Springer-Verlag London Ltd.

    CY - USA

    ER -

    Alam MR, Bennamoun M, Togneri R, Sohel F. Deep Boltzmann Machines for i-Vector Based Audio-Visual Person Identification. In Bräunl T, McCane B, Rivera M, Yu X, editors, Image and Video Technology, PSIVT 2015. Vol. 9431. USA: Springer-Verlag London Ltd. 2016. p. 631-641. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-319-29451-3_50