A joint Deep Boltzmann Machine (jDBM) Model for Person Identification using Mobile Phone Data

    Research output: Contribution to journal › Article

    7 Citations (Scopus)

    Abstract

    We propose an audio-visual person identification approach based on a joint deep Boltzmann machine (jDBM) model. The proposed jDBM model is trained in three steps: a) learning the unimodal DBM models corresponding to the speech and facial image modalities, b) learning the shared layer parameters using a joint Restricted Boltzmann Machine (jRBM) model, and c) fine-tuning the jDBM model after initializing it with the parameters of the unimodal DBMs and the shared layer. The activation probabilities of the shared-layer units are used as the joint features, and a logistic regression classifier is used for combined speech and facial image recognition. We show that learning the shared layer parameters with a jRBM achieves higher accuracy than greedy layer-wise initialization. The performance of the proposed model is also compared with state-of-the-art support vector machine (SVM), deep belief network (DBN), and deep auto-encoder (DAE) models. In addition, our experimental results show that the joint representations obtained from the proposed jDBM model are robust to noise and missing information. Experiments were carried out on the challenging MOBIO database, which includes audio-visual data captured using mobile phones.
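    The three-step pipeline in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: full DBMs are replaced by single-layer Bernoulli RBMs trained with one-step contrastive divergence (CD-1), the data is synthetic, the final jDBM fine-tuning step is omitted, and every name and dimension is hypothetical. It only mirrors the flow of steps a) to c): unimodal pre-training per modality, a joint RBM whose hidden layer plays the role of the shared layer, and logistic regression on the shared layer's activation probabilities. It assumes NumPy and scikit-learn are available.

    # Illustrative sketch only -- not the paper's jDBM. Single-layer RBMs
    # stand in for full DBMs; data is synthetic; names are hypothetical.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBM:
        """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
        def __init__(self, n_vis, n_hid, lr=0.05):
            self.W = rng.normal(0, 0.01, (n_vis, n_hid))
            self.b_vis = np.zeros(n_vis)
            self.b_hid = np.zeros(n_hid)
            self.lr = lr

        def hidden_probs(self, v):
            # Activation probabilities of the hidden units given visible data.
            return sigmoid(v @ self.W + self.b_hid)

        def fit(self, data, epochs=20):
            for _ in range(epochs):
                # Positive phase: hidden activations driven by the data.
                h_prob = self.hidden_probs(data)
                h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
                # Negative phase: one Gibbs step back through the visible layer.
                v_recon = sigmoid(h_sample @ self.W.T + self.b_vis)
                h_recon = self.hidden_probs(v_recon)
                # CD-1 parameter updates.
                self.W += self.lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
                self.b_vis += self.lr * (data - v_recon).mean(axis=0)
                self.b_hid += self.lr * (h_prob - h_recon).mean(axis=0)

    # Synthetic stand-ins for speech and face features: 10 subjects, 50 samples
    # each, binarized around per-subject prototypes (purely for illustration).
    n_subjects, n_per = 10, 50
    labels = np.repeat(np.arange(n_subjects), n_per)
    proto_speech = rng.random((n_subjects, 64))
    proto_face = rng.random((n_subjects, 100))
    speech = (rng.random((len(labels), 64)) < proto_speech[labels]).astype(float)
    face = (rng.random((len(labels), 100)) < proto_face[labels]).astype(float)

    # Step a): unimodal models, one per modality.
    rbm_speech = RBM(64, 32)
    rbm_speech.fit(speech)
    rbm_face = RBM(100, 32)
    rbm_face.fit(face)

    # Step b): a joint RBM over the concatenated unimodal representations;
    # its hidden layer acts as the shared layer.
    joint_input = np.hstack([rbm_speech.hidden_probs(speech),
                             rbm_face.hidden_probs(face)])
    jrbm = RBM(64, 48)
    jrbm.fit(joint_input)

    # Step c) (simplified): the shared layer's activation probabilities are the
    # joint features; a logistic regression classifier identifies the person.
    joint_features = jrbm.hidden_probs(joint_input)
    clf = LogisticRegression(max_iter=1000).fit(joint_features, labels)
    print("training accuracy:", clf.score(joint_features, labels))

    In the paper this joint structure is additionally fine-tuned end to end as a single jDBM after initialization, which the sketch leaves out for brevity.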

    Original language: English
    Pages (from-to): 317-326
    Journal: IEEE Transactions on Multimedia
    Volume: 19
    Issue number: 2
    DOI: 10.1109/TMM.2016.2615524
    Publication status: Published - 5 Oct 2016

    Cite this

    @article{c5e1fe79c466401c8ef274ae92d4576e,
    title = "A joint Deep Boltzmann Machine (jDBM) Model for Person Identification using Mobile Phone Data",
    abstract = "We propose an audio-visual person identification approach based on a joint deep Boltzmann machine (jDBM) model. The proposed jDBM model is trained in three steps: a) learning the unimodal DBM models corresponding to the speech and facial image modalities, b) learning the shared layer parameters using a joint Restricted Boltzmann Machine (jRBM) model, and c) fine-tuning the jDBM model after initializing it with the parameters of the unimodal DBMs and the shared layer. The activation probabilities of the shared-layer units are used as the joint features, and a logistic regression classifier is used for combined speech and facial image recognition. We show that learning the shared layer parameters with a jRBM achieves higher accuracy than greedy layer-wise initialization. The performance of the proposed model is also compared with state-of-the-art support vector machine (SVM), deep belief network (DBN), and deep auto-encoder (DAE) models. In addition, our experimental results show that the joint representations obtained from the proposed jDBM model are robust to noise and missing information. Experiments were carried out on the challenging MOBIO database, which includes audio-visual data captured using mobile phones.",
    keywords = "Audio-visual biometrics, Deep Boltzmann Machines, Joint features",
    author = "Mohammad Alam and Mohammed Bennamoun and Roberto Togneri and Ferdous Sohel",
    year = "2016",
    month = "10",
    day = "5",
    doi = "10.1109/TMM.2016.2615524",
    language = "English",
    volume = "19",
    pages = "317--326",
    journal = "IEEE Transactions on Multimedia",
    issn = "1520-9210",
    publisher = "IEEE, Institute of Electrical and Electronics Engineers",
    number = "2",
    }

    TY - JOUR

    T1 - A joint Deep Boltzmann Machine (jDBM) Model for Person Identification using Mobile Phone Data

    AU - Alam, Mohammad

    AU - Bennamoun, Mohammed

    AU - Togneri, Roberto

    AU - Sohel, Ferdous

    PY - 2016/10/5

    Y1 - 2016/10/5

    N2 - We propose an audio-visual person identification approach based on a joint deep Boltzmann machine (jDBM) model. The proposed jDBM model is trained in three steps: a) learning the unimodal DBM models corresponding to the speech and facial image modalities, b) learning the shared layer parameters using a joint Restricted Boltzmann Machine (jRBM) model, and c) fine-tuning the jDBM model after initializing it with the parameters of the unimodal DBMs and the shared layer. The activation probabilities of the shared-layer units are used as the joint features, and a logistic regression classifier is used for combined speech and facial image recognition. We show that learning the shared layer parameters with a jRBM achieves higher accuracy than greedy layer-wise initialization. The performance of the proposed model is also compared with state-of-the-art support vector machine (SVM), deep belief network (DBN), and deep auto-encoder (DAE) models. In addition, our experimental results show that the joint representations obtained from the proposed jDBM model are robust to noise and missing information. Experiments were carried out on the challenging MOBIO database, which includes audio-visual data captured using mobile phones.

    AB - We propose an audio-visual person identification approach based on a joint deep Boltzmann machine (jDBM) model. The proposed jDBM model is trained in three steps: a) learning the unimodal DBM models corresponding to the speech and facial image modalities, b) learning the shared layer parameters using a joint Restricted Boltzmann Machine (jRBM) model, and c) fine-tuning the jDBM model after initializing it with the parameters of the unimodal DBMs and the shared layer. The activation probabilities of the shared-layer units are used as the joint features, and a logistic regression classifier is used for combined speech and facial image recognition. We show that learning the shared layer parameters with a jRBM achieves higher accuracy than greedy layer-wise initialization. The performance of the proposed model is also compared with state-of-the-art support vector machine (SVM), deep belief network (DBN), and deep auto-encoder (DAE) models. In addition, our experimental results show that the joint representations obtained from the proposed jDBM model are robust to noise and missing information. Experiments were carried out on the challenging MOBIO database, which includes audio-visual data captured using mobile phones.

    KW - Audio-visual biometrics

    KW - Deep Boltzmann Machines

    KW - Joint features

    UR - http://www.scopus.com/inward/record.url?scp=84991662094&partnerID=8YFLogxK

    U2 - 10.1109/TMM.2016.2615524

    DO - 10.1109/TMM.2016.2615524

    M3 - Article

    VL - 19

    SP - 317

    EP - 326

    JO - IEEE Transactions on Multimedia

    JF - IEEE Transactions on Multimedia

    SN - 1520-9210

    IS - 2

    ER -