Speech-video synchronization using lips movements and speech envelope correlation

Amar El-Sallam Abd, Ajmal Mian, M. Kamel (Editor), A. Campilho (Editor)

    Research output: Contribution to journal › Article › peer-review

    1 Citation (Scopus)


    In this paper, we propose a novel correlation-based method for speech-video synchronization (sync) and relationship classification. The method uses the envelope of the speech signal and data extracted from the lip movements. Firstly, a nonlinear time-varying model is considered to represent the speech signal as a sum of amplitude- and frequency-modulated (AM-FM) signals. Each AM-FM signal in this sum is considered to model a single speech formant frequency. Using Taylor series expansion, the model is formulated in a way which characterizes the relation between the speech amplitude and the instantaneous frequency of each AM-FM signal w.r.t. the lip movements. Secondly, the envelope of the speech signal is estimated and then correlated with signals generated from the lip movements. From the resultant correlation, the relation between the two signals is classified and the delay between them is estimated. The proposed method is applied to real cases, and the results show that it is able to (i) classify whether the speech and video signals belong to the same source, and (ii) estimate delays between audio and video signals as small as 0.1 seconds when the speech signals are noisy and 0.04 seconds when the additive noise is less significant.
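    The correlate-and-estimate-delay step described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it assumes a generic Hilbert-transform envelope and a plain cross-correlation peak search, whereas the paper derives a model-based envelope estimator and a classification rule from the AM-FM formant model. The function name `estimate_delay`, the sample rate, and the synthetic signals are all illustrative assumptions.

    ```python
    import numpy as np
    from scipy.signal import hilbert

    def estimate_delay(speech, lip_signal, fs):
        """Estimate the delay (seconds) of the lip-movement signal relative
        to the speech signal by correlating the speech envelope with the
        lip signal. Sketch only: uses a Hilbert-transform envelope, not
        the paper's model-based estimator."""
        # Envelope of the speech signal as the magnitude of the analytic signal.
        envelope = np.abs(hilbert(speech))
        # Remove means so offsets do not dominate the correlation.
        e = envelope - envelope.mean()
        l = lip_signal - lip_signal.mean()
        # Full cross-correlation; the peak location gives the lag in samples.
        corr = np.correlate(l, e, mode="full")
        lag = np.argmax(np.abs(corr)) - (len(e) - 1)
        return lag / fs

    # Synthetic check: the lip signal is a delayed copy of the envelope.
    fs = 100                                         # assumed sample rate (Hz)
    t = np.arange(0, 5, 1 / fs)
    env = 1.0 + 0.5 * np.sin(2 * np.pi * 1.3 * t)    # slowly varying envelope
    speech = env * np.sin(2 * np.pi * 20.0 * t)      # AM signal with a 20 Hz carrier
    lip = np.roll(env, 10)                           # lip signal delayed by 0.1 s
    delay = estimate_delay(speech, lip, fs)
    ```

    With these synthetic signals the recovered delay is close to the true 0.1 s shift; on real data the envelope would be noisier and, as the abstract notes, the achievable accuracy degrades with additive noise.
    
    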
    Original language: English
    Pages (from-to): 397-407
    Journal: Lecture Notes in Computer Science
    Publication status: Published - 2009


