Abstract
This paper presents a novel lip-synchronization technique that investigates the correlation between speech and lip movements. First, the speech signal is represented by a nonlinear time-varying model comprising a sum of AM–FM signals, each of which models a single formant frequency. The model is realized using a Taylor series expansion in a way that relates the lip shape (width and height) to the speech amplitude and instantaneous frequency. From the lip width and height, a semi-speech signal is generated and correlated with the original speech signal over a span of delays, and the delay between the speech and the video is estimated. Using real and noisy data from the VidTimit and in-house databases, the proposed method estimated small delays of 0.01–0.1 s in both the noise-free and noisy cases, with a maximum absolute error of 0.0022 s.
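The final step described in the abstract (correlating the lip-derived "semi-speech" signal with the audio over a span of candidate delays and picking the best-matching lag) can be sketched as follows. This is a minimal illustration of normalized cross-correlation delay estimation on synthetic data, not the authors' implementation; the AM–FM formant modelling that would produce the semi-speech signal from lip width and height is omitted, and all names and parameters here are illustrative.

```python
import numpy as np

def estimate_delay(speech, semi_speech, fs, max_lag_s=0.1):
    """Estimate the audio-video delay (in seconds) as the lag that
    maximizes the normalized cross-correlation between the speech
    signal and the lip-derived semi-speech signal."""
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = np.empty(len(lags))
    for i, lag in enumerate(lags):
        # Overlap the two signals at the candidate lag.
        if lag >= 0:
            a, b = speech[:len(speech) - lag], semi_speech[lag:]
        else:
            a, b = speech[-lag:], semi_speech[:lag]
        n = min(len(a), len(b))
        a = a[:n] - a[:n].mean()
        b = b[:n] - b[:n].mean()
        scores[i] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return lags[np.argmax(scores)] / fs

# Synthetic check: the semi-speech track is an exact copy of the
# speech track delayed by 30 samples (0.03 s at fs = 1000 Hz).
fs = 1000
rng = np.random.default_rng(0)
clean = np.convolve(rng.standard_normal(2100), np.ones(20) / 20, mode="same")
speech = clean[30:]   # audio track
semi = clean[:-30]    # lip-derived track, delayed by 0.03 s
print(estimate_delay(speech, semi, fs))  # prints 0.03
```

In practice the two tracks are not identical copies, so the correlation peak is broader and noisier than in this toy example, which is why the paper reports a maximum absolute error rather than exact recovery.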
| Original language | English |
|---|---|
| Pages (from-to) | 780–786 |
| Journal | Pattern Recognition Letters |
| Volume | 32 |
| Issue number | 6 |
| Early online date | 9 Jan 2011 |
| DOIs | |
| Publication status | Published - Apr 2011 |
Projects
- Integration of Spatiotemporal Video Data for Realtime Smart Proactive Surveillance
  Mian, A. (Chief Investigator)
  1/01/08 → 31/12/10
  Project: Research (Finished)