Sample-Based Talking-Head Synthesis
Video Examples:
All videos were produced automatically, either from synthesized speech
(TTS) or from labeled recorded audio. In most cases, the eye animations and
head movements come from a recorded sequence, and only the mouth has been
synthesized.
These examples illustrate selected aspects of the thesis and are
organized by chapter. Extra examples are given at the end.
Chapter 1
Chapter 2
- Illustration of facial feature location (MPEG-1, video only, 320 KB)
  - top-left: STEP1: low-level analysis (band-pass filter, morphological filter)
  - bottom-left: STEP2: shape analysis (adaptive thresholding, connected components, n-gram search)
  - top-right: STEP3: color analysis (whole face, combined with shape analysis)
  - bottom-right: STEP4: color analysis (lips, nostrils)
- Illustration of pose estimation (MPEG-1, video only, 352 KB)
  - the 6 features used for pose estimation are overlaid (4 eye corners and 2 nostrils)
  - the result of the pose estimation is shown with an animated 3D model (top-right)
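As a rough illustration of the shape analysis step (STEP2) in the facial feature location video, the sketch below binarizes a tiny grayscale image and groups the dark pixels into connected components. It is only a sketch under stated assumptions: the names `dark_mask` and `connected_components` are hypothetical, the toy image is invented, and a plain global-mean threshold stands in for the adaptive thresholding used in the thesis, whose exact form is not given here.

```python
from collections import deque

def dark_mask(image):
    """Mark pixels darker than the image mean.

    Assumption: a global mean is a simple stand-in for adaptive thresholding.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p < mean for p in row] for row in image]

def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected True regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited dark pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy image (invented): bright skin with two dark blobs standing in for the eyes.
image = [
    [200, 200, 200, 200, 200, 200, 200],
    [200,  40,  50, 200,  45,  40, 200],
    [200, 200, 200, 200, 200, 200, 200],
]
boxes = connected_components(dark_mask(image))
print(boxes)  # [(1, 1, 1, 2), (1, 4, 1, 5)]
```

Each bounding box is a candidate facial feature. A later stage, such as the n-gram search named in STEP2, would then have to decide which combination of candidates forms a plausible face.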
Chapter 3
- Effect of the visual cost in unit selection (MPEG-1, video only, 612 KB)
  (two animations side by side, generated from the same target text)
  - left, without visual cost: without a cost for visual smoothness, the graph search returns jerky animations
  - right, with visual cost: using a visual cost in the search results in a smoother animation
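The difference between the two animations can be sketched as a shortest-path (Viterbi) search over candidate units, in which a visual cost penalizes jumps between consecutive units. This is a minimal sketch under stated assumptions, not the thesis implementation: each candidate is reduced to a single hypothetical mouth-opening value, the target cost is the distance to the desired opening, the visual cost is the jump between neighbours, and `select_units` and `visual_weight` are illustrative names.

```python
def select_units(targets, candidates, visual_weight):
    """Pick one candidate per position, minimizing target cost plus weighted visual cost.

    targets[i]    : desired mouth opening at position i (hypothetical feature)
    candidates[i] : mouth openings of the database units available at position i
    """
    # best[i][j] = (cost of the cheapest path ending at candidates[i][j], back-pointer)
    best = [[(abs(c - targets[0]), None) for c in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for c in candidates[i]:
            # Cheapest predecessor, including the visual (transition) cost.
            path_cost, back = min(
                (best[i - 1][k][0] + visual_weight * abs(p - c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((path_cost + abs(c - targets[i]), back))
        best.append(row)

    # Backtrack from the cheapest final state.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path

def jerkiness(path):
    """Sum of jumps between consecutive selected units."""
    return sum(abs(a - b) for a, b in zip(path, path[1:]))

# Invented toy data: integer mouth openings for three target positions.
targets = [2, 8, 3]
candidates = [[2, 5], [8, 5], [3, 5]]

jerky = select_units(targets, candidates, visual_weight=0)
smooth = select_units(targets, candidates, visual_weight=2)
print(jerky, jerkiness(jerky))    # [2, 8, 3] 11
print(smooth, jerkiness(smooth))  # [5, 5, 5] 0
```

With the weight set to zero, the search matches every target exactly but jumps between very different mouth shapes. With a positive weight, it accepts a slightly worse match at each position in exchange for a smooth sequence, which mirrors the left and right animations.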
Chapter 4
Songs
(lyrics have been phonetically annotated manually)
Example of prompts with recorded audio
(prompts have been phonetically annotated automatically using an aligner)
Example of prompts with synthesized audio (TTS)