Sample-Based Talking-Head Synthesis

Video Examples:

All videos have been produced automatically either from synthesized speech
(TTS) or from labeled recorded audio. In most cases, eye animations and
head movements are from a recorded sequence and only the mouth has been
synthesized.

These examples are meant to illustrate some aspects of the thesis and are
organized following the chapters. Extra examples are given at the end.

Chapter 1

Example of an e-commerce session (mpeg1 audio-video 15MB)

Chapter 2

Illustration of facial feature location (mpeg1 video-only 320KB)
- top-left: STEP1 low-level analysis (band-pass filter, morphological filter)
- bottom-left: STEP2 shape analysis (adaptive thresholding, connected components, n-gram search)
- top-right: STEP3 color analysis (whole face, combined with shape analysis)
- bottom-right: STEP4 color analysis (lips, nostrils)
Illustration of pose estimation (mpeg1 video-only 352KB)
- the 6 features used for pose estimation are overlaid (4 eye corners and 2 nostrils)
- result of pose estimation shown with animated 3D model (top-right)

Chapter 3

Effect of the visual cost in unit selection (mpeg1 video-only 612KB)
(two animations side by side from the same target text)
- left, without visual cost (without a cost for visual smoothness, the graph search returns jerky animations)
- right, with visual cost (the use of a visual cost in the search results in a smoother animation)
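
The effect shown in this comparison can be sketched in a few lines. The block below is only an illustration under stated assumptions, not the search used in the thesis: the function name `select_units`, the one-dimensional mouth feature, and the Euclidean distance used as the visual cost are all hypothetical. It runs a Viterbi-style search over candidate units per time slot, where each candidate has a target cost and a visual feature vector, and transitions are penalized by `visual_weight` times the feature distance.

```python
import math

def select_units(candidates, visual_weight):
    """Pick one candidate per time slot with a Viterbi-style search.

    candidates[t] is a list of (target_cost, features) tuples for slot t.
    The transition cost between consecutive candidates is assumed to be
    visual_weight * Euclidean distance between their feature vectors.
    Returns (total_cost, list of chosen candidate indices).
    """
    # best[j]: cheapest cost of any path ending at candidate j of the current slot
    best = [target_cost for target_cost, _ in candidates[0]]
    back = []  # back[t-1][j]: best predecessor of candidate j at slot t
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for target_cost, feat in candidates[t]:
            costs = [
                best[i] + visual_weight * math.dist(prev_feat, feat)
                for i, (_, prev_feat) in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_cost)
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last slot
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path

# Toy data (made up): feature 0.0 = closed mouth, 1.0 = open mouth
toy = [
    [(0.0, (0.0,)), (0.2, (1.0,))],
    [(0.2, (0.0,)), (0.0, (1.0,))],
    [(0.0, (0.0,)), (0.2, (1.0,))],
]
jerky_cost, jerky_path = select_units(toy, visual_weight=0.0)
smooth_cost, smooth_path = select_units(toy, visual_weight=1.0)
```

On this toy data, the search with `visual_weight=0.0` picks the cheapest target in each slot and alternates between closed and open mouth shapes, while `visual_weight=1.0` accepts a slightly worse target match in the middle slot to keep the mouth shape constant.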

Chapter 4

Expression insertion:
- little smile (mpeg1 audio-video 552KB),
- wide smile (mpeg1 audio-video 581KB),
- facial shrug (mpeg1 audio-video 511KB),
- lip licking (mpeg1 audio-video 684KB)
Visual Prosody:
- with head movements uncorrelated to the speech (mpeg1 audio-video 463KB),
- with head movements synchronized with speech (mpeg1 audio-video 463KB),
- recorded original (mpeg1 audio-video 370KB)
Automatic Newscaster Example (mpeg1 audio-video 5MB)

Songs

(Lyrics have been phonetically annotated manually)

"Au clair de la lune..." (mpeg1 audio-video 2.2MB)
"...par la barbichette..." (mpeg1 audio-video 1.8MB)

Examples of prompts with recorded audio

(Prompts have been phonetically annotated automatically using an aligner)

"Good morning Bob, you have twelve new email messages." (mpeg1 audio-video 2.8MB)
"Texas Instruments, a unit of..." (mpeg1 audio-video 1.4MB)
"Japan domestic sales of..." (mpeg1 audio-video 2.3MB)

Examples of prompts with synthesized audio (TTS)

"United Nations..." (mpeg1 audio-video 1.5MB)
"I would like to say thank you..." (mpeg1 audio-video 1.5MB)