Talking Lips (Les lèvres parlantes) / IMIMA
Image(s): 640*480
JPEG image (18 KB)
- Institut de la Communication Parlée - CNRS -
Université Stendhal
Project: Talking-face synthesis (Synthèse de visages parlants)
URL: http://ophale.icp.grenet.fr/synth.html
Video(s) and extracted images: 320*240
Film 1: QuickTime video (2.2 MB) | JPEG image (13 KB)
Film 2: QuickTime video (3.0 MB) | JPEG images (11 KB, 8 KB)
Analysis-synthesis method for lips and faces
Keywords:
- Automatic lip-sync
- Analysis-synthesis of talking faces
- Bimodal speech
Technical Information
Image analysis, image synthesis, audio synchronization,
natural face / synthetic face
More Information...
Bibliography:
See the Imagina 1994 proceedings (Actes Imagina 1994), pp. 144-163: "Perception, synthèse et
analyse des lèvres parlantes"
Imagina Proceedings: "Perception, Analysis and Synthesis of Talking Lips"
Authors: Christian Benoit, Ali Adjoudani, Omar Angola, Thierry Guiard-Marigny,
and Bertrand Le Goff, pp. 144-163, 1994
Abstract:
A virtual actor can only aspire to "anthropomorphic quality" if his/her lip movements in
particular, and facial movements in general, are coherent with the acoustic message supposedly
being produced. For listeners with normal hearing, the auditory modality dominates speech
perception, but the visual modality enhances understanding. Although the visual information
supplied by movements of the lips, chin, teeth, cheeks, etc. is insufficient by itself to make
speech intelligible, seeing the speaker's face allows the listener to "restore", through natural
compensation, a large part of the spoken information that is lost under degraded acoustic
transmission conditions. We quantified the gain in intelligibility provided by visual information
when speech is degraded by noise. Our test conditions included natural or synthetic speech
synchronized with a natural face or with different parts of a synthetic face (3D lip models from
the Institut de la Communication Parlée, and the Parke face). The most characteristic anatomical
and geometric parameters involved in the production and perception of visible speech were
identified through multidimensional analysis of a very large corpus of spoken French. Software
for automatically extracting these parameters was implemented on an image capture and processing
workstation.
In parallel, a high-resolution, three-dimensional parametric model of the lips was developed and
implemented on a computer graphics workstation. A control interface was also developed so that
automatic measurements made on the speaker's face can drive the articulatory commands of the
Parke facial model (1974), as modified by Cohen (1993). These two models were evaluated in terms
of the intelligibility they add to naturally degraded speech. Our synthetic lip model, which has
no teeth, tongue or jaw and whose movements are controlled at a rate of just a few bits per
second, conveys more than one third of the information provided by viewing the natural face of
the reference speaker (i.e. a model). Finally, a complete analysis/synthesis chain for the talking
face was developed at the Institut de la Communication Parlée, allowing real-time visual speech
cloning between two distant machines. Such a system is particularly suitable for real-time visual
animation of characters, providing high-quality synchronization of lip movements. A real-time
demonstration resulting from a collaboration between Medialab (Paris) and the Institut de la
Communication Parlée (Grenoble) will also present the ICP analysis system, which allows remote
control of a synthetic face developed by Medialab.
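The abstract does not say how the "more than one third" figure was computed. One plausible reading, offered here only as an illustrative sketch, is the share of the natural face's intelligibility gain over audio alone that the synthetic lips recover at the same noise level. The function name and all scores below are hypothetical and are not taken from the paper.

```python
def relative_visual_gain(audio_only, audio_synthetic, audio_natural):
    """Share of the natural face's intelligibility gain recovered by a synthetic display.

    Each argument is an intelligibility score in [0, 1] measured at the same
    noise level. This ratio is an assumed metric, not the paper's stated one.
    """
    natural_gain = audio_natural - audio_only
    if natural_gain <= 0:
        raise ValueError("the natural face must improve on audio alone")
    return (audio_synthetic - audio_only) / natural_gain


# Hypothetical scores at one signal-to-noise ratio (not data from the paper).
share = relative_visual_gain(audio_only=0.20, audio_synthetic=0.40, audio_natural=0.70)
print(f"{share:.2f}")
```

With these made-up scores the synthetic lips recover 0.2 of the 0.5 gain offered by the natural face, that is, 40% of it.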
Some external links:
- Publications on speech analysis/synthesis, acoustics and
- http://ophale.icp.grenet.fr/publis.html
Additional comments:
This information comes from a fax from C. Benoît.