Work Area: Speech and Natural Language
Keywords: speech inverse acoustics, speech production, speech audiovisual integration, speech robotics
Start Date: 1 September 92 / Duration: 36 months / Status: running
Abstract: SPEECH MAPS aims to answer, both theoretically and technologically, a basic question in speech inverse acoustics: Can an articulatory robot learn to produce articulatory gestures from sounds? The robotics approach makes it possible to map action and perception in speech onto each other. A dedicated learning architecture, the Articulotron, which incorporates an audiovisual perceptron as a front end, is being built in order to make a decisive advance towards solving the speech inverse problem. This could lead to major spinoffs for speech synthesis and recognition applications.
Inverse mapping from speech sounds to articulatory gestures is a difficult problem, primarily because the relationship of articulation to acoustics is nonlinear and many-to-one: different articulatory configurations can produce the same sound. It has therefore so far remained an ill-posed problem in the mathematical sense. Thanks to recent progress in robotics, it is now possible to answer, both theoretically and technologically, a basic question in speech inverse acoustics: Can an articulatory robot learn to produce articulatory gestures from sounds?
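To make the many-to-one property concrete, here is a toy sketch. The function `forward` and its parameters `jaw`, `tongue` and `lips` are hypothetical stand-ins chosen for illustration, not the project's articulatory model. Two visibly different configurations yield identical outputs, so the outputs alone cannot reveal which configuration produced them.

```python
# Toy, hypothetical forward model: three "articulatory" parameters map to two
# "acoustic" outputs. This is NOT a vocal-tract model; it only shows how a
# nonlinear, many-to-one mapping makes the inverse problem ill-posed.

def forward(jaw: float, tongue: float, lips: float) -> tuple[float, float]:
    """Map an articulatory configuration to two acoustic-like values."""
    # Output 1 depends only on combined jaw + tongue height, so raising one
    # articulator while lowering the other leaves it unchanged.
    out1 = jaw + tongue
    # Output 2 is nonlinear in lip aperture and ignores its sign.
    out2 = lips ** 2
    return (out1, out2)


a = forward(0.25, 0.5, 0.5)
b = forward(0.5, 0.25, -0.5)
print(a, b)  # both configurations give (0.75, 0.25)
```

Because `a == b` although the inputs differ, an inverse mapping needs extra constraints to select one articulation among the many candidates.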
One can conceive of two complementary approaches to the speech inversion problem. The first uses all available signal-processing knowledge to identify the characteristics of the sources and of the vocal-tract filter that produced the speech signal. The second is borrowed from control theory and aims to determine the inverse kinematics and/or dynamics of an articulatory robot with excess degrees of freedom. Both approaches clearly require knowledge of the direct mapping (from articulation to acoustics), which supplies the constraints needed to regularise the solution.
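One standard way to use the direct mapping as a regulariser, sketched here under assumptions that are not taken from the project, is to linearise it as a matrix A with more articulatory parameters than acoustic ones (excess degrees of freedom) and to select the Tikhonov-regularised solution x = A^T (A A^T + lam*I)^-1 y. The matrix `A`, the target values and the function `regularised_inverse` are hypothetical, and the closed-form 2x2 inverse limits this sketch to two acoustic parameters.

```python
# Hypothetical linearised direct mapping: acoustics y = A x, with three
# articulatory parameters (columns) and two acoustic parameters (rows).
# Because A has more columns than rows, infinitely many x reproduce y.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]


def regularised_inverse(A, y, lam):
    """Return x = A^T (A A^T + lam*I)^-1 y for a 2-row matrix A.

    With lam = 0 this is the exact minimum-norm solution; lam > 0 trades a
    small acoustic mismatch for smaller articulatory excursions.
    """
    n = len(A[0])  # number of articulatory parameters
    # M = A A^T + lam*I, a 2x2 matrix.
    m = [[sum(A[i][k] * A[j][k] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # z = M^-1 y using the closed-form 2x2 inverse.
    z = [(m[1][1] * y[0] - m[0][1] * y[1]) / det,
         (-m[1][0] * y[0] + m[0][0] * y[1]) / det]
    # x = A^T z lies in the row space of A.
    return [A[0][k] * z[0] + A[1][k] * z[1] for k in range(n)]


x = regularised_inverse(A, [1.0, 1.0], 0.0)
print(x)  # approximately [4/9, 5/9, 2/9]
```

Other exact solutions exist (for example [1.0, 0.0, 0.5] also gives y = [1, 1]), but the regularised choice has the smallest norm, which is the sense in which knowledge of the direct mapping turns an ill-posed problem into a well-posed one.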
Following basic schemes in robotics, the speech production model is represented here by a realistic articulatory model (the plant) driven by a controller, i.e. a sequential network capable of synthesising motor sequences from sound prototypes. This ensemble, called the Articulotron, displays fundamental spatio-temporal properties of serial ordering in speech (coarticulation phenomena) and adaptive behaviour that compensates for perturbations.
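The plant/controller split and the compensation for perturbations can be illustrated with a deliberately simple sketch. Everything here is an assumption for illustration: the linear `plant`, the matrix `A`, and the gradient-descent controller `learn_command` are hypothetical and much simpler than the sequential network described above. The perturbation is simulated by holding one articulator fixed, in the spirit of bite-block experiments, and the redundant remaining articulators still reach the acoustic target.

```python
# Hypothetical linear plant: acoustic output = A x for a motor command x.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]


def plant(x):
    """Return the acoustic output produced by motor command x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def learn_command(target, frozen=None, steps=2000, rate=0.05):
    """Learn a motor command by gradient descent on 0.5*||plant(x) - target||^2.

    frozen: optional dict {index: value} of articulators held fixed,
    simulating a perturbation the controller must compensate for.
    """
    frozen = frozen or {}
    x = [0.0] * len(A[0])
    for i, v in frozen.items():
        x[i] = v
    for _ in range(steps):
        err = [y - t for y, t in zip(plant(x), target)]
        # Gradient of the squared error with respect to x is A^T err.
        grad = [sum(A[r][k] * err[r] for r in range(len(A)))
                for k in range(len(x))]
        for k in range(len(x)):
            if k not in frozen:  # frozen articulators cannot move
                x[k] -= rate * grad[k]
    return x


free = learn_command([1.0, 1.0])
perturbed = learn_command([1.0, 1.0], frozen={0: 0.5})
print(plant(free), plant(perturbed))  # both close to [1.0, 1.0]
```

In the perturbed run the first articulator stays at 0.5 and the other two move to about 0.5 and 0.25, so the same acoustic target is reached with a different articulation.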
The robotics approach to speech unifies action and perception. If speech communication is conceived of as a trade-off between the cost of production and the benefit of understanding, the constraints will come from the articulatory level, and the specific low-level processing from auditory and visual perception. Using an Audiovisual Perceptron to incorporate vision will lead to a more comprehensive formulation of the inversion problem: How can articulatory gestures be learned from hearing and seeing speech?
Available deliverables cover the four main areas of research in the project, i.e. source and vocal-tract modelling, motor control, audio processing and visual processing.
The integrated approach proposed in this project, together with the Articulotron, the Audiovisual Perceptron and other speech-processing tools, should lead to major "spinoffs" in R&D. Speech synthesis will greatly benefit from the learning ability of a robot that exploits adaptive biological principles. Low bit-rate speech transmission can also be developed from this approach, through access to articulatory codebooks. Finally, speech recognition in noise, where vision can enhance the acoustic signal, would also benefit from this low-level inverse mapping.
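The synopsis does not say how articulatory codebooks would be built or searched, so the following is only a generic vector-quantisation sketch under that assumption. The codebook entries in `CODEBOOK` and the helpers `encode` and `decode` are hypothetical. The idea is that the transmitter sends only the index of the nearest articulatory configuration, and the receiver looks it up and drives a synthesiser with it.

```python
import math

# Hypothetical articulatory codebook: each entry is a vector of articulatory
# parameters (values are illustrative only, not measured data).
CODEBOOK = [
    (0.0, 0.0, 0.0),
    (0.5, 0.2, 0.1),
    (0.9, 0.7, 0.3),
    (0.2, 0.8, 0.6),
]

# Bits needed to transmit one index per frame (2 bits for 4 entries).
BITS_PER_FRAME = math.ceil(math.log2(len(CODEBOOK)))


def encode(frame):
    """Return the index of the nearest codebook entry (squared Euclidean distance)."""
    return min(range(len(CODEBOOK)),
               key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, CODEBOOK[i])))


def decode(index):
    """Recover the articulatory vector for a transmitted index."""
    return CODEBOOK[index]


frames = [(0.48, 0.25, 0.1), (0.25, 0.75, 0.5)]
indices = [encode(f) for f in frames]
print(indices)  # [1, 3]
```

The bit rate then depends only on the codebook size and the frame rate, not on the number of articulatory parameters, which is why a compact articulatory codebook is attractive for low bit-rate coding.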
INPG - Université de Stendhal - F
Institut de la Communication Parlée
URA-CNRS 368, B.P. 25
F - 38040 GRENOBLE CEDEX 09
University of Leeds - UK
Telecom Paris/Arecom - F
Institut Estudis Catalans - E
KTH - S
Université de Lausanne - CH
Universität Köln - G
Université de Strasbourg II - F
University of Southampton - UK
Dublin City University - IRL
Trinity College Dublin - IRL
Università di Genova - I
University of Lund - S
Dr. C. Abry and P. Badin
tel +33/76 82 43 37 and 76 57 48 26
fax +33/76 82 43 35 and 76 57 47 10
SPEECH MAPS - 6975, August 1994
Please address enquiries to the ESPRIT Information Desk.
HTML version of synopsis by Nick Cook