A Neural Network Based, Speaker-Independent, Large Vocabulary, Continuous Speech Recognition System


Work Area: Speech and Natural Language

Keywords: speech recognition, hidden Markov models, artificial neural networks, hybrid models, speaker adaptation, array processing

Start Date: 1 October 92 / Duration: 36 months / Status: running


Abstract

WERNICKE is exploiting hybrid structures consisting of combinations of hidden Markov models (HMMs) and artificial neural networks (ANNs) to improve the state of the art in large vocabulary, continuous speech recognisers. Building on existing prototypes that were available to most of the partners, this project includes state-of-the-art HMMs and ANNs and explores aspects such as theory, implementation, improved training and speaker adaptation in hybrid HMM/ANN systems. At the end of the first year of this project, comparison results of different hybrid structures based on a common recogniser and common hardware are available.


The main objective of this project is to learn how artificial neural networks (ANNs) can be used for continuous speech recognition to significantly improve state-of-the-art systems and, using dedicated hardware, to develop fast implementations of the resulting algorithms, i.e. real-time recognition and fast turnaround of training. More specifically, this project addresses the problem of improving state-of-the-art, hidden Markov model (HMM)-based, large vocabulary, speaker-dependent and speaker-independent, continuous speech recognition systems by means of hybrid HMM/ANN structures.

In this framework, different ANN architectures will be compared and speaker adaptation methods will be developed. This project contains two parts with very strong inter-dependencies.

Approach and Methods

The consortium brings together partners with existing skills and baseline systems in the area: LHS and ICSI (Intl. Computer Science Institute, Berkeley, CA, subcontractor) in hybrid hidden Markov model (HMM)/multilayer perceptron (MLP) structures and CUED in recurrent neural network (RNN) structures, both of which perform competitively with state-of-the-art HMM technology; INESC in artificial neural networks (ANNs) and speaker adaptation; and ICSI in the development of the Ring Array Processor (RAP), which provides over 500 Mflops and is now being used for computation by each partner.

The main research themes include further development and improvement of the baseline HMM/MLP hybrid, and development of an HMM/RNN hybrid; definition of common recognition software to be used as a basis for comparison and assessment of research results; comparison of the MLP and RNN hybrid systems; development of better acoustic features with enhanced robustness to speaker and communication channel variation; incorporation into hybrids of improvements analogous to those used in state-of-the-art HMM recognisers; development of better training procedures; investigation of fast speaker adaptation in hybrids; and demonstration of real-time recognisers and their evaluation against state-of-the-art HMMs on international reference databases such as DARPA Resource Management (1000 words, speaker independent, continuous speech) and Wall Street Journal (5000 and 20000 words, speaker independent, continuous speech).
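The synopsis does not spell out how a hybrid combines the two models. In the usual formulation of HMM/MLP hybrids, the network estimates the posterior probability of each HMM state for every acoustic frame, and dividing by the state priors gives scaled likelihoods that serve as HMM emission scores during decoding. The following is a minimal sketch under that assumption, not the project's recogniser: the function names, the two-state model and all numbers are hypothetical, and the posteriors are hard-coded where a trained network's outputs would normally be used.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def scaled_log_likelihoods(posteriors, priors):
    """Turn per-frame state posteriors P(q|x) into log(P(q|x) / P(q)).

    By Bayes' rule this equals log p(x|q) up to a per-frame constant,
    so it can stand in for the HMM emission log-probability.
    """
    return [
        [safe_log(p) - safe_log(prior) for p, prior in zip(frame, priors)]
        for frame in posteriors
    ]


def viterbi(log_emissions, log_trans, log_init):
    """Return the most likely state sequence for the given log scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emissions[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at this frame.
            best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best] + log_trans[best][s] + frame[s])
            pointers.append(best)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state, left-to-right model and made-up network outputs.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
priors = [0.5, 0.5]
log_trans = [[safe_log(0.7), safe_log(0.3)], [safe_log(0.0), safe_log(1.0)]]
log_init = [safe_log(1.0), safe_log(0.0)]

emissions = scaled_log_likelihoods(posteriors, priors)
best_path = viterbi(emissions, log_trans, log_init)
```

In this toy example the decoder stays in the first state for two frames and then moves to the second, following the frames where each state's posterior dominates. An MLP or an RNN could supply the posteriors without any change to the decoding step, which is one reason a common recogniser is a natural basis for comparing the two.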

The training of hybrid structures is highly compute-intensive. The inclusion of ICSI as a subcontractor gives the consortium access to the very high performance hardware (RAP and a VLSI processor called SPERT) and software tools which ICSI has developed and will further adapt as the project progresses. These hardware and software tools will be used as a common platform for this project.

Progress and Results

At the end of the first year of this project, comparison results of different hybrid structures based on a common recogniser and common hardware are available. These results have shown that the hybrid approach can achieve recognition performance comparable to much more sophisticated state-of-the-art HMMs, i.e. around 5% error rate on the DARPA Resource Management database (1000 words, speaker independent, continuous speech recognition task).
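The synopsis does not say how the error rate is computed. For continuous speech tasks it is commonly reported as a word error rate: the minimum number of word substitutions, deletions and insertions needed to turn the recognised sentence into the reference, divided by the number of reference words. The sketch below assumes that metric; the function name and the example sentences are hypothetical and are not taken from the database.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# Made-up example: one deleted word and one substituted word.
wer = word_error_rate("show me all ships in the gulf",
                      "show me ships in a gulf")
```

Here the reference has seven words and the hypothesis differs by one deletion and one substitution, so the rate is 2/7. A figure of around 5% would correspond to roughly one error in every twenty reference words.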


This project is expected to make a significant technical and scientific contribution to the use and understanding of HMM/ANN hybrids, and of HMMs and ANNs separately, in speech recognition and pattern recognition, and to the neural computing involved. It will also provide a testbed for a new generation of commercial speech recognition systems exploiting hybrid HMM/ANN technology.

Latest Publications

Information Dissemination Activities

A workshop on this topic with invited external participants will be organised during 1994.


Lernout and Hauspie Speechproducts - B
K. Albert I laan 64
B - 1780 WEMMEL


University of Cambridge - UK
International Computer Science Institute - USA


Dr. H. Bourlard
tel +32/2 460 33 97
fax +32/2 460 01 72
e-mail: bourlard@brussels.lhs.be


WERNICKE - 6487, August 1994

please address enquiries to the ESPRIT Information Desk

html version of synopsis by Nick Cook