Speech Generation in Multimodal Information Systems

Keywords: speech generation, intonation control, dialogic process, information retrieval

Start Date: 1 May 94 / Status: finished / Duration: 24 months

Objectives and Approach

The aim of SPEAK is to construct a proof-of-concept prototype of a multimodal information system that combines graphical and spoken language, primarily in German, with demonstrable extensibility to other languages.

The multimodal user interface constructed in the project can now automatically generate meta-communication about the information retrieval process from a deep representation of the information and of speaker and hearer goals. Particular attention is paid to the intonational adequacy of the synthesised speech in the context of the information retrieval dialogue between information seeker and machine.

Progress and Results

Following initial specifications for the functional, dialogue-appropriate control of intonation for German and the specification of a general model of information retrieval as a dialogic process of negotiation between seeker and giver, these components have been instantiated in a complex working prototype. The speech synthesis component has been extensively redesigned to provide high-level, parameterisable control of the required intonation contours and their segmental consequences. Reusable subcomponents of the complete system have been clearly isolated for subsequent take-up. These include an intonation control algorithm, a marked-up-text-to-speech component, a semantics-to-marked-up-text component (German), and a dialogue model.

Significant improvements in the naturalness of the resulting speech have been achieved as a consequence of the rich information about communicative intentions maintained by the interface and of special (but generalisable) extensions made to the speech synthesiser adopted (Multivox).

Information Dissemination Activities and/or Exploitation

Three workshops (Budapest 1994 and 1996, Darmstadt 1995) have been held to report on the work in the project and to offer opportunities for presentations of related work. Project results have been presented in journal articles and at workshops.

The reusable components are being evaluated and used in a variety of industrial contexts. Some of these components are expected to be incorporated into several products, in particular those marketed by the company that supplies the speech synthesis equipment used in the project.


Technische Hochschule Darmstadt and GMD/IPSI
Dolivostr. 15
D-64293 Darmstadt, D

EU Partners

TU-Darmstadt, D

Non-EU Partners

TU-Budapest, HU


Prof. Dr. Erich J. Neuhold
Tel: +49 6151 869 800
Fax: +49 6151 869 818

SPEAK - CP93-10393, May 1997

