The Analysis and Synthesis of Speaker Characteristics

VOX - 6298

Work Area: Speech and Natural Language

Keywords: speaker types, speaker states, speaker styles, vocal profile analysis, electropalatography, voice source analysis, speech synthesis

Start Date: 1 October 92 / Duration: 36 months / Status: running

Abstract: The VOX Working Group is investigating speech databases with different types of speakers, different affective conditions of emotion and attitude, and different casual versus careful styles of speaking: each considered with reference to acoustic, perceptual and physiological representation. Speech synthesis can be used to empirically test such characterisations.


The VOX Working Group is an experienced multidisciplinary team of specialists in speech science, speech technology and experimental psychology, each already responsible for internationally recognised innovations of theory and practice in the different areas involved. The overall long-term objective of the Group is to describe inter-speaker differences and intra-speaker differences of speaker type, speaker state and speaker style. Identification of a conceptual framework for the global space of inter-speaker differences will allow the Group to address its key scientific question: what are the dimensions and limits of speaker-space occupiable by the individual speaker, and how can intra-speaker variations of speaker type, speaker state and speaker style be modelled to suit implementation in speech synthesis?


The activities of the Working Group centre on investigations of the speech of different types of speakers, with different affective conditions of emotion and attitude, and different casual versus careful styles of speech. The Group is considering the three categories of speaker type, speaker state and speaker style. Each of these is considered at three levels: acoustic, perceptual and physiological. Each of these domains also allows consideration of both laryngeal and supralaryngeal contributions to a speaker's voice. Speech synthesis affords an empirical means of testing the conclusions drawn from such investigations. Discussions at group technical meetings and workshops draw results from these three domains together, as a preliminary to the development of an integrated descriptive model of speaker characterisation.

The mode of working of the Group is to hold a Consortium-wide Workshop every six months, where the researchers from each of the sites learn a new analytic technique under the instruction of the host partner. Once a year, a Group plenary meeting discusses progress towards the goal of an integrated descriptive system.

Progress and Results

The first Group Workshop was held in February 1993, in the Centre for Speech Technology Research at the University of Edinburgh, where a course was taught on Vocal Profile Analysis, and on electropalatographic methods of analysis. This was attended by over 30 researchers from Consortium sites and external researchers invited by the partners. The next Workshop will be held in Geneva, in October 1993, on speaker state analysis.

Members of the VOX Working Group also attended workshops of other Basic Research Projects and Working Groups, for liaison and mutual information. These included the SPEECHMAPS project workshop in April 1993 in Paris, and the ACCOR Workshop on articulography in Munich in April 1993. Researchers from LIMSI have visited the Stockholm and Sheffield partners, and the Dublin partner has also visited Stockholm for collaborative research. Papers have been given at relevant conferences (the British Institute of Acoustics; the International Association of Forensic Phonetics; International Conference on Interdisciplinary Perspectives in Speech and Language Pathology; Symposium on Natural Language Processing and Speech Technology (Bangkok)).


Success in providing a unified representation of speaker characteristics would have direct industrial value: speech synthesis products could be made more naturalistic in quality, and better able to project application-appropriate synthetic speaker-attributes of identity, personality and affect. Speech recognition system capabilities would also be improved through a better understanding of the basis for speaker independence and speaker adaptation. The provision of an adequate description of speaker characterisation would thus bring pervasive benefits to commercially oriented work in speech technology.

Latest Publications

Information Dissemination Activities

Information about the Working Group is disseminated at conferences, and through journal publication. For more direct information, contact Professor John Laver at the address or e-mail address given earlier in the contact point details.


University of Edinburgh - UK
80 South Bridge


Université de Genève - CH
IKP-Universität Bonn - D
CNRS-Institut de Phonetique - F
Trinity College Dublin - IRL
University of Cambridge - UK
University of Reading - UK
University of Sheffield - UK


Prof. J. Laver
tel +44/31 650 2785
fax +44/31 226 2730

VOX - 6298, August 1994

