Discourse Functions and Representation: an Empirically and Linguistically Motivated Inter-Disciplinary Approach to Natural Language Texts


Work Area: Speech and Natural Language

Keywords: discourse, natural language processing, generation

Start Date: 1 October 92 / Duration: 36 months / Status: running


Abstract

DANDELION aims to develop a language-independent theory of discourse on the basis of linguistic analysis and psycholinguistic experimentation. The theory models the interactions between preferences for and constraints on surface forms that may express the same proposition but are pragmatically different. Testing and evaluating the theory will be based on specifications of the system of choices involved in the use of a restricted set of representative discourse phenomena. The specifications will be neutral with respect to generation, interpretation and translation, and will take the type of text and its context into account.


The empirical study of discourse has reached sufficient maturity that it can and should be brought to bear on formal and computational models. Our aim is to gear text-analytical and psycholinguistic research more directly towards this goal, and to incorporate the empirical results in state-of-the-art formal and functional discourse representations.

We will attempt to represent (a selection of) the linguistic resources needed for the generation of text in a declarative and modular way. These representations will be complementary to and compatible with existing representations of linguistic knowledge at the level of the grammar and the lexicon.

Approach and Methods

Discourse functions and their grammatical realisations are investigated and modelled for three European languages: English, German, and Dutch. This guarantees a minimal degree of language independence and makes our research directly relevant to machine translation. We limit our attention to written, monological discourse, excluding interactional phenomena such as questions, answers, and acknowledgements, and speech-specific phenomena such as accent placement and intonation. The result will be an executable specification of the properties of discourse that need to be enforced whenever text is used, generated, analysed, and so forth. It will be declarative and nondirectional, and it will not make any claims about optimal or human-like processing.
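To illustrate what a declarative and nondirectional specification could look like, the following is a minimal sketch in Python. It is not the project's formalism: the `Mapping` record, the `SPEC` table, the function names and the toy entries (including the restriction of "however" to expository text) are all hypothetical and chosen only for illustration. The point it shows is that a single table of form-function mappings can be read in either direction, so generation, interpretation and translation all consult the same data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One declarative form-function pairing (hypothetical record layout)."""
    function: str   # discourse function, e.g. "contrast"
    form: str       # surface marker that can realise it
    language: str   # "en", "de" or "nl"
    text_type: str  # context restriction; "any" means unrestricted


# Toy entries invented for this sketch; they are not project data.
SPEC = (
    Mapping("contrast", "but", "en", "any"),
    Mapping("contrast", "however", "en", "expository"),  # assumed restriction
    Mapping("contrast", "aber", "de", "any"),
    Mapping("contrast", "maar", "nl", "any"),
    Mapping("cause", "because", "en", "any"),
    Mapping("cause", "weil", "de", "any"),
    Mapping("cause", "omdat", "nl", "any"),
)


def _applies(m, language, text_type):
    # A mapping applies if the language matches and the context is allowed.
    return m.language == language and m.text_type in ("any", text_type)


def realise(function, language, text_type):
    """Generation direction: discourse function -> candidate surface forms."""
    return [m.form for m in SPEC
            if m.function == function and _applies(m, language, text_type)]


def interpret(form, language, text_type):
    """Interpretation direction: surface form -> candidate discourse functions."""
    return [m.function for m in SPEC
            if m.form == form and _applies(m, language, text_type)]


def translate(form, source, target, text_type):
    """Translation: interpret in the source language, realise in the target."""
    candidates = []
    for function in interpret(form, source, text_type):
        for target_form in realise(function, target, text_type):
            if target_form not in candidates:  # keep order, drop duplicates
                candidates.append(target_form)
    return candidates


if __name__ == "__main__":
    print(realise("contrast", "en", "expository"))
    print(interpret("weil", "de", "narrative"))
    print(translate("maar", "nl", "de", "narrative"))
```

Because the table itself carries no procedural information, nothing in it favours one direction of use over another, and it makes no claim about how a human or an efficient system would actually process text.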

The research will profit from existing theories of discourse representation that originate in sentence-based approaches. One theoretical result of the project will be an overview of the augmentations that essentially sentence-based accounts need in order to become fully-fledged theories of discourse.

Progress and Results

Theoretical and empirical work was carried out in the following areas:

The available grammar resources for the three target languages were inventoried and evaluated with respect to their potential to accept input from a discourse interface.

The interfacing of discourse-thematic notions with lexico-grammatical thematisation options was explored for a systemic grammar of German.


The research carried out in this project is a prerequisite for attaining the long-term goal of developing computational devices that can understand and generate natural language discourse in context. The project bridges interdisciplinary gaps by incorporating empirical results in formal and functional linguistic representations. The development tools being built for the project and the executable specifications of form-function mappings will contribute to the construction of a discourse researcher's workbench for the study of complex interactions of contextual factors and linguistic phenomena.

Information Dissemination Activities

The initial results of this project and related research by project staff were presented in 20 publications and 36 conference talks and colloquia.

Two workshops were organised at Universität des Saarlandes in November 1992 and June 1993.

An interdisciplinary panel was held at the 43rd Annual Conference of the International Communication Association, Washington, May 1993.

CLS will organise an international workshop in autumn 1994.


Katholieke Universiteit Brabant - NL
Center for Language Studies (CLS)
P.O.Box 90153


GMD-IPSI Darmstadt - D
Universität des Saarlandes - D
Universidad Complutense Madrid - E
University of Edinburgh - UK


Dr. G. Redeker
tel +31/20-548-3082
fax +31/20-644-6436
e-mail: redekerg@vu.let.nl

DANDELION - 6665, August 1994

