Predictably Dependable Computing Systems


PDCS2 - 6362

Work Area: Distributed Systems, Reliability and Dependability

Keywords: dependability, safety, security, prediction, fault tolerance, real-time systems, distributed systems


Start Date: 1 August 92 / Duration: 36 months / Status: running



Abstract: PDCS2 aims to build on, and take significantly further, the work of ESPRIT Basic Research Action 3092 (Predictably Dependable Computing Systems), on the problems of making the process of designing and constructing adequately dependable computing systems much more predictable and cost-effective than at present. In particular it will address the problems of producing dependable distributed real-time systems, especially those where the dependability requirements centre on issues of safety and/or security. The planned programme of research concerns a number of carefully selected topics in fault prevention, fault tolerance, fault removal and fault forecasting. The work to be done ranges in nature from theoretical to experimental; in a number of cases it involves the acquisition or implementation, in prototype form, of software tools, and their experimental interconnection.


Aims

The PDCS2 Project aims to build on, and take significantly further, the work of ESPRIT Basic Research Action 3092 (Predictably Dependable Computing Systems), on the problems of making the process of designing and constructing adequately dependable computing systems much more predictable and cost-effective than at present. In particular it will address the problems of producing dependable distributed real-time systems and especially those where the dependability requirements centre on issues of safety and/or security.

Approach and Methods

The problems of predicting and achieving specific levels of dependability involve all aspects of systems and of their specification, design and construction. Nevertheless, the project must of necessity be highly selective about which problems to concentrate on, and the planned programme of research concerns a small number of carefully selected topics in fault prevention, fault tolerance, fault removal and fault forecasting, as follows:

Fault Prevention: techniques for eliciting and stating dependability requirements, and means of timeliness analysis in order to enable the building of systems with known maximum execution times.

Fault Tolerance: (i) strategies for designing systems whose performance is maximised within given reliability constraints, and design notations for expressing fault tolerance provisions and timing issues, (ii) the principles of design environments for fault-tolerant systems, and (iii) further development of the fragmentation-redundancy-scattering technique.

Fault Removal: investigation of two complementary methods of generating test inputs, one deterministic, the other probabilistic.

Fault Forecasting: (i) reliability and availability modelling, (ii) means of evaluation for ultra-high dependability, (iii) analytical techniques and methods of reducing the state space storage requirements of Markov and semi-Markov modelling tools, (iv) improved methods of coverage evaluation using both physical and simulated fault injection, and (v) modelling the operational security of a system in its environment.

The sub-tasks that make up these four main tasks range in nature from theoretical to experimental. In several cases they involve the acquisition or implementation, in prototype form, of software tools. We aim to investigate the experimental interconnection of some of these tools, using the inter-tool messaging techniques developed in the MARS Design System (MARDS), as a first step towards the long-term objective of a design support environment that is well populated with tools and ready-made system components, and that fully supports the predictably dependable design of large distributed real-time computing systems.

Progress and Results

With regard to fault prevention, we are developing (i) techniques for eliciting and stating dependability (and in particular security and safety) requirements in a form that is consistent with a subsequent validation procedure, and (ii) timeliness analysis to enable the building of systems with known maximum execution times, including implementation of suitable hardware and operating system bases, and study of time-critical applications running on these bases.

In the area of fault tolerance, work includes (i) the development of strategies for designing systems whose performance is maximised within given reliability constraints, design notations for expressing fault tolerance provisions and timing issues, and the formal description of fault-tolerant designs, (ii) the investigation of the principles of design environments for fault-tolerant systems, and (iii) further development of the fragmentation-redundancy-scattering technique for tolerating both accidental and intentional faults in two non-exclusive directions: generalisation via an object-oriented model, and application to high performance networks.

Research on fault removal is investigating two complementary methods of generating functional test inputs: one deterministic (based on formal specifications), the other probabilistic.

The principal objectives of our work on fault forecasting have been (i) to extend further our work on reliability and availability modelling, (ii) to develop analytical techniques and methods of reducing the state space storage requirements of Markov and semi-Markov modelling tools aimed at extending the range of complexity of systems whose dependability can be accurately evaluated, (iii) to develop improved methods of coverage evaluation using both physical and simulated fault injection (at circuit and system level), and (iv) to develop further our approach to modelling the operational security of a system in its environment, and to conduct intrusion experiments aimed at providing relevant data for such modelling exercises.

Potential

The work on dependability requirements elicitation could lead to improvements in system specification techniques; that on fault tolerance is aimed at facilitating trade-offs between system dependability and performance, and at achieving combined reliability and security. The work on reliability and availability modelling could already form the basis for industrial exploitation, and is at a point where future development would benefit greatly from the provision of data from industry, whereas that on security modelling is highly exploratory and aims to establish whether such modelling can be made practicable. The work on timeliness analysis could provide a further data point for assessing the potential and limitations of predictable hardware and operating system behaviour, whilst that on fault injection promises improvements to the design and evaluation of a system's fault-tolerance provisions.


Information Dissemination Activities

The first PDCS2 Open Workshop will take place in Toulouse in September 1993. Over 100 people are expected to attend, and external speakers from both industry and non-European universities will address the attendees.

For further information about the project see the PDCS2 WWW home page <URL:http://www.research.ec.org/pdcs/> or contact:

Barry Hodgson (j.b.hodgson@newcastle.ac.uk), PDCS2 Administrative Coordinator, Department of Computing Science, University of Newcastle, Newcastle upon Tyne, NE1 7RU, UK.


Coordinator

University of Newcastle - UK
Department of Computing Science
UK - Newcastle upon Tyne NE1 7RU

Partners

Technische Universität Wien - A
LAAS-CNRS - F
CNR Pisa - I
Chalmers Tekniska Högskola - S
University of York - UK
City University - UK

CONTACT POINT

Prof. B. Randell
tel +44/91 222 7923
fax +44/91 222 8232
e-mail: brian.randell@newcastle.ac.uk



PDCS2 - 6362, August 1994


please address enquiries to the ESPRIT Information Desk

html version of synopsis by Nick Cook