Malicious- and Accidental-Fault Tolerance for Internet Applications
IST Research Project IST-1999-11583
1 January 2000 - 28 February 2003

Check out a summary of the project, or browse through the original project proposal.

MAFTIA involved experts from 5 countries and 6 organisations. The Industrial Advisory Board provided valuable feedback on the work of the project.

Research was organised into six workpackages.

Find out more about the key scientific results and achievements, and the benefits of this research collaboration.

[ Conceptual Model ] [ Architecture ] [ Mechanisms and Protocols ] [ Verification and Assessment ]

Mechanisms and Protocols

Within the context of the MAFTIA conceptual model and architecture, a number of mechanisms and protocols for building intrusion tolerant applications and services were developed by IBM, Lisbon, LAAS and Newcastle. These depend on the notion of distributing trust so as to avoid placing too much trust in any one component of the system. By its very nature, intrusion tolerance requires a “defence in depth” approach, and there can be no single point of failure. However, one of the difficulties that has to be overcome in designing such mechanisms is avoiding the apparent conflict between reliability and secrecy – naïve replication of secrets makes it easier for an attacker to breach confidentiality.

Group communication protocols - aimed at combining classical fault tolerance techniques with design and implementation diversity.

Using a group (or multiple groups) of functionally equivalent replicas is a classical method for achieving fault tolerant systems. It is a method of redundancy that provides some protection against accidental (independent) replica failures as long as the correctly functioning membership of a group does not fall beneath the threshold necessary to ensure resilience to failure. This model is based on the assumption that faults occur independently of one another, affecting all replicas of a group with similar probability. For random and uncorrelated faults within a system, as well as those induced externally, but not maliciously, this assumption seems to be acceptable.

However, faults induced by the malicious acts of an attacker may not match this assumption, which makes it problematic to use simple replication-based groups in adversarial environments. For example, if all replicas share a common vulnerability that lets an attacker violate the integrity of the system as a whole, the system can easily be compromised. The independence assumption applies here only to the extent that the effort required to break into each machine is the same. With sophisticated “hacking” techniques, this assumption becomes increasingly difficult to justify, especially in view of the large-scale, coordinated attacks via the Internet that are reported daily.

MAFTIA has explored two different approaches to building intrusion-tolerant group communication protocols. The first approach, developed by IBM, was to use a linear secret sharing scheme based on a generalised adversary structure that can model a more realistic set of fault assumptions. Replicas are classified according to one or more sets of attributes, and it is assumed that the characteristics of corrupting a replica vary according to these attributes. Suitable attributes include physical location, logical domain, system management personnel, type of operating system, protocol implementation, etc. Using an appropriately weighted linear secret sharing scheme, it is possible to construct protocols that can withstand the simultaneous corruption of all replicas in a given attribute class. For example, if the servers varied according to physical location and type of operating system, it would be possible to design a protocol that could tolerate the corruption of all the servers at a given site, and all the servers running a particular operating system.
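The location and operating system example above can be explored with a small feasibility check. The sketch below is not the IBM protocol and contains no secret sharing. It only enumerates the sets of servers an attacker is assumed able to corrupt together (one whole site plus one whole operating system) and tests the Q^3 condition, a condition commonly used for Byzantine protocols with general adversary structures: no three corruptible sets may together cover all servers. The server layout, the names and the choice of Q^3 as the criterion are assumptions made for illustration.

```python
from itertools import combinations_with_replacement, product


def build_servers(locations, systems):
    """One hypothetical server per (location, operating system) pair."""
    return frozenset(product(locations, systems))


def adversary_sets(servers, locations, systems):
    """Each corruptible set: all servers at one site plus all servers running one OS."""
    return [
        frozenset(s for s in servers if s[0] == loc or s[1] == os_name)
        for loc, os_name in product(locations, systems)
    ]


def satisfies_q3(servers, corruptible):
    """True if no three corruptible sets together cover every server."""
    return not any(
        (a | b | c) == servers
        for a, b, c in combinations_with_replacement(corruptible, 3)
    )


# Illustrative layout (an assumption): 4 sites x 4 operating systems = 16 servers.
locations = ["site1", "site2", "site3", "site4"]
systems = ["osA", "osB", "osC", "osD"]
servers = build_servers(locations, systems)
print(satisfies_q3(servers, adversary_sets(servers, locations, systems)))
```

With four sites and four operating systems the check passes, because three sites and three operating systems always leave at least one server untouched. With only three of each, three corruptible sets can cover every server and the check fails.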

The second approach, developed by Lisbon, was based on the use of a Trusted Timely Computing Base (TTCB). A TTCB is a trusted system component that can be used to provide timeliness and fail-silence properties in a hostile environment. The TTCB must therefore be implemented in a way that ensures its trustworthiness, perhaps by using a tamper-proof hardware artefact under strict administrative control. Using a TTCB, it is possible to implement a reliable broadcast protocol that can tolerate up to f failures among f+2 replicas.
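To put the f+2 figure in perspective, the following sketch compares it with the n ≥ 3f+1 bound usually quoted for asynchronous Byzantine protocols that have no trusted component. The function names are hypothetical and the comparison is arithmetic only; it says nothing about how the TTCB-based protocol itself works.

```python
def replicas_with_ttcb(f: int) -> int:
    """Group size quoted in the text: f failures tolerated among f + 2 replicas."""
    return f + 2


def replicas_classical(f: int) -> int:
    """Commonly quoted bound n >= 3f + 1 when no trusted component is available."""
    return 3 * f + 1


# Print the two group sizes side by side for a few values of f.
for f in range(1, 4):
    print(f"f={f}: with TTCB {replicas_with_ttcb(f)}, classical {replicas_classical(f)}")
```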

Using intrusion-tolerant group communication protocols, it is possible to construct Trusted Third Party services (TTPs) such as Certification Authorities, Fair Exchange, Notary, Authentication, and Authorisation.

Transaction error confinement - aimed at providing error confinement by encapsulating multiple actions within a transaction that provides atomicity, consistency, isolation and durability.

An intrusion-tolerant transaction service that can be used for error confinement at the application level was developed by Newcastle. The transaction service is implemented using replicated transaction managers and resource managers that communicate using Byzantine agreement protocols, and is built on top of the TTCB developed by Lisbon.
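The error confinement idea can be illustrated with a purely local, non-replicated sketch: a group of updates either all take effect or, if any step fails, the state is restored to what it was before the transaction began. The class and function names are hypothetical. The sketch shows atomicity only, not isolation, durability, replication or Byzantine agreement, and is not the Newcastle service.

```python
import copy


class Transaction:
    """Context manager that makes a group of updates to a dict all-or-nothing."""

    def __init__(self, state):
        self.state = state

    def __enter__(self):
        # Remember the state as it was before any action of the transaction.
        self._snapshot = copy.deepcopy(self.state)
        return self.state

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # An action failed: discard partial updates and restore the snapshot.
            self.state.clear()
            self.state.update(self._snapshot)
        return False  # re-raise so the caller learns that the transaction aborted


def transfer(accounts, src, dst, amount):
    """Two actions that must succeed or fail together."""
    with Transaction(accounts) as acc:
        acc[dst] += amount          # first action happens before the check
        if acc[src] < amount:
            raise ValueError("insufficient funds")
        acc[src] -= amount          # second action


accounts = {"a": 10, "b": 0}
transfer(accounts, "a", "b", 5)
print(accounts)
```

If a later call such as `transfer(accounts, "a", "b", 100)` raises, the credit already applied to `b` is rolled back, so the error does not propagate into the shared state.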

Access control - aimed at regulating access to resources/objects according to the principles of “least privilege” and “need to know”.

LAAS defined an intrusion-tolerant authorisation service that was built on top of the MAFTIA middleware, and in particular the protocols developed by IBM. The authorisation service furnishes a scheme for granting permissions to each participant of a multiparty transaction, while distributing to each party only those permissions that are strictly needed to execute its own task. This scheme is based on two levels of protection:

  • A distributed authorisation server, in charge of granting or denying rights for operations involving one or more remote hosts. If such an operation is authorised, the authorisation server supplies all the necessary capabilities for the elementary operations needed to carry it out.
  • A local reference monitor on each participating host, which is responsible for fine-grain authorisation. It enforces local access controls and restricts access to local resources by remote objects: it intercepts remote method invocations and checks the capabilities that accompany each request. To ensure intrusion tolerance, critical parts of this reference monitor may be implemented using tamper-proof hardware on each participating host (e.g. a Java Card).
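The capability check performed by a local reference monitor can be sketched as follows. This is a minimal illustration under stated assumptions: a capability is modelled as an HMAC tag over (subject, object, method), computed with a key that stands in for the authorisation server's signing material. The actual MAFTIA scheme is described as relying on a distributed signature algorithm, so the HMAC, the key and all names here are hypothetical stand-ins.

```python
import hashlib
import hmac


def make_capability(key, subject, obj_name, method):
    """Stand-in for the authorisation server: a tag bound to one elementary operation."""
    msg = "|".join((subject, obj_name, method)).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


class ReferenceMonitor:
    """Intercepts invocations on local objects and checks the accompanying capability."""

    def __init__(self, key, local_objects):
        self._key = key
        self._objects = local_objects

    def invoke(self, subject, obj_name, method, capability, *args):
        # Recompute the tag for the requested operation and compare in constant time.
        expected = make_capability(self._key, subject, obj_name, method)
        if not hmac.compare_digest(expected, capability):
            raise PermissionError(f"{subject} may not call {obj_name}.{method}")
        return getattr(self._objects[obj_name], method)(*args)


class Printer:
    """Hypothetical local resource."""

    def status(self):
        return "idle"

    def reset(self):
        return "reset done"


key = b"hypothetical-demo-key"
monitor = ReferenceMonitor(key, {"printer": Printer()})
cap = make_capability(key, "alice", "printer", "status")
print(monitor.invoke("alice", "printer", "status", cap))
```

Because the capability is bound to a single method, presenting it with a request for `reset` fails the check. This mirrors the principle of handing each party only the permissions needed for its own task.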

The authorisation server is composed of replicated and diverse sites (operated by independent, non-colluding system personnel), so that any single fault or intrusion can be tolerated without degrading the service. In particular, a minority of colluding operators and/or security officers is not able to grant privileges to unauthorised users or deny access to authorised users.
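The claim about a colluding minority can be illustrated with a simple majority vote over the replicated sites. This is a deliberate simplification with hypothetical names: it assumes all honest sites reach the same decision and ignores how the sites actually agree and sign their result.

```python
def majority_decision(votes):
    """Grant only if a strict majority of sites voted to grant."""
    grants = sum(1 for v in votes if v)
    return grants * 2 > len(votes)


def run_scenario(n_sites, n_colluding, request_is_legitimate):
    """Honest sites vote correctly; colluding sites vote the opposite way."""
    honest = [request_is_legitimate] * (n_sites - n_colluding)
    colluders = [not request_is_legitimate] * n_colluding
    return majority_decision(honest + colluders)


# Five sites, two colluding: the outcome still follows the honest sites.
print(run_scenario(5, 2, request_is_legitimate=False))
print(run_scenario(5, 2, request_is_legitimate=True))
```

Once the colluders form a majority (for example three of five sites), the outcome follows them instead, which is why the guarantee is stated for a minority only.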

The local reference monitor, supported by a safe distributed signature algorithm, controls access to all local application resources. If it is compromised then the effect of the failure is localised; even if an intruder is completely controlling a given host (and thus can execute anything locally), such an intrusion gives no privilege on any other host.

Intrusion detection - aimed at detecting errors that could lead to security failures.

IBM explored two different aspects of intrusion detection systems (IDSs) within MAFTIA. Firstly, an IDS could be used as the error detection mechanism to support intrusion tolerance strategies based on forward and backward error recovery (as opposed to error compensation and fault masking techniques, which do not require error detection as such). However, such methods depend on a reliable error detection mechanism, so one aspect of IBM’s work on intrusion detection systems has been to improve the quality of intrusion detection by reducing false positives and false negatives. In particular, they developed a novel data mining technique for analysing historical data and filtering out false positives, and defined a methodology for combining different IDS services to increase coverage and protection against attacks and intrusions.

Secondly, an IDS is itself an attractive target for an attacker, and thus needs to be made intrusion-tolerant. Hence, another aspect of IBM’s work on intrusion detection systems has been to investigate the construction of an intrusion-tolerant IDS using the mechanisms and protocols developed by the project. Figure 4 illustrates a possible design for a large-scale distributed IDS built using a combination of intrusion tolerance strategies, including the use of diverse sensors and event analysers, and secure communication channels.

Figure 4: Architecture of an intrusion-tolerant IDS
