INTREPID - 5203
Keywords: document handling, document modelling, automatic document classification
Start Date: 05-NOV-90 / Duration: 30 months
Objectives and Approach
The objective of INTREPID is to develop new techniques for recognising and processing documents, demonstrate them in a development environment, and integrate them into an advanced application for the automatic classification of office documents. The INTREPID project is linked to ROCKI (project 5376). The planned recognition system must cope with a mixture of text, line graphics, headings and grey-scale images, and with a variety of character sizes, styles and print qualities. Advanced distributed computer hardware will be used to meet the increased requirements of new recognition algorithms and strategies.
INTREPID is aiming to:
- Develop and implement new advanced preprocessing and character classification strategies, combined with existing approaches, in order to process poor-quality documents more successfully.
- Improve reading results by advanced post-processing functions incorporating document layout analysis and linguistic-based approaches. The results of the ROCKI project on decomposing documents into different regions of interest will be taken into account.
- Employ algorithmic procedures and recognition strategies that can be particularly effective when supported by an appropriate hardware/software architecture. To show this, suitable algorithms will be chosen, implemented, tested and modified on a distributed parallel hardware architecture.
- Demonstrate the results in suitable development environments (PC, workstation or dedicated hardware) and in an application specifically developed for the automatic classification of office documents.
The main workpackages can be grouped into four categories:
From Preprocessing to Postprocessing
- working out strategies, procedures and algorithms in the field of preprocessing and classification, suitable for supporting the recognition of poor-quality office documents
- developing structural algorithms for text recognition and line graphic analysis
- analysing the format and layout of office documents.
Linguistic Contextual Postprocessing
- investigating basic algorithms for lexical, grammatical and semantic analysers
- their application to a number of European languages (English, Italian, Spanish).
Hardware and Software Architecture Definition and Prototype Implementation
- defining a distributed parallel computer architecture, based on state-of-the-art technology, best suited for the recognition procedures on an appropriate prototype hardware platform.
- developing an application for the automatic classification of office documents.
The project will offer improved market possibilities, especially for the industrial partners of the consortium. The exploitation of the results will be most significant in the following ways:
- The consolidation and broadening of the market position in office document readers, especially form readers, by taking advantage of consolidated know-how in optical character recognition (OCR), enhanced by close cooperation among European companies already active in this business. This will also help to meet non-European competition and to explore promising new developments in the field of OCR.
- The possibility for multimedia applications dealing with paper documents (containing any type of information, such as text, line graphics and grey-scale images) to improve the effectiveness of office work. This will be achieved through the development of advanced recognition strategies (using contextual linguistic postprocessing) and of an automatic document classification system. This will also help to improve the market for multimedia workstations.
- The development of basic contextual linguistic postprocessing tools, verified on some specific language-dependent implementations, to offer significant improvements to future document recognition systems.
- The definition of the basic design and the verification of a prototype implementation of an optimised hardware/software system, specifically intended to solve the complex problems raised by the advanced recognition strategies. This system will be exploited through integration into some partners' existing or newly developed platforms, and possibly also as an independent add-on system to be sold in third-party markets.
Mr Heinz Nedderhoff
D - 7750 KONSTANZ
tel: + 49/ 7531-862692
fax: + 49/ 7531-862741
AEG AG - D - C
CTA SA - E - P
ING. C. OLIVETTI & C. SPA - I - P
EWH KOBLENZ - D - A
PACER SYSTEMS LTD - UK - A
UNIVERSITA DI NAPOLI - I - A
NOTTINGHAM POLYTECHNIC - UK - A
UNIVERSITA DI BARI - I - A
INTREPID - 5203, December 1993
please address enquiries to the ESPRIT Information Desk
html version of synopsis by Nick Cook