Bilingual Automatic Parallel Indexing and Classification
Most scientific and technical articles published today are written in English. As a result, the number of English documents that information brokers and providers, such as bibliographic database producers, libraries and publishers, have to collect and maintain is growing rapidly. One way to give access to this information with reliable recall and precision is automatic indexing against a controlled vocabulary.
The aim of the BINDEX project is to integrate the prototype of the AUTINDEX system, which automatically indexes and classifies bilingual documents, into the production processes of the two users participating in the project. The intended result is a mature, deployable software utility for bilingual (English and German) indexing and classification. The AUTINDEX approach is based on a controlled vocabulary and advanced natural language processing technologies.
The controlled vocabulary is provided by a classical thesaurus together with a specialised bilingual dictionary, which maps the different descriptor types of one language onto those of the other. The linguistic processing and the statistical modules together provide all the information needed to assign thesaurus concepts to words, including multiword units.
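The assignment step described above can be illustrated with a minimal sketch. It is not the AUTINDEX implementation: plain lowercased token matching stands in for the linguistic processing, raw frequency counts stand in for the statistical modules, and the thesaurus entries, the English-to-German descriptor mapping and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: token sequence -> preferred descriptor.
THESAURUS = {
    ("information", "retrieval"): "Information retrieval",
    ("indexing",): "Indexing",
    ("thesaurus",): "Thesauri",
}

# Hypothetical bilingual dictionary: English descriptor -> German descriptor.
EN_TO_DE = {
    "Information retrieval": "Informationsrückgewinnung",
    "Indexing": "Indexierung",
    "Thesauri": "Thesaurus",
}


def assign_descriptors(text, thesaurus=THESAURUS):
    """Count thesaurus descriptors in text, preferring the longest match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(key) for key in thesaurus)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multiword units win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in thesaurus:
                counts[thesaurus[key]] += 1
                i += n
                break
        else:
            i += 1  # no descriptor starts at this token
    return counts


def translate_descriptors(counts, mapping=EN_TO_DE):
    """Map descriptors to the other language, dropping unmapped ones."""
    return {mapping[d]: c for d, c in counts.items() if d in mapping}


if __name__ == "__main__":
    sample = "Indexing supports information retrieval; indexing uses a thesaurus."
    found = assign_descriptors(sample)
    print(found)
    print(translate_descriptors(found))
```

Trying the longest token sequence first is what lets a multiword unit such as "information retrieval" be assigned as one concept rather than as two unrelated words.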
Classification of documents is also based on the output of the linguistic processing and on the classification schemes provided by the user. The BINDEX system will be implemented as a web service; appropriate multilingual user interfaces and APIs for integrating the system into the production cycle of potential users are therefore also being evaluated and further developed.
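As a rough illustration of classification driven by indexing output, the sketch below ranks the classes of a user-supplied scheme by how often their associated descriptors were assigned to a document. The scheme, the class codes and the scoring rule are assumptions made for this example only; the source does not specify how AUTINDEX scores classes.

```python
from collections import Counter

# Hypothetical user-supplied scheme: descriptor -> class codes it supports.
SCHEME = {
    "Indexing": ["CLS-A"],
    "Information retrieval": ["CLS-A"],
    "Thesauri": ["CLS-B"],
}


def classify(descriptor_counts, scheme=SCHEME):
    """Return class codes ordered by summed descriptor frequency."""
    scores = Counter()
    for descriptor, count in descriptor_counts.items():
        for code in scheme.get(descriptor, ()):
            scores[code] += count
    return [code for code, _ in scores.most_common()]


if __name__ == "__main__":
    counts = {"Indexing": 2, "Information retrieval": 1, "Thesauri": 1}
    print(classify(counts))
```

Keeping the scheme as plain data separate from the scoring function reflects the point that each user provides their own classification scheme.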
September 2000 - May 2002
- Fachinformationszentrum Technik e.V., Frankfurt, Germany
- IEE/INSPEC, Stevenage, UK
Bilingual Indexing for Information Retrieval with AUTINDEX
Das Indexierungssystem AUTINDEX
BINDEX Final Report