The main objective of this project is to develop a cross-language search system for querying multilingual document collections. The target scenario is collections in the field of science and technology on the Internet.
To achieve this, document representation models and ranking techniques will be tested and implemented, along with techniques for normalizing the representation of the query across all languages. Translation will rely on dictionary-based techniques. Various techniques will be tested and implemented to solve the problems that arise in this type of translation, such as ambiguity (a term with several possible translations) and limited dictionary coverage (a term with no entry). A sound way of merging the separate rankings obtained from the different collections will also be researched and implemented.
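As a rough illustration of two of the steps described above, the following sketch shows dictionary-based query translation and the merging of per-collection rankings. It is not the project's implementation: the Spanish-to-English dictionary `TOY_DICT`, the function names, the choice to keep every candidate translation, the pass-through of uncovered terms, and the min-max score normalization are all assumptions made for the example.

```python
# Hypothetical toy dictionary (Spanish -> English); entries are illustrative only.
TOY_DICT = {
    "red": ["network", "net"],   # ambiguous term: two candidate translations
    "neuronal": ["neural"],
}


def translate_query(terms, dictionary):
    """Translate query terms with a bilingual dictionary.

    Returns (translated_terms, uncovered_terms).
    """
    translated = []
    uncovered = []
    for term in terms:
        candidates = dictionary.get(term.lower())
        if candidates:
            # Ambiguity: this sketch simply keeps every candidate translation.
            translated.extend(candidates)
        else:
            # Coverage gap: pass the term through unchanged and record it.
            translated.append(term)
            uncovered.append(term)
    return translated, uncovered


def merge_rankings(rankings):
    """Merge per-collection rankings into one list.

    `rankings` maps a collection name to a list of (doc_id, score) pairs.
    Scores are min-max normalized within each collection so that they are
    comparable, then all results are sorted by normalized score.
    """
    merged = []
    for collection, results in rankings.items():
        if not results:
            continue
        scores = [score for _, score in results]
        lowest, highest = min(scores), max(scores)
        span = highest - lowest
        for doc_id, score in results:
            normalized = (score - lowest) / span if span else 1.0
            merged.append((collection, doc_id, normalized))
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged
```

Calling `translate_query(["red", "neuronal", "laser"], TOY_DICT)` expands the ambiguous term into both candidates and reports `laser` as uncovered. A real system would likely disambiguate candidates rather than keep them all, and raw scores from different collections may need a more careful normalization than the min-max scaling assumed here.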
In a cross-language information retrieval system, users can query vast multilingual collections of documents by writing the input query in a single language. Both the structure of the input query and the presentation of results can vary. The input query, for example, can range from a set of keywords to a query written in natural language.
Moreover, the presentation format of the results depends on the kind of algorithm used to obtain them, that is, on the way the results are selected and sorted. This project proposes to develop a scientific content search system that accepts queries made up of sets of search terms and ranks results with algorithms based on statistical models.
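To make the idea of ranking a term-set query with a statistical model more concrete, here is a minimal sketch using a simple TF-IDF score. The text does not say which statistical model the project uses, so TF-IDF, the smoothing in the IDF formula, the function name `rank`, and the pre-tokenized input format are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def rank(query_terms, documents):
    """Rank documents against a set of query terms with a TF-IDF score.

    `documents` maps a doc_id to its list of tokens (assumed pre-tokenized).
    Returns (doc_id, score) pairs sorted by descending score; documents
    that match no query term are omitted.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in documents.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term_freq[term]:
                # Smoothed inverse document frequency (an assumed variant).
                idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
                # Length-normalized term frequency times IDF.
                score += (term_freq[term] / len(tokens)) * idf
        if score > 0:
            scores[doc_id] = score

    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

For example, given three short documents and the query terms `["neural", "network"]`, a document containing both terms scores above one that contains only `network`, and a document containing neither is left out of the result list.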
Copyright © 2007 Elhuyar Fundazioa