
Elhuyar Fundazioa - Language Services 


R&D

Information retrieval and extraction (IR-IE)

Dokusare (CLIR) 

Aims and general description

The aim is to analyse and develop techniques for relating multilingual documents (including documents in Basque) to each other: the semantic classification of documents, the measurement of semantic similarity between documents, the clustering of documents based on entities or terms, etc. A prototype has been developed to evaluate these techniques and to assess the viability of their applications.

These tests have been conducted on the Elhuyar Foundation’s Zientzia.net website.

Techniques for relating documents to each other can be grouped under a single concept: document similarity.

In order to relate documents to each other, research is being done on techniques for measuring the similarity between them. However, similarity is a very broad concept, which is why there are many different lines of research. This project aims to go further into semantic similarity across languages, but other helpful techniques will also be worked on.
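
As an illustration of what measuring the similarity between two documents can mean in practice, the following minimal sketch compares two texts by the cosine of their term-frequency vectors. This is one common baseline and is not necessarily the measure used in the project; the function names `term_vector` and `cosine_similarity` are chosen here for illustration only.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into word tokens and count them."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = term_vector("Research on language technology")
    doc2 = term_vector("Language technology research at a foundation")
    print(round(cosine_similarity(doc1, doc2), 3))
```

A score close to 1 means the two texts use largely the same words, and 0 means they share none. Word overlap of this kind only approximates semantic similarity, which is why the measure is shown here as a baseline.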

Measuring relationships between documents in different languages makes it possible to cluster documents with similar content. Taking Basque as the starting point, this opens up fresh possibilities: searching for and finding content in a range of languages on the Internet; linking, in a semi-automatic way, documents in different languages whose content is the same as or similar to that of a specific document; building comparable corpora; etc.

The modelling of semantic similarity between documents is an interesting problem in cognitive science, from the standpoint of both theory and practice: in theory, because it deals with a basic cognitive process, and in practice, because similarity measures are used in search systems, browsing systems, the display of text corpora, filtering and classification applications and, in general, many text management tools. For this reason, developing quality technology for semantic similarity, in other words, producing tools that are precise, automatic and scalable, is a basic component of making text management software more useful.
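
The semi-automatic linking of documents across languages can be sketched as follows, under strong simplifying assumptions. The Basque text is mapped into English through a tiny, hypothetical bilingual dictionary (`EU_EN`), compared with each English document by cosine similarity, and the documents scoring above an arbitrary threshold are proposed as candidates for a person to confirm. The dictionary, the threshold of 0.3 and all function names are illustrative assumptions, not the project's actual method.

```python
import math
import re
from collections import Counter

# Hypothetical toy Basque-to-English dictionary; a real system would need
# a full bilingual lexicon or some other cross-lingual resource.
EU_EN = {
    "zientzia": "science",
    "ikerketa": "research",
    "hizkuntza": "language",
    "teknologia": "technology",
}


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def translate_vector(text, dictionary):
    """Count dictionary translations of the tokens; unknown words are dropped."""
    return Counter(dictionary[t] for t in tokens(text) if t in dictionary)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_candidates(source_eu, targets_en, threshold=0.3):
    """Return (index, score) pairs for English documents that resemble the
    Basque source, best first. The threshold is an arbitrary assumption."""
    query = translate_vector(source_eu, EU_EN)
    scored = [(i, cosine(query, Counter(tokens(doc))))
              for i, doc in enumerate(targets_en)]
    kept = [(i, s) for i, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    english_docs = ["science research news", "football results"]
    for index, score in link_candidates("zientzia eta ikerketa", english_docs):
        print(index, round(score, 3))
```

The candidates are returned for review and are not linked automatically, which corresponds to the semi-automatic workflow described above. Word-by-word dictionary lookup ignores inflection and ambiguity, both of which matter for a language such as Basque, so a practical system would at least need lemmatisation before the lookup.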


Copyright © 2007 Elhuyar Fundazioa
