
Elhuyar Fundazioa - Language Services 


R&D

Information retrieval and extraction (IR-IE)

EusBila 

Aims and general description

There are two main reasons why general-purpose Internet search engines are poorly suited to Basque. First, they can only search for a specific word form, not for all the forms of a lemma: if we run a search with the word ‘lur’ (earth), we are interested in finding lur, lurra (the earth), lurrean (on the earth), lurrarekin (with the earth), etc. Second, they may also return results that are not in Basque, which happens when the same word form exists in another language.

This is the case, for example, with technical words like software, anorexia and sulfuroso, proper names like Egipto and Newton, or short words like katu and esne. It is technically possible to build an Internet search engine that solves these problems, but that would require a huge infrastructure. Instead, the EusBila project uses the APIs provided by Internet search engines and considerably improves their results by applying natural language processing techniques.

A morphological generation tool developed by the IXA Group of the UPV/EHU (University of the Basque Country) is used to obtain, for a given form, all the possible forms deriving from its lemma. All these forms are then sent to the API joined by OR operators. For example, if the user queries the word etxe, the search engine is asked to run the following search: etxe OR etxea OR etxeak OR etxeari OR, etc. Search engines do not accept as many terms as one would like, so not all the declined forms are included in the query, but enough are included to obtain significant results.
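The query expansion described above can be sketched roughly as follows. This is only an illustration: the hand-written table `TOY_FORMS` stands in for the IXA morphological generator, and the names `generate_forms`, `build_or_query` and `max_terms` are hypothetical, not part of EusBila.

```python
# Toy stand-in for the morphological generator: a hand-written table of
# inflected forms taken from the examples in the text. The real tool
# derives the forms from the lemma; this table is an assumption.
TOY_FORMS = {
    "etxe": ["etxe", "etxea", "etxeak", "etxeari"],
    "lur": ["lur", "lurra", "lurrean", "lurrarekin"],
}


def generate_forms(lemma):
    """Return the known forms of a lemma, or the lemma itself if unknown."""
    return TOY_FORMS.get(lemma, [lemma])


def build_or_query(lemma, max_terms=3):
    """Join at most max_terms forms with OR.

    max_terms models the limit a search engine places on query size;
    the actual limit depends on the engine and is assumed here.
    """
    forms = generate_forms(lemma)[:max_terms]
    return " OR ".join(forms)


print(build_or_query("etxe"))
print(build_or_query("etxe", max_terms=10))
```

With the default limit, only the first three forms of etxe are sent; raising the limit includes all four forms in the toy table.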

To obtain results in Basque only, filter words are used: the most common Basque words are added to the query, all linked by an AND operator.
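A minimal sketch of this filtering step might look like the following. The source does not list the filter words it uses, so `FILTER_WORDS` is an assumed sample of frequent Basque words (eta, da, ez); the function name and the parenthesised query syntax are also assumptions, since the exact syntax depends on the search engine API.

```python
# Assumed sample of very frequent Basque words; the actual list used by
# the project is not given in the text.
FILTER_WORDS = ["eta", "da", "ez"]


def add_language_filter(query, filter_words=FILTER_WORDS):
    """Require every filter word to appear alongside the original query.

    The original query is wrapped in parentheses so that its OR terms
    stay grouped when the AND terms are appended.
    """
    parts = ["(" + query + ")"] + list(filter_words)
    return " AND ".join(parts)


print(add_language_filter("etxe OR etxea OR etxeak"))
```

A page that contains all the filter words is very likely to be written in Basque, which is why a short list of frequent words can act as a language filter.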

The EusBila project also handles variants and standard forms, using the EDBL lexical database developed by the IXA Group. If the queried word has variants, the search is run and the variants are also proposed to the user; if the queried word is itself a variant, the standard form is proposed. The same applies to variants of declension suffixes. Moreover, for unknown words, the system checks whether a standard form can be reached through phonological rules and, if so, proposes that form as well.
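The lookup logic for variants and standard forms could be sketched as below. The dictionary is a toy stand-in for the EDBL database, its entries are deliberately placeholder strings rather than real Basque words, and the name `suggest` is hypothetical. The phonological-rule step for unknown words is not modelled here.

```python
# Toy stand-in for EDBL: each standard form maps to its known variants.
# The entries are placeholders, not real lexical data.
VARIANTS = {
    "standard1": ["variant1a", "variant1b"],
    "standard2": ["variant2a"],
}

# Reverse index: each variant points back to its standard form.
STANDARD_OF = {
    variant: standard
    for standard, variants in VARIANTS.items()
    for variant in variants
}


def suggest(word):
    """Return the forms to propose to the user next to the search results."""
    if word in VARIANTS:
        # A standard form was queried: propose its variants.
        return list(VARIANTS[word])
    if word in STANDARD_OF:
        # A variant was queried: propose the standard form.
        return [STANDARD_OF[word]]
    # Unknown word: nothing to propose in this sketch.
    return []


print(suggest("standard1"))
print(suggest("variant2a"))
```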



Copyright © 2007 Elhuyar Fundazioa
