
Resources and Tools

Eroski Consumer Corpus

The Eroski Consumer Corpus

Through the web interface, you can search the entire contents of Consumer magazine in the usual way for corpora (by word form or by category). Results are also displayed in the usual way, showing the context of each hit and the number of hits for each word entered. Moreover, because the corpus is multilingual, you can also see how words are translated into the other languages.

This multilingual corpus is a key linguistic resource, not only for language professionals, but for society in general. Consequently, it is available to any interested party over the Internet.

The Consumer Corpus currently online covers the magazine issues published between 1998 and 2009: a total of 131 issues and 2,590 articles. The table below shows the total number of sentences and words in each of the four languages of the corpus. Bear in mind that the language versions were added at different times, and that some languages are more agglutinative than others and therefore have lower word counts:

 Language     Sentences      Words
 Basque         232,250  2,362,536
 Spanish        292,274  3,758,454
 Catalan        214,584  2,760,467
 Galician       208,652  2,549,878
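
The effect of agglutination on word counts can be checked with a little arithmetic on the figures in the table. The following is only a minimal sketch: the variable and function names are illustrative, and the only data used are the sentence and word totals given above.

```python
# Sentence and word totals per language, as given in the table above.
counts = {
    "Basque": (232250, 2362536),
    "Spanish": (292274, 3758454),
    "Catalan": (214584, 2760467),
    "Galician": (208652, 2549878),
}


def words_per_sentence(sentences, words):
    """Average number of words per sentence."""
    return words / sentences


# Average sentence length in words for each language.
averages = {
    language: words_per_sentence(sentences, words)
    for language, (sentences, words) in counts.items()
}

for language, average in averages.items():
    print(f"{language:<10}{average:6.1f}")
```

Basque comes out at roughly 10.2 words per sentence, against roughly 12.2 to 12.9 for the other three languages, which is consistent with Basque being the most agglutinative of the four. Because the language versions do not cover exactly the same set of issues, these averages are indicative only.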


The documents are aligned sentence by sentence, to make it easier to see how words have been translated into the different languages. This sentence-level alignment is carried out automatically, and is therefore not perfect. Around 82-84% of the alignments between Basque and each of the other three languages are correct, while among the other three languages themselves the figure is 89-93%.
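
To illustrate how sentence alignment supports this kind of lookup, here is a minimal sketch of a word-form search over aligned sentence pairs. It is not the implementation behind the web interface, which is not described here. The `AlignedSentence` class, the `search` function and the three sample sentence pairs are all hypothetical and are not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class AlignedSentence:
    """One sentence pair produced by sentence-level alignment (hypothetical structure)."""
    basque: str
    spanish: str


# Made-up sentence pairs for illustration only; they are not from the corpus.
SAMPLE = [
    AlignedSentence("Etxea handia da.", "La casa es grande."),
    AlignedSentence("Ura hotza da.", "El agua está fría."),
    AlignedSentence("Etxea berria da.", "La casa es nueva."),
]


def tokens(sentence):
    """Split a sentence into lower-cased word forms, dropping edge punctuation."""
    return [token.strip(".,;:!?").lower() for token in sentence.split()]


def search(pairs, word_form):
    """Return the pairs whose Basque side contains the given word form."""
    target = word_form.lower()
    return [pair for pair in pairs if target in tokens(pair.basque)]


hits = search(SAMPLE, "etxea")
print(f"{len(hits)} hits")
for hit in hits:
    # Show each hit in context, next to its aligned translation.
    print(f"{hit.basque} | {hit.spanish}")
```

Since the alignment is automatic, some of the pairs returned by a search like this will be mismatched in practice, so the aligned sentence should be read as a likely translation rather than a guaranteed one.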

The Consumer Corpus was created by Elhuyar Hizkuntza Zerbitzuak and Eleka Ingeniaritza Linguistikoa for Eroski Fundazioa.

The seminar “Modern corpus production”

The seminar entitled “Modern corpus production”, organised by Eroski Fundazioa, Euskaltzaindia and Elhuyar Fundazioa, was held at the head offices of Euskaltzaindia in Bilbao, on 21 January 2010.

The Eroski Consumer Corpus was presented at this meeting, which also emphasised the importance of corpora for linguistics and reviewed the current situation of Basque and multilingual corpora.

Programme and conference documents:

  • Opening ceremony. Andoni Sagarna. Euskaltzaindia.
  • Text corpora and language planning. Xavier Gómez Guinovart. Director of the Language Computing Seminar. University of Vigo.
  • The importance of producing corpora and the situation of the Basque language. Miriam Urkia. Euskaltzaindia.
  • Presentation of the Eroski Consumer Corpus. Igor Leturia, Elhuyar Fundazioa and Edurne Martinez, Eleka Ingeniaritza Linguistikoa.

Copyright © 2007 Elhuyar Fundazioa
