Through the web interface, you can search the entire contents of the Consumer magazine using the usual query methods for corpora (by word form or by category). Results are also displayed in the usual way, showing the context and the number of hits for each word entered. Moreover, since it is a multilingual corpus, you can also see how the words have been translated into the other languages.
This multilingual corpus is a key linguistic resource, not only for language professionals, but for society in general. Consequently, it is available to any interested party over the Internet.
The Consumer Corpus currently online includes the magazine issues published between 1998 and 2009: a total of 131 issues and 2,590 articles. The table below shows the total number of sentences and words in each of the four languages of the corpus. Bear in mind that the language versions were added at different times, and that some languages are more agglutinative than others and so have lower word counts:
| Language | Sentences | Words |
|---|---|---|
| Basque | 232250 | 2362536 |
| Spanish | 292274 | 3758454 |
| Catalan | 214584 | 2760467 |
| Galician | 208652 | 2549878 |
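
As a rough illustration of the point about agglutinative languages, the figures in the table can be turned into an average number of words per sentence. The sketch below uses only the counts given above; the variable and function names are made up for this example, and the ratio is only a crude indicator, since the language versions do not cover exactly the same set of texts.

```python
# Sentence and word counts copied from the table above.
# The dictionary layout and names are illustrative, not part of the corpus tools.
corpus_counts = {
    "Basque": (232250, 2362536),
    "Spanish": (292274, 3758454),
    "Catalan": (214584, 2760467),
    "Galician": (208652, 2549878),
}


def words_per_sentence(counts):
    """Return the average number of words per sentence for each language."""
    return {lang: words / sentences for lang, (sentences, words) in counts.items()}


ratios = words_per_sentence(corpus_counts)

# Print the languages from the lowest to the highest average.
for lang, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{lang}: {ratio:.2f} words per sentence")
```

Basque comes out with the lowest average, which is consistent with its more agglutinative morphology packing more information into each word.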
The documents are aligned by sentences, in order to make it easier to see how words have been translated into different languages. This sentence-based alignment is carried out automatically, and is therefore not perfect. The percentage of correct alignment between Basque and the other three languages is around 82-84%, while between the other three languages themselves this figure is 89-93%.
The Consumer Corpus was created by Elhuyar Hizkuntza Zerbitzuak and Eleka Ingeniaritza Linguistikoa for Eroski Fundazioa.
The seminar entitled “Modern corpus production”, organised by Eroski Fundazioa, Euskaltzaindia and Elhuyar Fundazioa, was held at the head offices of Euskaltzaindia in Bilbao, on 21 January 2010.
The Eroski Consumer Corpus was presented at this seminar, which also emphasised the importance of corpora for linguistics and reviewed the current situation of Basque and multilingual corpora.