GSoC/GCI Archive
Google Code-in 2010: The Apertium project

Extract Dutch inflection from Wiktionary

completed by: AureiAnimus

mentors: Francis Tyers

Wiktionary has a lot of information on Dutch inflection that Apertium could use to build a morphological analyser. This task involves downloading inflection data from Wiktionary and processing it into a form that Apertium can use.

It is recommended that you first look at each of the categories and get a list of article names, then download the articles as HTML and process the HTML with a script, for example in Perl or Python.
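The processing step could be sketched roughly as follows in Python, using only the standard library. This is a minimal illustration, not the method used for the task: the class `TableRows`, the function `extract_rows` and the `SAMPLE` markup are all hypothetical, and the sample is invented rather than copied from Wiktionary, whose real inflection tables are laid out differently and would need their own handling.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup (an assumption, not real Wiktionary HTML).
SAMPLE = """
<table>
<tr><th>singular</th><td><a href="#">gebouw</a></td></tr>
<tr><th>plural</th><td><a href="#">gebouwen</a></td></tr>
</table>
"""

for row in extract_rows(SAMPLE):
    print(row)
```

Working on HTML saved to disk keeps the download and the parsing separate, so the script can be rerun without fetching the articles again.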

==Examples from Wiktionary==

* Verbs: http://en.wiktionary.org/wiki/vertrekken

* Adjectives: http://en.wiktionary.org/wiki/groot

* Nouns: http://en.wiktionary.org/wiki/gebouw

==Example output==

Further information on the example output will be given on request.

===Nouns===

gebouw; gebouw; sg; n.nt
gebouw; gebouwen; pl; n.nt
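
As an illustration, the two noun lines above could be produced by a small Python helper like the one below. The function name `noun_lines` is hypothetical, and the reading of the four fields as lemma, surface form, number and tags is an assumption drawn from the example, not a stated specification.

```python
def noun_lines(lemma, singular, plural, tags):
    """Format one noun as two semicolon-separated lines (sg and pl).

    Assumed field order, inferred from the example output:
    lemma; surface form; number; tags
    """
    return [
        "; ".join([lemma, singular, "sg", tags]),
        "; ".join([lemma, plural, "pl", tags]),
    ]


for line in noun_lines("gebouw", "gebouw", "gebouwen", "n.nt"):
    print(line)
```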