Apertium
Mailing List: mailto:apertium-stuff@lists.sourceforge.net
Apertium is a free/open-source platform for creating rule-based machine translation systems. We try to focus our efforts on lesser-resourced and marginalised languages, but we also work with larger languages. There are currently 37 published language pairs within the project and many more in development.
Google Code-in students: We are interested in your language, whatever it might be. Come and visit us on IRC and talk to one of our mentors about tasks relating to your native language.
Completed Tasks
- *-morph and *-gener modes for apertium-apy
- Abstract the formatting for the simple-html interface.
- add language searching to pairviewer
- apertium stream format output for bison grammars
- apertium-apy gateway
- apertium-apy gateway server pool management
- apertium-apy mode for geriaoueg (biltrans in context)
- apertium-apy translation-per-word mode
- apertium-html-tools localisation in modes
- begiak plugin to get number of articles of a given language's wikipedia
- better wikipedia extractor script
- Bison syntax-tree visualisation
- Categorise words by inflectional paradigm (Hindi) [24]
- Categorise words by inflectional paradigm (Russian) [0]
- Categorise words by inflectional paradigm (Russian) [11]
- Categorise words by inflectional paradigm (Russian) [13]
- Categorise words by inflectional paradigm (Russian) [15]
- Categorise words by inflectional paradigm (Russian) [17]
- Categorise words by inflectional paradigm (Russian) [19]
- Categorise words by inflectional paradigm (Russian) [20]
- Categorise words by inflectional paradigm (Russian) [21]
- Categorise words by inflectional paradigm (Russian) [3]
- Categorise words by inflectional paradigm (Russian) [4]
- Categorise words by inflectional paradigm (Russian) [6]
- Categorise words by inflectional paradigm (Russian) [7]
- Categorise words by inflectional paradigm (Russian) [9]
- Categorise words by part-of-speech (Aromanian) [2]
- Categorise words by part-of-speech (Aromanian) [3]
- Categorise words by part-of-speech (Aromanian) [5]
- Categorise words by part-of-speech (Aromanian) [6]
- Categorise words by part-of-speech (Aromanian) [8]
- Categorise words by part-of-speech (Azerbaijani) [0]
- Categorise words by part-of-speech (Belarusian) [10]
- Categorise words by part-of-speech (Belarusian) [14]
- Categorise words by part-of-speech (Belarusian) [4]
- Categorise words by part-of-speech (Belarusian) [5]
- Categorise words by part-of-speech (Belarusian) [6]
- Categorise words by part-of-speech (Belarusian) [8]
- Categorise words by part-of-speech (Bengali) [2]
- Categorise words by part-of-speech (Hindi) [0]
- Categorise words by part-of-speech (Hindi) [11]
- Categorise words by part-of-speech (Hindi) [13]
- Categorise words by part-of-speech (Hindi) [1]
- Categorise words by part-of-speech (Hindi) [3]
- Categorise words by part-of-speech (Hindi) [4]
- Categorise words by part-of-speech (Hindi) [8]
- Categorise words by part-of-speech (Hindi) [9]
- Categorise words by part-of-speech (Malayalam) [0]
- Categorise words by part-of-speech (Malayalam) [13]
- Categorise words by part-of-speech (Malayalam) [4]
- Categorise words by part-of-speech (Malayalam) [5]
- Categorise words by part-of-speech (Slovak) [0]
- Categorise words by part-of-speech (Slovak) [11]
- Categorise words by part-of-speech (Slovak) [5]
- Categorise words by part-of-speech (Slovak) [8]
- Check that the Apertium guide for Windows users still works
- come up with better colours for pairviewer
- Convert Balkan languages wiki page to new style
- Convert Celtic languages wiki page to new style
- Convert Germanic languages wiki page to new style
- Convert Semitic languages wiki page to new style
- Create a Dravidian languages page on the apertium wiki
- Create a program to generate a flex lexer from an XML description
- Create a program to generate a skeleton bison grammar from an XML description
- Create a Uralic languages page on the apertium wiki
- Depth first traversal for intersection in lttoolbox (C++)
- Dictionary conversion in python (1) [0]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Catalan) [0]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Catalan) [1]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Catalan) [2]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Galician) [0]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Galician) [2]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Spanish) [0]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Spanish) [1]
- Ensure an Apertium language pair does not mess up (X)HTML formatting (English and Spanish) [2]
- Ensure an Apertium language pair does not mess up wordprocessor (ODT, RTF) formatting (English and Catalan) [0]
- Ensure an Apertium language pair does not mess up wordprocessor (ODT, RTF) formatting (English and Catalan) [1]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (Basque and Spanish) [0]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (Catalan and English) [1]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (English and Catalan) [0]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (English and Catalan) [1]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (English and Spanish) [0]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (English and Spanish) [1]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (English and Spanish) [2]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (Galician and English) [2]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (Portuguese and Spanish) [2]
- Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (Spanish and English) [1]
- Examples of minimum files where an Apertium language pair messes up wordprocessor formatting (English and Catalan) [0]
- Examples of minimum files where an Apertium language pair messes up wordprocessor formatting (English and Catalan) [1]
- Extract (scrape) inflections from Wiktionary (Bulgarian) [0]
- Extract (scrape) inflections from Wiktionary (Bulgarian) [2]
- Extract (scrape) inflections from Wiktionary (Bulgarian) [4]
- Extract (scrape) inflections from Wiktionary (Chinese) [0]
- Extract (scrape) inflections from Wiktionary (Faroese) [0]
- Extract (scrape) inflections from Wiktionary (Greek) [1]
- Extract (scrape) inflections from Wiktionary (Greek) [2]
- Extract (scrape) inflections from Wiktionary (Lao) [0]
- Extract (scrape) inflections from Wiktionary (Slovak) [0]
- Extract (scrape) inflections from Wiktionary (Slovak) [1]
- Extract (scrape) inflections from Wiktionary (Slovak) [2]
- Extract (scrape) inflections from Wiktionary (Slovak) [4]
- Extract (scrape) inflections from Wiktionary (Swedish) [0]
- Extract (scrape) inflections from Wiktionary (Swedish) [1]
- Extract (scrape) inflections from Wiktionary (Swedish) [2]
- Extract (scrape) inflections from Wiktionary (Thai) [0]
- Extract (scrape) inflections from Wiktionary (Vietnamese) [1]
- find an error in translation and fix (Russian and Ukrainian)
- find an error in translation and fix (Russian and Ukrainian)
- find an error in translation and fix (Russian and Ukrainian)
- find an error in translation and fix (Russian and Ukrainian)
- find an error in translation and fix (Russian and Ukrainian)
- find an error in translation and fix (Russian and Ukrainian)
- georeference language areas for Karakalpak and Kyrgyz
- georeference language areas for misc Caucasus-area languages
- georeference language areas for Tajik and Turkmen
- georeference language areas for Ukrainian and Belarusian
- Get bison output formatter to output LaTeX qtree format
- Hand-annotate 250 words of text (Bulgarian) [2]
- Hand-annotate 250 words of text (English) [0]
- Hand-annotate 250 words of text (English) [10]
- Hand-annotate 250 words of text (English) [11]
- Hand-annotate 250 words of text (English) [12]
- Hand-annotate 250 words of text (English) [13]
- Hand-annotate 250 words of text (English) [14]
- Hand-annotate 250 words of text (English) [15]
- Hand-annotate 250 words of text (English) [16]
- Hand-annotate 250 words of text (English) [17]
- Hand-annotate 250 words of text (English) [18]
- Hand-annotate 250 words of text (English) [19]
- Hand-annotate 250 words of text (English) [1]
- Hand-annotate 250 words of text (English) [2]
- Hand-annotate 250 words of text (English) [3]
- Hand-annotate 250 words of text (English) [4]
- Hand-annotate 250 words of text (English) [5]
- Hand-annotate 250 words of text (English) [6]
- Hand-annotate 250 words of text (English) [7]
- Hand-annotate 250 words of text (English) [8]
- Hand-annotate 250 words of text (English) [9]
- Hand-annotate 500 words of text (Bulgarian) [6]
- Hand-annotate 500 words of text (English) [0]
- Hand-annotate 500 words of text (English) [10]
- Hand-annotate 500 words of text (English) [11]
- Hand-annotate 500 words of text (English) [12]
- Hand-annotate 500 words of text (English) [13]
- Hand-annotate 500 words of text (English) [14]
- Hand-annotate 500 words of text (English) [15]
- Hand-annotate 500 words of text (English) [16]
- Hand-annotate 500 words of text (English) [17]
- Hand-annotate 500 words of text (English) [18]
- Hand-annotate 500 words of text (English) [19]
- Hand-annotate 500 words of text (English) [1]
- Hand-annotate 500 words of text (English) [20]
- Hand-annotate 500 words of text (English) [21]
- Hand-annotate 500 words of text (English) [22]
- Hand-annotate 500 words of text (English) [23]
- Hand-annotate 500 words of text (English) [25]
- Hand-annotate 500 words of text (English) [26]
- Hand-annotate 500 words of text (English) [27]
- Hand-annotate 500 words of text (English) [28]
- Hand-annotate 500 words of text (English) [29]
- Hand-annotate 500 words of text (English) [2]
- Hand-annotate 500 words of text (English) [30]
- Hand-annotate 500 words of text (English) [31]
- Hand-annotate 500 words of text (English) [32]
- Hand-annotate 500 words of text (English) [33]
- Hand-annotate 500 words of text (English) [34]
- Hand-annotate 500 words of text (English) [35]
- Hand-annotate 500 words of text (English) [36]
- Hand-annotate 500 words of text (English) [37]
- Hand-annotate 500 words of text (English) [39]
- Hand-annotate 500 words of text (English) [3]
- Hand-annotate 500 words of text (English) [40]
- Hand-annotate 500 words of text (English) [41]
- Hand-annotate 500 words of text (English) [42]
- Hand-annotate 500 words of text (English) [43]
- Hand-annotate 500 words of text (English) [44]
- Hand-annotate 500 words of text (English) [45]
- Hand-annotate 500 words of text (English) [46]
- Hand-annotate 500 words of text (English) [47]
- Hand-annotate 500 words of text (English) [48]
- Hand-annotate 500 words of text (English) [49]
- Hand-annotate 500 words of text (English) [4]
- Hand-annotate 500 words of text (English) [5]
- Hand-annotate 500 words of text (English) [6]
- Hand-annotate 500 words of text (English) [7]
- Hand-annotate 500 words of text (English) [8]
- Hand-annotate 500 words of text (English) [9]
- Improve the quality of a language pair by adding 50 words to its vocabulary (English and Spanish) [1]
- Improve the quality of a language pair by adding 50 words to its vocabulary (Spanish and Catalan) [0]
- improved concordancer search interface
- Indic languages page
- Interface behaviour for language detection
- Intersection of two transducers in lttoolbox
- Lemmatise words by frequency (Belarusian) [1]
- Lemmatise words by frequency (Belarusian) [2]
- Lemmatise words by frequency (Hindi) [11]
- Lemmatise words by frequency (Hindi) [13]
- Lemmatise words by frequency (Hindi) [14]
- Lemmatise words by frequency (Hindi) [3]
- Lemmatise words by frequency (Hindi) [9]
- Lemmatise words by frequency (Malayalam) [13]
- Lemmatise words by frequency (Malayalam) [8]
- Lemmatise words by frequency (Slovak) [0]
- Lemmatise words by frequency (Slovak) [12]
- Lemmatise words by frequency (Slovak) [14]
- Localised 'available languages' in apertium-apy
- make concordancer work with output of analyser
- Make intersection of lttoolbox transducers work with several sections per binary (C++)
- Make Languages of the Volga-Kama region wiki page have new format
- make phenny/begiak's iso639 plugin use db generated from ethnologue data
- make RFERL scraper documentation read like a HOWTO guide
- make scraper plugin for azadliq.org
- Make the bison parse formatter output bracketed parses instead of parse trees
- Make WikiBhasha take content from any language's wikipedia
- Manually create a transfer lexicon from a word list in .dix format (Czech and Slovak) [9]
- Manually create a transfer lexicon from a word list in .dix format (Finnish and Estonian) [0]
- Manually create a transfer lexicon from a word list in .dix format (Finnish and Estonian) [1]
- Manually create a transfer lexicon from a word list in .dix format (Finnish and Estonian) [2]
- Manually create a transfer lexicon from a word list in .dix format (Macedonian and Bulgarian) [11]
- Manually create a transfer lexicon from a word list in .dix format (Macedonian and Bulgarian) [12]
- Manually create a transfer lexicon from a word list in .dix format (Macedonian and Bulgarian) [13]
- Manually create a transfer lexicon from a word list in .dix format (Macedonian and Bulgarian) [16]
- Manually create a transfer lexicon from a word list in .dix format (Macedonian and Bulgarian) [1]
- Manually create a transfer lexicon from a word list in .dix format (Macedonian and Bulgarian) [4]
- Manually create a transfer lexicon from a word list in .dix format (Romanian and Aromanian) [3]
- merge mutantmonkey and jonorthwash branches of phenny (git)
- monodix support for stem-counting script
- phenny queue plugin
- phenny/begiak apertium_wiki plugin not chopping right
- phenny/begiak ethnologue plugin
- phenny/begiak mediawiki plugin(s) support for subsections
- phenny/begiak url module localisation improvements
- port apertium-apy to bottle or equivalent
- Proofread an existing dictionary (Hindi and Urdu) [14]
- Proofread an existing dictionary (Hindi and Urdu) [18]
- Proofread an existing dictionary (Hindi and Urdu) [19]
- Proofread an existing dictionary (Hindi and Urdu) [1]
- Proofread an existing dictionary (Hindi and Urdu) [2]
- Proofread an existing dictionary (Hindi and Urdu) [5]
- Proofread an existing dictionary (Hindi and Urdu) [6]
- Proofread an existing dictionary (Hindi and Urdu) [8]
- Proofread an existing dictionary (Hindi and Urdu) [9]
- regex searching in concordancer
- Remove graphViz from bison source file and write script to generate .dot files from bracketed output
- Run apertium-dixtools paradigm merger on a dictionary (Russian) [0]
- Run apertium-dixtools paradigm merger on a dictionary (Ukrainian) [0]
- scrape a freely available dictionary using tesseract (Crimean Tatar and Russian) [0]
- scrape a freely available dictionary using tesseract (Crimean Tatar and Ukrainian) [0]
- scrape a freely available dictionary using tesseract (Ukrainian and Russian) [0]
- scraper for all article urls from kumukia.ru/cat-qumuq.html
- scraper for all text urls from kumukia.ru/adabiat
- scraper for all wiktionary pages in a category
- Scraper for freely available forum content (asyl-bilim and http://asyl-bilim.kz/forum/) [0]
- Scraper for freely available forum content (haos.ucoz.kz and http://haos.ucoz.kz/forum) [0]
- Scraper for freely available forum content (massagan and http://massagan.com/forum.php) [0]
- scraper of wiktionary translations between language x and y
- simple-html morphological analysis/generation code
- simple-html morphological analysis/generation interface
- simple-html spell-checker interface
- SSL in apertium-apy
- Start a language pair involving Interlingua
- Support non-ASCII characters in flex lexers
- test and document init scripts for apertium-apy (upstart (Ubuntu))
- Wiktionary language-page count calculator
- Write a bison grammar to output input tokens in the same order as they are input
- write a browser interface for the concordancer
- Write a dictionary-based tokeniser for Asian languages (Chinese) [0]
- Write a dictionary-based tokeniser for Asian languages (Chinese) [1]
- Write a dictionary-based tokeniser for Asian languages (Chinese) [2]
- Write a dictionary-based tokeniser for Asian languages (Chinese) [3]
- Write a simple bison grammar (Czech)
- Write a sentence aligner for the UDHR