Evaluate efficacy of decompounding algorithm in Dutch--Afrikaans MT
completed by: AureiAnimus
mentors: Francis Tyers
The aim of this task is to annotate two files of 1,000 lines in the following format:
; ; ^1/1<num>$ ^basismetaal/basis<n><sg><cmp>+metaal<n><sg>$ 1 basismetaal
; ; ^1/1<num>$ ^basisbestuur/basis<n><sg><cmp>+bestuur<n><sg>$ 1 basisbeheer
; ; ^1/1<num>$ ^basinstrumente/bas<n><sg><cmp>+instrument<n><pl>$ 1 basinstrumenten
; ; ^1/1<num>$ ^Bariumverbindings/Barium<n><sg><cmp>+verbinding<n><pl>$ 1 Bariumverbindingen
; ; ^1/1<num>$ ^Bariumsulfaat/Barium<n><sg><cmp>+sulfaat<n><sg>$ 1 Bariumsulfaat
; ; ^1/1<num>$ ^Bariumpoeier/Barium<n><sg><cmp>+poeier<n><sg>$ 1 Bariumpoeder
; ; ^1/1<num>$ ^bariumnitraat/barium<n><sg><cmp>+nitraat<n><sg>$ 1 bariumnitraat
; ; ^1/1<num>$ ^amateurpogings/amateur<n><sg><cmp>+poging<n><pl>$ 1 amateurpogingen
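For anyone who wants to inspect the files programmatically, here is a minimal Python sketch that splits one line into its fields. The field interpretation is an assumption read off the sample lines above (two annotation slots, a count token, the analysed compound, a number, then the translation); the names `parse_line`, `LINE_RE` and `TAG_RE` are hypothetical and not part of any Apertium tool.

```python
import re

# Assumed layout, inferred from the sample lines only:
#   <analysis slot>;<translation slot>; ^count$ ^surface/analysis$ <number> <translation>
LINE_RE = re.compile(
    r"^(?P<analysis_tag>[^;]*);(?P<translation_tag>[^;]*);\s*"
    r"\^[^$]*\$\s*"                                  # count token, e.g. ^1/1<num>$
    r"\^(?P<surface>[^/]+)/(?P<analysis>[^$]+)\$\s+"  # compound and its analysis
    r"\S+\s+(?P<translation>.+)$"                    # number, then the translation
)
TAG_RE = re.compile(r"<[^>]*>")  # morphological tags such as <n>, <sg>, <cmp>


def parse_line(line):
    """Split one line into annotation slots, surface form, parts and translation."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not match the assumed format: %r" % line)
    # The analysis joins the compound parts with '+'; drop the tags to get lemmas.
    parts = [TAG_RE.sub("", part) for part in match.group("analysis").split("+")]
    return {
        "analysis_tag": match.group("analysis_tag").strip(),
        "translation_tag": match.group("translation_tag").strip(),
        "surface": match.group("surface"),
        "parts": parts,
        "translation": match.group("translation").strip(),
    }


if __name__ == "__main__":
    sample = "; ; ^1/1<num>$ ^basinstrumente/bas<n><sg><cmp>+instrument<n><pl>$ 1 basinstrumenten"
    print(parse_line(sample))
```

Seeing the segmented parts next to the surface form and the translation may make it easier to judge both the segmentation and the translation at a glance.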
For each line, first check the compound word segmentation/analysis, then check the translation.
Place a 'GA' before the first ';' if the analysis/segmentation is good.
Place a 'BA' before the first ';' if the analysis/segmentation is bad.
Place a 'GT' after the first ';' if the translation is good.
Place a 'BT1' after the first ';' if the translation is bad because the constituent words are wrong.
Place a 'BT2' after the first ';' if the translation is bad because the constituent words are right but an epenthetic element is missing or wrong.
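Once a file is annotated, the tags can be counted to summarise how well the decompounder did. The following Python sketch is one possible way to do that, assuming the tags are placed exactly as described above; the names `tally` and `share` are hypothetical, and the tags in the usage example are made up for illustration rather than real judgements.

```python
from collections import Counter

ANALYSIS_TAGS = {"GA", "BA"}
TRANSLATION_TAGS = {"GT", "BT1", "BT2"}


def tally(lines):
    """Count analysis and translation tags over an iterable of annotated lines."""
    analysis = Counter()
    translation = Counter()
    for line in lines:
        # The first two ';'-separated fields hold the two annotation slots.
        fields = line.split(";", 2)
        if len(fields) < 3:
            continue  # not in the expected format, skip it
        analysis_tag = fields[0].strip()
        translation_tag = fields[1].strip()
        if analysis_tag in ANALYSIS_TAGS:
            analysis[analysis_tag] += 1
        if translation_tag in TRANSLATION_TAGS:
            translation[translation_tag] += 1
    return analysis, translation


def share(counter, tag):
    """Fraction of annotated lines carrying the given tag (0.0 if none are annotated)."""
    total = sum(counter.values())
    return counter[tag] / total if total else 0.0


if __name__ == "__main__":
    # Illustrative tags only, not actual evaluation results.
    annotated = [
        "GA; GT; ^1/1<num>$ ^basismetaal/basis<n><sg><cmp>+metaal<n><sg>$ 1 basismetaal",
        "GA; BT2; ^1/1<num>$ ^amateurpogings/amateur<n><sg><cmp>+poging<n><pl>$ 1 amateurpogingen",
    ]
    analysis, translation = tally(annotated)
    print(analysis, translation, share(translation, "GT"))
```

Lines whose slots are still empty are left out of the totals, so the shares reflect only the lines that have actually been annotated.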
Contact a mentor on IRC (#apertium on irc.freenode.net) to get the necessary files.