[Mahout] Distributed Latent Dirichlet Allocation
by David Hall for The Apache Software Foundation
Latent Dirichlet Allocation (Blei et al., 2003) is a powerful learning algorithm for automatically and jointly clustering words into "topics" and documents into mixtures of topics. It has been successfully applied to model change in scientific fields over time (Griffiths and Steyvers, 2004; Hall et al., 2008). In this project, I propose to implement distributed LDA using MapReduce, to investigate extensions of LDA, and possibly to explore more efficient algorithms for distributed inference.