National Evolutionary Synthesis Center (NESCent)
Web Page: http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2011
Mailing List: mailto:phylosoc@nescent.org
NESCent facilitates synthetic research on grand challenge questions in evolutionary biology and also works to address critical needs in software infrastructure and education through promoting open, collaborative development of interoperable and standards-supporting open-source software. The Center is located in Durham, North Carolina, is jointly operated by Duke University, the University of North Carolina at Chapel Hill, and North Carolina State University, and receives its core funding from the National Science Foundation (NSF). As part of our cyberinfrastructure program, NESCent has run five collaborative software source code and vocabulary development sprints aimed at improving interoperability in phyloinformatics, engaging developers of scientific software tools, promoting integration among online data resources, and sustaining the development of shared vocabularies. These events, and our past Summer of Code participation, continue to have significant and lasting impacts on the landscape of collaborative software development in our field. The Center is committed to FLOSS and sharing of scientific data (see our policy at http://www.nescent.org/informatics/data_software_policy.php); all software products of the Center are released as open source and established as collaborative projects on sites such as SourceForge, Google Code, and GitHub. Members of the Center's Informatics team are lead developers in several open-source projects, and one of our organization administrators has been active for ten years on the Board of the Open Bioinformatics Foundation (http://www.open-bio.org/), the umbrella organization for the Bio* projects.
Our code repository can be found here: http://code.google.com/p/google-summer-of-code-2011-nescent. More information about the projects can be found as part of our Ideas Page. We have also created a wrap-up page with brief summaries of all of our students' projects.
Projects
- DIM SUM 2: GPU computing for an individual-based simulator of movement and demography. DIM SUM is a recently released simulator of phylogeographic histories based on the movement and reproduction of individuals on a continuous landscape. While very general and flexible, the current implementation of DIM SUM can be quite slow as the number of individuals increases or the resolution of the landscape becomes finer. The goal of this project is to speed up the simulation by running the time-consuming computations on the GPU.
- Export ontology-based phenotype descriptions to the Encyclopedia of Life. This project involves developing a system that will map the phenotypic data from an OBD database to the EOL transfer schema. This involves determining which phenotypic information can be used and creating human-readable segments of text that can be integrated into an Encyclopedia of Life page.
- Extending APE to handle incomplete distances. APE (Analyses of Phylogenetics and Evolution) is an R package that facilitates the study of phylogenies. Among other features, it allows the user to reconstruct phylogenies from complete distance matrices. This project aims to extend APE with algorithms that allow phylogenies to be inferred from incomplete distance matrices.
- Extending Jalview’s support for handling RNA. Jalview is an alignment editor widely used on different web pages (e.g. Pfam, Rfam). It can also be used as a standalone application. However, like most bioinformatics tools, it was developed with protein sequences in mind and is not yet optimally prepared for use on ncRNAs. The main focus of this year's project will be to embed the VARNA (Visualization Applet for RNA) secondary structure display into Jalview’s desktop application. Additionally, other structure features will be added.
- Google Summer of Code 2011 Project Proposal. TreeBASE acts as an archive for phylogenetic analyses. Data is currently submitted to TreeBASE as NEXUS files. However, this format results in a clunky user interface and does not allow metadata or additional annotations to be submitted automatically. This project will add support for accepting NeXML files in TreeBASE, so that metadata can be submitted easily and new annotations of the metadata can be displayed in a user-friendly manner.
- Interoperable exchange of gene tree reconciliation maps. The goals of this project are:
  * Standardize the XML encoding of gene tree reconciliation data by extending an existing standard
  * Implement this encoding in a standard bioinformatics library (BioPerl)
  * Modify the iPlant GTR database to support import and export of GTR trees using the new encoding
  * If time permits, modify the iPlant tree visualization tool so that it internally uses the new encoding
  * If time permits, use iPlant's taxa resolution to populate taxonomic name strings
- Manipulating NGS data for population genetic analysis. This standalone GUI will format large multi-locus datasets that have no reference genome. It will allow researchers with genome-enriched next-generation sequencing data to easily view and modify any subset of their data with a single click, a task that currently requires tedious formatting by hand and is not aided by available genomic tools. The program will output four commonly used population genetics analysis formats (NEXUS, FASTA, IMa2, migrate) and will focus on user-friendliness.
- PhyloGeoRef: A geo referencing library implemented in Java. Complex data is easier to understand when it is presented in a form suited to human perception, and visual presentation is often the preferred choice. The PhyloGeoRef library is an attempt in this direction: it enables the representation of phylogenetic data in a geospatial format.
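
As a rough illustration of the kind of format conversion mentioned in the NGS data project above, the following Python sketch writes a small set of aligned sequences as FASTA and as a minimal NEXUS DATA block. It is not taken from any of the students' projects; the function names `to_fasta` and `to_nexus` and the sample sequence names are hypothetical, and the NEXUS output covers only the simplest case (DNA, equal-length sequences, names without spaces).

```python
def to_fasta(seqs):
    """Render a {name: sequence} mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


def to_nexus(seqs):
    """Render aligned sequences as a minimal NEXUS DATA block.

    Assumes DNA data and taxon names without whitespace (illustrative only).
    """
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("a NEXUS matrix needs sequences of equal length")
    nchar = lengths.pop()
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(seqs)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    lines += [f"    {name} {seq}" for name, seq in seqs.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-individual alignment used only for demonstration.
    sample = {"ind1": "ACGT", "ind2": "AC-T"}
    print(to_fasta(sample))
    print(to_nexus(sample))
```

IMa2 and migrate input files have their own layouts and are left out of this sketch.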