Saturday, January 05, 2013

AMP: Assembly Matching Pursuit, Metagenomic units (MGUs) discovery through sequence-based dictionary learning - implementation -

A while back, we saw that for each individual, the microbiome did not seem to change much over time. But what about a given time t: how does the microbiome differ among the seven billion individuals currently on earth? Thanks to Twitter and Jason Moore, I came across a paper (and attendant code) that sets out to answer that question with a dictionary learning approach.



Metagenomics, the study of the total genetic material isolated from a biological host, promises to reveal host-microbe or microbe-microbe interactions that may help to personalize medicine or improve agronomic practice. We introduce a method that discovers metagenomic units (MGUs) relevant for phenotype prediction through sequence-based dictionary learning. The method aggregates patient-specific dictionaries and estimates MGU abundances in order to summarize a whole population and yield universally predictive biomarkers. We analyze the impact of Gaussian, Poisson, and Negative Binomial read count models in guiding dictionary construction by examining classification efficiency on a number of synthetic datasets and a real dataset from Ref. 1. Each outperforms standard methods of dictionary composition, such as random projection and orthogonal matching pursuit. Additionally, the predictive MGUs they recover are biologically relevant.
The code can be found here.
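To get a feel for why the choice of read count model might matter, here is a minimal sketch of the three log-likelihoods named in the abstract. It is not taken from the authors' code: the function names and the mean/dispersion parameterization of the Negative Binomial (variance = mu + mu^2 / r) are my own assumptions, chosen only for illustration.

```python
import math


def gaussian_loglik(k, mu, sigma):
    """Log-density of a Gaussian with mean mu and std sigma, evaluated at count k."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (k - mu) ** 2 / (2 * sigma ** 2)


def poisson_loglik(k, lam):
    """Log-probability of observing k reads under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def negbin_loglik(k, mu, r):
    """Log-probability of k reads under a Negative Binomial.

    Assumed parameterization (hypothetical, for illustration only):
    mean mu and dispersion r, so that variance = mu + mu**2 / r.
    """
    p = r / (r + mu)  # success probability implied by mean mu and dispersion r
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1 - p))


if __name__ == "__main__":
    # Compare how each model scores a read count far above the expected value.
    mu, k = 3.0, 20
    print("Gaussian         :", gaussian_loglik(k, mu, math.sqrt(mu)))
    print("Poisson          :", poisson_loglik(k, mu))
    print("Negative Binomial:", negbin_loglik(k, mu, r=2.0))
```

With a small dispersion parameter, the Negative Binomial penalizes a count far above the mean much less than the Poisson does, which is the usual argument for using it on overdispersed read counts.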

I wonder how these greedy algorithms scale to very large databases and how different the output would be if one were to use other dictionary learning techniques (especially those that promote structured sparsity). Synthetic data were derived from this human microbiome dataset.
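To make the "greedy" part concrete, here is a minimal sketch of plain matching pursuit over a fixed dictionary. This is the generic textbook algorithm, not the paper's Assembly Matching Pursuit; the function names, the toy dictionary and the stopping tolerance are all assumptions of mine. Each iteration scans every atom, so the cost grows linearly with dictionary size, which is exactly where the scaling question comes from.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def matching_pursuit(signal, atoms, n_iter=10, tol=1e-9):
    """Greedy matching pursuit over unit-norm atoms (plain lists of floats).

    Returns one coefficient per atom and the final residual.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the current residual with every atom in the dictionary.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        if abs(corr[best]) < tol:
            break  # nothing left that the dictionary can explain
        # Keep the best atom's contribution and subtract it from the residual.
        coeffs[best] += corr[best]
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


if __name__ == "__main__":
    # Toy dictionary: the standard basis of R^3 plus one redundant atom.
    atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
             normalize([1.0, 1.0, 0.0])]
    coeffs, residual = matching_pursuit([2.0, 0.0, 3.0], atoms)
    print("coefficients:", coeffs)
    print("residual    :", residual)
```

Orthogonal matching pursuit, mentioned in the abstract as a baseline, differs in that it re-fits all selected coefficients by least squares after each new atom is picked.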




