Friday, June 05, 2015

Reader's comment: Long reads and the P-river, Hardware for Machine Learning and some implementation


Following up on his recent work, I sent the following to Or Zuk:
Stupid question: You've been doing bacterial community reconstruction with short reads. How does the arrival of long-read technology like the Oxford Nanopore MinION and the PacBio RS II change these results? [Crossing the P-river] In other words, were there computations you could not do with convex optimization and short reads that are now easily handled by any semi-smart greedy solver? (Please let me know if my question makes any sense.)
As usual, Or kindly responded with:
Not stupid at all. I recall that you once mentioned on your blog a 'phase transition' phenomenon, as reads get longer, for the problem of genome assembly [Note from Igor: probably this entry]. Our problem is a bit different (genome alignment to a known database, not genome assembly; but since we align to multiple microbial genomes, we don't know which genome each read belongs to), but it also exhibits a similar relation between read length and difficulty (I'm not sure whether there is a sharp phase transition or whether it's more gradual; it would be interesting to study the information-theoretic properties of this problem).

In the limit of very short reads (think of a read of length 1 being just 'A', 'C', 'G' or 'T'), all the information you can gain is the total frequency of each nucleotide in your mixture, and the problem of reconstructing species identities and frequencies is not statistically identifiable. As reads get longer, you may still have uncertainty in the assignment of each individual read, but the overall read distribution can enable you to uniquely identify the species frequencies: the problem becomes identifiable, although it may still be computationally hard (this is the regime we dealt with in our paper; see the analysis of this issue in our SPIRE manuscript: http://arxiv.org/abs/1309.6919). As reads get longer still, at some point the problem indeed becomes easier, both computationally and statistically.
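To make the identifiability argument concrete, here is a minimal Python sketch under a deliberately idealized model. This model is an assumption of the illustration, not the method of the paper above: reads are error-free, of fixed length k, and drawn from uniformly random positions of each species' sequence, and the species sequences (`sp1`, `sp2`, `sp3`) are hypothetical toy strings sharing the same base composition. The mixture read distribution is then linear in the species frequencies, M f, and f is identifiable exactly when M has full column rank. With k = 1 all three profiles coincide; with k = 2 they become linearly independent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy sequences standing in for 16S rRNA genes. All three have
# exactly the same base composition (two of each nucleotide).
SPECIES = {
    "sp1": "AACCGGTT",
    "sp2": "ACGTACGT",
    "sp3": "AGCTTCGA",
}


def kmer_profile(seq, k):
    """Distribution of an error-free read of length k taken from a uniformly
    random start position in seq, listed over all 4**k k-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    n_windows = len(seq) - k + 1
    counts = {kmer: 0 for kmer in kmers}
    for start in range(n_windows):
        counts[seq[start:start + k]] += 1
    return [Fraction(counts[kmer], n_windows) for kmer in kmers]


def read_distribution(freqs, k):
    """Mixture read distribution M @ f: column j of M is the k-mer profile
    of species j and f holds the species frequencies."""
    profiles = [kmer_profile(s, k) for s in SPECIES.values()]
    return [sum(f * col[i] for f, col in zip(freqs, profiles))
            for i in range(len(profiles[0]))]


def rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def identifiable(k):
    """f is recoverable from M @ f iff M has full column rank. Rows here are
    species profiles (M transposed), which leaves the rank unchanged."""
    profiles = [kmer_profile(s, k) for s in SPECIES.values()]
    return rank(profiles) == len(profiles)


f_a = [Fraction(1), Fraction(0), Fraction(0)]        # only sp1 present
f_b = [Fraction(0), Fraction(1, 2), Fraction(1, 2)]  # sp2 and sp3 at 50/50
print(read_distribution(f_a, 1) == read_distribution(f_b, 1))  # True
print(read_distribution(f_a, 2) == read_distribution(f_b, 2))  # False
print(identifiable(1), identifiable(2))                        # False True
```

Two very different communities produce identical single-base read distributions, while length-2 reads already separate them. Exact rational arithmetic is used so the rank test is not muddied by floating-point tolerances.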

In the limit of very long reads (each read covers the entire 16S rRNA molecule), the computational problem we studied becomes trivial: since we assume that the molecules of different species are different, you can easily assign each read to a unique species (at least in principle), and then estimate the frequency of each species by simply counting the corresponding number of reads.
Thanks Or!
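The long-read limit can likewise be sketched in a few lines of Python. This assumes, as Or does, that the full-length sequences of different species are distinct, and it further assumes error-free reads (real long reads contain sequencing errors), so that each read can be assigned by exact lookup. The reference sequences and species names are hypothetical; a practical pipeline would replace the exact lookup with approximate alignment.

```python
from collections import Counter

# Hypothetical reference database: full-length marker sequences, assumed
# pairwise distinct, keyed by species name.
REFERENCE = {
    "sp1": "AACCGGTT",
    "sp2": "ACGTACGT",
    "sp3": "AGCTTCGA",
}


def estimate_frequencies(reads, reference):
    """Long-read limit: each read spans the whole molecule, so it matches
    exactly one reference sequence or none. Assign by exact lookup, drop
    unmatched reads, and return relative counts over the assigned reads."""
    by_sequence = {seq: name for name, seq in reference.items()}
    if len(by_sequence) != len(reference):
        raise ValueError("reference sequences must be pairwise distinct")
    counts = Counter(by_sequence[r] for r in reads if r in by_sequence)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in reference}


reads = ["AACCGGTT"] * 2 + ["ACGTACGT", "AGCTTCGA"]
print(estimate_frequencies(reads, REFERENCE))
# {'sp1': 0.5, 'sp2': 0.25, 'sp3': 0.25}
```

The distinctness check mirrors the assumption in the text: if two species shared the same full-length sequence, no read length could separate them.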

The Hardware for Machine Learning entry and the new MLHardware tag triggered two answers. The first one came from Piero Foscari:
Dear Igor,

thanks for expanding the scope of Nuit Blanche to ML hardware, and in general for keeping all of us informed (and educated)! Actually, I would prefer a tag for probabilistic programming because of its wider breadth and generality.

Anyway, you probably already know about Vigoda at Lyric Labs and their work, but just in case:
http://en.wikipedia.org/wiki/GP5_chip ...and Dimple, etc.
Among their papers: Hershey 2012, "Accelerating Inference: towards a full Language, Compiler and Hardware stack"

Best regards
Piero

Thanks Piero!

Still on the Hardware for Machine Learning story, here is an unexpected feel-good story: Eric Jonas, whose work was mentioned, sent me the following:

Professor Carron, your blog made me want to work on compressive sensing many years ago. Now I'm a postdoc with Ben Recht at Berkeley. Imagine my surprise when you featured our paper on phase-space imaging on your blog this morning! I just wanted to say thanks for the blog over the years, it's really helped motivate my career, and now I really feel like I've come full-circle!


...Eric Jonas

Postdoc, AMPLab, 
UC Berkeley Electrical Engineering and Computer Science
I am OK with feel-good stories! Finally, Vlad just sent me an email about ShapeFit:
Hi Igor,

The code will be publicly available on Paul Hand's website later today:
http://www.caam.rice.edu/~hand

Best,

-Vlad

I look forward to it and I will add it to the entry when it is out !



Image Credit: NASA/JPL-Caltech/Space Science Institute, Full-Res: N00241303.jpg was taken on June 04, 2015 and received on Earth June 05, 2015. The camera was pointing toward SATURN, and the image was taken using the CL1 and CL2 filters.

Join the CompressiveSensing subreddit or the Google+ Community and post there!
Liked this entry? Subscribe to Nuit Blanche's feed; there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle, and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.
