Nuit Blanche: Summary of Paris Machine Learning Meetup #5: Genomics and SARAH

Saturday, November 16, 2013

On Wednesday night, we had an absolutely eye-opening meetup. The next meetup (#6) will be on December 11th.

First of all, a big thank-you to the folks who updated their RSVP to say they were not coming (even an hour before the meeting): being able to find great speakers without having to worry about larger meeting rooms is a lifesaver. The people counter on Meetup shows 84 people; I counted 65 physically in the room. The counter went up to 105, I believe. Again, Franck and I are enormously grateful to the folks who played the game.

DojoEvents hosted us once again. Their value proposition is listed here. If you are a startup looking for physical hosting in Paris, check them out. They also have an event-based outfit if you want to organize meetups, hackathons, and so forth. Guillaume Pellerin used his Parisson suitcase to record the meeting. There is more to it than meets the eye; he should give a presentation about it at the meetup sometime. More on the value proposition of the suitcase next week.

Thank you to both of them for the seamless hosting and streaming of the event (the video of the meeting will be posted later).

We had two great speakers from two different horizons:

First there was Jean-Philippe Encausse, who spoke to us about SARAH (http://encausse.net/s-a-r-a-h), a framework that combines voice and web-related services inside the house [1]. His presentation is here. I had gone to the UX meetup at NUMA a day earlier and was struck by a revelation of sorts (yes, I know I am slow): the UX community is about reducing, by any means necessary, the learning curve for humans interacting with devices and, nowadays, connected devices. In this context, Machine Learning is really about the device learning from its user, and learning ... fast. One of the known issues with sophisticated systems is whether your grandma will adopt them, even if she is past 100. Her sight has dwindled, yet she wants to do stuff. Let me say it very clearly, from experience: the cognitive toll of age in humans is absolutely related to a lack of the interaction enabled by vision, voice, or both. This is why projects such as Jean-Philippe's are important. The current solution is to hire UX designers, a slow and expensive process: can machine learning help in that endeavor by speeding up the learning curve? As our population includes more and more grandmas and grandpas past a certain age, they will want something like SARAH that learns from their own behavior.

Thinking about it, the possibilities for SARAH seem boundless. Here is another one, at the other end of the spectrum: PRONOTE is the web application currently used in France by teachers in junior high and high schools to communicate with students. It provides a summary of what was taught in class and lists the homework that was assigned. Some teachers even include videos to watch on the web to give a better sense of what was taught in the classroom. The system also gives parents access to the student's grades. With the right module, one can definitely think in terms of learning analytics, learning applications, or both ... The possibilities are truly endless.

To go back to the presentation, Jean-Philippe presented a neural network for a lighting system. Adding new behaviors, such as going on vacation, seemed to be handled ad hoc, which shows yet again that modeling human behavior really needs some thinking. In particular, the system Jean-Philippe showed seemed to require about a week of training to learn the generic behavior of the user. Jean-Philippe also mentioned that humans, in turn, need to get accustomed to the system, which really highlights that this is a two-way process. I think it would be invaluable if SARAH were to provide some sort of 'anonymized' data on that front. Right now, UX is seen by many as black magic: one wonders whether machine learning could change that. I also note that the community around SARAH is strong, with about 700 members in the G+ community and about 10 new entries per day (my rule of thumb is to multiply this number by 10 to estimate the size of the actual active community).
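To make the idea of "a week of training to learn the user's behavior" a bit more concrete, here is a minimal sketch. It uses a single artificial neuron (a classic perceptron) that learns when to switch a light on from a log of hours. It is not the network Jean-Philippe showed and not SARAH's code. The hour encoding, the helper names (`features`, `light_on`, `train_perceptron`), and the synthetic week of logs are all hypothetical choices made for illustration.

```python
import math

def features(hour):
    # Encode the hour on a circle so that 23:00 and 00:00 end up close together.
    angle = 2 * math.pi * hour / 24
    return [1.0, math.sin(angle), math.cos(angle)]  # bias term, sin, cos

def score(weights, hour):
    # Weighted input of the neuron for a given hour.
    return sum(w * x for w, x in zip(weights, features(hour)))

def light_on(weights, hour):
    # The neuron "fires" (light on) when its weighted input is positive.
    return score(weights, hour) > 0

def train_perceptron(log, max_epochs=10_000):
    # Classic perceptron rule: nudge the weights only when a logged event is misclassified.
    weights = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for hour, was_on in log:
            target = 1 if was_on else -1
            if target * score(weights, hour) <= 0:
                weights = [w + target * x for w, x in zip(weights, features(hour))]
                mistakes += 1
        if mistakes == 0:  # every logged event is now reproduced
            break
    return weights

# Hypothetical week of logs: the light is on during the hours 18:00 to 23:00, every day.
week = [(hour, 18 <= hour <= 23) for _day in range(7) for hour in range(24)]
weights = train_perceptron(week)
print([hour for hour in range(24) if light_on(weights, hour)])
```

Because these logs are linearly separable in the (bias, sin, cos) encoding, the perceptron rule is guaranteed to stop once it reproduces them. An irregular event such as a vacation week would not fit this simple model, which echoes the point that such behaviors had to be handled ad hoc.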

Then we had Jean-Philippe Vert (who got interrupted with questions too many times during the talk; I plead guilty to being one of the offenders), who talked about Machine Learning for personalized medicine [2]. His presentation slides are here. Jean-Philippe mentioned that the figure used in Predicting the Future: The Steamrollers has now been updated on the genome.gov site, and that the 2007 transition occurred mainly because sequencing operations went from being sequential to being parallel.

There is more in the presentation and in the video (which should come out shortly), but Jean-Philippe provided an interesting example of the methods currently used to predict patterns in genomics. He showed us the results of the NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge, a Netflix-like competition without the million dollars attached to it. The description of the challenge can be found here. Jean-Philippe's team came in second with this entry.

Results of the competition are listed at:

https://www.synapse.org/#!Synapse:syn1761567/wiki/60497

Jean-Philippe shared a fascinating finding: if you go through the top 25 predictions scored on the leaderboard, you'll notice that kernel-based and linear regression prediction algorithms are interspersed with Random Forest algorithms. Here is what I could count from the explanations given in the wiki pages of the entries:

7 Random Forests
3 KRR (kernel ridge regression)
3 Linear regression

out of 25 solutions; the remaining 12 are unknown, as there is no documentation of what those methods entailed.
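For readers unfamiliar with the acronym, kernel ridge regression (KRR) fits the weights alpha = (K + lam * I)^-1 y, where K holds pairwise kernel similarities between training samples. It then predicts with a weighted sum of similarities to those samples. Plain linear regression fits a straight line instead. The sketch below contrasts the two on a toy, hypothetical 1-D nonlinear response. It does not use the challenge data, it is not any team's entry, and it does not cover Random Forests. The Gaussian kernel, `gamma=1.0`, and `lam=1e-3` are arbitrary illustrative choices.

```python
import math

def solve(a, b):
    # Gaussian elimination with partial pivoting; solves a @ x = b for a square matrix a.
    n = len(b)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rbf(u, v, gamma=1.0):
    # Gaussian (RBF) kernel: similarity decays with squared distance.
    return math.exp(-gamma * (u - v) ** 2)

def fit_krr(xs, ys, lam=1e-3):
    # Kernel ridge regression: alpha = (K + lam * I)^-1 y.
    n = len(xs)
    k = [[rbf(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(k, ys)
    return lambda x: sum(a * rbf(xi, x) for a, xi in zip(alpha, xs))

def fit_linear(xs, ys):
    # Ordinary least squares for y = w0 + w1 * x via the normal equations.
    n, sx = len(xs), sum(xs)
    sxx = sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    w0, w1 = solve([[n, sx], [sx, sxx]], [sy, sxy])
    return lambda x: w0 + w1 * x

def mse(model, xs, ys):
    # Mean squared error of a model on the given samples.
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy, hypothetical data: a nonlinear response sampled on a grid.
xs = [0.5 * i for i in range(13)]
ys = [math.sin(x) for x in xs]
print("linear MSE:", mse(fit_linear(xs, ys), xs, ys))
print("KRR MSE:   ", mse(fit_krr(xs, ys), xs, ys))
```

This is training error only, so it shows flexibility rather than predictive skill. On real leaderboards, `lam` and the kernel width would be tuned by cross-validation to avoid overfitting.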

All the recent DREAM challenges are here. Jean-Philippe also mentioned a new set of prediction competitions called DREAM 8.5. From here:

Best performers in all DREAM 8.5 Challenges will be invited to present at the 2014 DREAM conference (date and location to be determined) with travel expenses covered by the organizers. We are also working to establish publishing partners for each of these challenges. The DREAM 8.5 Challenges are now open for registration, and will begin active problem-solving in late 2013 or early 2014.

This round of challenges include the first Alzheimer’s Disease Big Data DREAM Challenge, the ICGC-TCGA-DREAM Somatic Mutation Calling Challenge, and the Rheumatoid Arthritis Responder Challenge.

Out of the three DREAM 8.5 challenges listed above, Jean-Philippe pointed out a very interesting one, the second (the ICGC-TCGA-DREAM Somatic Mutation Calling Challenge), which "...will provide 9 terabytes of raw human sequence data derived from pairs of normal and tumor tissue (from prostate and pancreas)...". Given some of the problems with making inferences on data that has already been "calibrated", the challenge uses raw data to see whether information is being lost in the calibration process. Wow.

In all, a very interesting set of presentations by Jean-Philippe Encausse and Jean-Philippe Vert. Thank you to both of them for presenting their invaluable insights.

Archives of the previous meetups are here:

http://nuit-blanche.blogspot.com/p/paris-based-meetups-on-machine-learning.html

There is also a Paris Machine Learning group on LinkedIn at:

http://www.linkedin.com/groups?gid=6400776

References:
[1] SARAH by Jean-Philippe Encausse

SARAH (http://encausse.net/s-a-r-a-h) is an open-source framework for connecting the Internet of Things and thereby building the intelligence of the home.

Connected objects, home-automation boxes, televisions, watches, ... have very different APIs, each allowing only one thing to be done. SARAH offers a framework for interacting with them through voice, gesture, and face recognition, ... QR codes, NFC, ... and for learning users' habits...

[2] Machine Learning for personalized medicine by Jean-Philippe Vert

Tailoring prevention, medical decisions and choices of treatments to each individual patient, in order to maximize the chance of success and limit the risk of negative side effects, is seen as a plausible near future for medicine now that we are able to characterize at full resolution the genome of each person. I will discuss a few machine learning-based approaches to process the large amounts of biological data generated by modern technologies, and how they pave the way to personalized medicine.
