Monday, July 07, 2014

Randomized Numerical Linear Algebra for Large Scale Data Analysis (libSkylark: Sketching based Matrix computations for Machine Learning)

While reading up on some RandNLA papers, I stumbled upon the Randomized Numerical Linear Algebra for Large Scale Data Analysis page, which has been added to the collection of Highly Technical Reference Pages. Here is what the main page says:

The Sketching Linear Algebra Kernel is a library for matrix computations suitable for general statistical data analysis and optimization applications.
Many tasks in machine learning and statistics ultimately end up being problems involving matrices: whether you're matching lenders and loans in the microfinance space, or finding the key players in the bitcoin market, or inferring where tweets came from, you'll want to have a toolkit for low-rank matrix approximation, least-squares and robust regression, eigenvector analysis, CUR and non-negative matrix factorizations, and other matrix computations.
Sketching is a way to compress matrices that preserves key matrix properties; it can be used to speed up many matrix computations. Sketching takes a given matrix A and produces a sketch matrix B that has fewer rows and/or columns than A. For a good sketch B, if we solve a problem with input B, the solution will also be pretty good for input A. For some problems, sketches can also be used to get faster ways to find high-precision solutions to the original problem. In other cases, sketches can be used to summarize the data by identifying the most important rows or columns.
A simple example of sketching is just sampling the rows (and/or columns) of the matrix, where each row (and/or column) is equally likely to be sampled. This uniform sampling is quick and easy, but doesn't always yield good sketches; however, there are sophisticated sampling methods that do yield good sketches.
The goal of this project is to build a sketching-based open-source software stack for NLA and its applications, layered as follows (top to bottom):

Applications: Matrix Completion; Nonlinear RLS, SVM, PCA; Robust Regression; other applications
Python: Python-based data analytics scripting layer
PythonBinding: C++ to Python bindings
NLA: Numerical Linear Algebra primitives (least squares regression, low-rank approximation, randomized estimators)
Sketch: Sketching kernels (JL, FJL, Gaussian, Sign, Sparse Embedding)
Third-Party Libraries: MPI, Elemental, BLAS, CombBLAS, FFTW, Boost


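To make the sketch-and-solve idea quoted above concrete, here is a toy numpy illustration of my own (this is not libSkylark's API, and all sizes are arbitrary). It sketches a tall least-squares problem two ways: with a dense Gaussian sketch, and with the "quick and easy" uniform row sampling the quote mentions.

import numpy as np

rng = np.random.default_rng(0)

# Overdetermined least-squares problem: min_x ||Ax - b||_2 with n >> d.
n, d, k = 10_000, 20, 400                  # k is the sketch size, d < k << n
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Gaussian sketch: B = SA has only k rows; solve the small problem instead.
S = rng.standard_normal((k, n)) / np.sqrt(k)
x_gauss, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Uniform row sampling, the simplest sketch of all.
rows = rng.choice(n, size=k, replace=False)
x_unif, *_ = np.linalg.lstsq(A[rows], b[rows], rcond=None)

# Exact solution on the full matrix, for reference.
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)

def res(x):
    return np.linalg.norm(A @ x - b)

print(res(x_gauss) / res(x_opt), res(x_unif) / res(x_opt))
# Both ratios typically land close to 1: solving the sketched problem
# gives a solution that is nearly as good on the original input A.

Uniform sampling does well here only because a random Gaussian A is incoherent; on matrices with a few dominant rows it can fail badly, which is where the more sophisticated sampling methods come in. Note also that a dense Gaussian S is itself expensive to apply; the practical speedups come from structured transforms such as the FJL and sparse embeddings listed in the Sketch layer.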
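The Sparse Embedding kernel in the Sketch layer is, if I understand correctly, a Clarkson-Woodruff-style CountSketch: each column of S has a single random +/-1 entry, so S can be applied in time proportional to the number of nonzeros of A. A minimal scipy construction (again my own sketch, not libSkylark code):

import numpy as np
from scipy.sparse import csr_matrix

def sparse_embedding(k, n, rng):
    # Each of the n columns of the k x n matrix S gets exactly one
    # nonzero: a random sign placed in a random row.
    rows = rng.integers(0, k, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    return csr_matrix((signs, (rows, np.arange(n))), shape=(k, n))

rng = np.random.default_rng(0)
n, d, k = 10_000, 20, 400
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

S = sparse_embedding(k, n, rng)            # applying S costs O(nnz(A))
x_cs, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(A @ x_cs - b) / np.linalg.norm(A @ x_opt - b))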
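Similarly, for the low-rank approximation primitive in the NLA layer, a standard randomized estimator is the Halko-Martinsson-Tropp range finder; a minimal numpy version (not necessarily the algorithm libSkylark implements) looks like this:

import numpy as np

def randomized_low_rank(A, r, oversample=10, rng=None):
    # Sketch the range of A with a Gaussian test matrix, orthonormalize,
    # then compute an exact SVD of the much smaller projected matrix.
    rng = rng if rng is not None else np.random.default_rng()
    Omega = rng.standard_normal((A.shape[1], r + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U)[:, :r], s[:r], Vt[:r]

rng = np.random.default_rng(0)
# A 2000 x 500 matrix with a rapidly decaying spectrum.
A = (rng.standard_normal((2000, 50)) * 0.5 ** np.arange(50)) @ rng.standard_normal((50, 500))
U, s, Vt = randomized_low_rank(A, r=10, rng=rng)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # small relative error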

Here are the recent papers from this RandNLA effort:

Random Laplace Feature Maps for Semigroup Kernels on Histograms

Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron, Michael Mahoney
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014

Efficient Dimensionality Reduction for Canonical Correlation Analysis

Haim Avron, Christos Boutsidis, Sivan Toledo, Anastasios Zouzias
SIAM Journal on Scientific Computing, to appear, 2014
Preliminary version appeared in the Proceedings of the 30th International Conference on Machine Learning (ICML), 2013

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

Jiyan Yang*, Vikas Sindhwani*, Haim Avron*, Michael Mahoney
Proceedings of the 31st International Conference on Machine Learning (ICML), 2014
(*) Equal contributors.

Kernel Methods Match Deep Neural Networks on TIMIT

Po-Sen Huang, Haim Avron, Tara Sainath, Vikas Sindhwani, Bhuvana Ramabhadran
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014
Best Student Paper Award

Optimal CUR Matrix Decompositions

Christos Boutsidis, David Woodruff
ACM Symposium on Theory of Computing (STOC), 2014

Faster SVD-truncated Regularized Least-squares

Christos Boutsidis, Malik Magdon-Ismail
Technical Report, 2014

A Note on Sparse Least-squares Regression

Christos Boutsidis, Malik Magdon-Ismail
Information Processing Letters, to appear, 2014

Approximate Spectral Clustering via Randomized Sketching

Alex Gittens, Prabhanjan Kambadur, Christos Boutsidis
Technical Report, updated Feb 15, 2014

Revisiting Asynchronous Linear Solvers: Provable Convergence Rate Through Randomization

Haim Avron, Alex Druinsky, Anshul Gupta
Proceedings of the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014

Random Projections for Linear Support Vector Machines

Saurabh Paul, Christos Boutsidis, Malik Magdon-Ismail, Petros Drineas
ACM Transactions on Knowledge Discovery from Data, to appear, 2014


Also of interest is libSkylark: Sketching based Matrix computations for Machine Learning on GitHub. From the page:



Introduction
Sketching based Matrix computations for Machine Learning (libSkylark) is a library for matrix computations suitable for general statistical data analysis and optimization applications.
Many tasks in machine learning and statistics ultimately end up being problems involving matrices: whether you're finding the key players in the bitcoin market, or inferring where tweets came from, or figuring out what's in sewage, you'll want to have a toolkit for least-squares and robust regression, eigenvector analysis, non-negative matrix factorization, and other matrix computations.

The page goes on to repeat the description of sketching and the software stack already quoted above.
Image Credit: NASA/JPL-Caltech. This image was taken by Navcam: Left B (NAV_LEFT_B) onboard NASA's Mars rover Curiosity on Sol 679 (2014-07-04 20:21:29 UTC).
