Monday, November 05, 2007

Monday Morning Algorithm Part 1: Fast Low Rank Approximation using Random Projections

I am not sure I will be able to do this often, but it is always interesting to start the week with an interesting algorithm. So I decided to implement one every beginning of the week. Generally, these algorithms are listed in papers and sometimes when one is lucky they are implemented in Matlab, Octave or equivalent. The beauty of these is certainly underscored when they are readily implemented and you can wow yourself by using them right on the spot. I will try to make it so that one can cut and paste the rest of the entry and run it directly on their Matlab or Octave implementation.
I will generally try to focus on small algorithms (not more than 10 important lines) relevant to some of the issues tackled and mentioned in this blog namely Compressed Sensing, randomized algorithms, bayesian computations, dimensionality reduction....
This week, the script I wrote in matlab is relevant to finding a low-rank approximation to a matrix with more rows than columns (overdetermined systems are like that). One could readily use the SVD command in Matlab/Octave but it can take a long time for large matrices. This algorithm uses properties from Random matrices to do the bidding in less time than the traditional SVD based method. I initially found it in [3], the original reference is [1] and it is used in the context of LSI is [2].

clear
% preparing the problem
% trying to find a low approximation to A, an m x n matrix
% where m >= n
m = 1000;
n = 900;
% first let's produce example A
A = rand(m,n);
%
% beginning of the algorithm designed to find alow rank matrix of A
% let us define that rank to be equal to k
k = 50;
% R is an m x l matrix drawn from a N(0,1)
% where l is such that l > c log(n)/ epsilon^2
%
l = 100;
% timing the random algorithm
trand =cputime;
R = randn(m,l);
B = 1/sqrt(l)* R' * A;
[a,s,b]=svd(B);
Ak = A*b(:,1:k)*b(:,1:k)';
trandend = cputime-trand;
% now timing the normal SVD algorithm
tsvd = cputime;
% doing it the normal SVD way
[U,S,V] = svd(A,0);
Aksvd= U(1:m,1:k)*S(1:k,1:k)*V(1:n,1:k)';
tsvdend = cputime -tsvd;
%
%
% relative error between the two computations in percent
rel = norm(Ak-Aksvd)/norm(Aksvd)*100
% gain in time
gain_in_time = tsvdend - trandend
% the random algorithm is faster by
tsvdend/trandend


If you find any mistake, please let me know.

References:
[1] The Random Projection Method, Santosh S. Vempala
[2] Latent Semantic Indexing: A Probabilistic Analysis (1998) Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala
[3] The Random Projection Method; chosen chapters from DIMACS vol.65 by Santosh S. Vempala, presentation by Edo Liberty October 13, 2006
Photo Credit: ESA/ASI/NASA/Univ. of Rome/JPL/Smithsonian, This image shows the topographic divide between the Martian highlands and lowlands. The mysterious deposits of the Medusae Fossae Formation are found in the lowlands along the divide. The radar sounder on ESA's Mars Express orbiter, MARSIS, has revealed echoes from the lowland plains buried by these mysterious deposits.

No comments:

Printfriendly