To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shift invariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernelThanks Jort. So now the classification mechanism is actually simplified through the use of random projections. This is thought provoking.

machines.

**Igor Carron**

**LightOn || @Google Scholar || @LinkedIn ||@Twitter || @Webpage**

+

*Papers:*

**Approximating Kernels at the speed of Light**

**Imaging with Nature**

**LightOn****Our Newsletter ||Twitter || LinkedIn**

**Nuit Blanche**community@Google+(2362) || @Facebook (383) || @Reddit (1966)

Compressive Sensing @LinkedIn (3903)

Advanced Matrix Factorization @Linkedin (1286)

*Paris Machine Learning*( MLParis.org )@Meetup.com (6251 members) || @archives || @LinkedIn (1837) || @Google+(501) ||

@Facebook (372) || @Twitter (1518 followers)

The Big Picture in Compressive Sensing|| Learning Compressive Sensing ||

Advanced Matrix Factorization Jungle Page ||

*Reference pages*The Big Picture in Compressive Sensing|| Learning Compressive Sensing ||

Advanced Matrix Factorization Jungle Page ||

**Highly Technical Reference Pages - Aggregators**||

These Technologies Do Not Exist || CAI: Cable And Igor's Adventures in Matrix Factorization || Search ||These Technologies Do Not Exist || CAI: Cable And Igor's Adventures in Matrix Factorization || Search ||

**Reproducible Research page**

## Saturday, December 29, 2007

### Compressed Sensing: Random Features for Large-Scale Kernel Machines

## Friday, December 28, 2007

### An opportunity for Opportunity ?

After some additional trajectory analysis, there is now a 1 to 25 chance of the 2007 WD5 meteorite hitting Mars on January 30th. If it hits, it will be a 30 megaton impact in a region north of where Opportunity stands. The last time we had an occurrence like this one was when Comet Shoemaker-Levy hit Jupiter in 1994. Many different telescopes were watching the impact then. But this time, it will be the first time in History that one of our robot sees an impact from the point of the view of the planet being hit. wow. Much can be learned about Mars atmosphere and even geology from such an event that it is mind boggling.

On a different note, this news is really possible because of one man: Gene Shoemaker. Gene is one of the discoverer of the Shoemaker-Levy comet that got people to realize the devastation Earth could sustain if it were to be hit by a similar object. His discovery led to the creation of the Near-Earth Object division at NASA. Gene actually trained to be the first scientist on the Moon and lobbied hard within NASA so that an Apollo astronaut would have the right qualification to pick up the right rocks. He was disqualified for health reasons but his lobbying paid off and Harrison Schmitt (a geologist) got to take that slot on Apollo 17. Gene died in a car accident in 1994. On the one hand, you don't get to walk on the Moon, on the other hand, your legacy extends far beyond the human exploration of space: It makes Space relevant to our history and maybe even our survival. This was not a bad deal after all.

## Thursday, December 27, 2007

### Building a Roaming Dinosaur: Why is Policy Learning Not Used ?

Dubai is going to host a Jurassic Park. It is about time: the current display of Animatronics are very much underwhelming and certainly do not yield the type of magic moment as displayed by Laura Dern's face in Jurassic Park. Yet, all over the world, kids and families line up to pay from $10 to $100 for the privilege of being wowed. The most interesting feature of the Dubai 'Resteless Planet' undertaking will be the claimed ability for the dinosaurs to be roaming. This is no small feat as none of the current animatronics are able to do that. I have a keen interest in this as you probably have noticed from the different entries on muscles, scaling and various autonomous robotics undertaking.

So over the years, I have kept an eye on the current understanding of dinosaur gait. Examples on the web can be found here and here. A sentence seems to be pretty much a good summary:

When the American Museum of Natural History wanted to create a digital walking

Tyrannosaurus rexfor a new dinosaur exhibit, it turned to dinosaur locomotion experts John Hutchinson and Stephen Gatesy for guidance.The pair found the process humbling.

With powerful computers and sophisticated modeling software, animators can take a pile of digital bones and move them in any way they want. This part is easy; but choosing the most likely motion from the myriad of possibilities proved difficult......

The researchers think that one way to narrow down the possibilities of dinosaur movement is to use more rigorous physical constraints in computer models. These constraints fall into two broad categories: kinematic (motion-based) and kinetic (force-based). One simple kinematic constraint, for example, is that the ankle and knee cannot bend backwards.

[ Update: a dinosaur finding showing preserved organic parts shows us that basing our intuition on just bones is not enough. It looks as though dinosaurs may have been much larger. Ona different note, it is one thing to model human behavior (and by extension dinosaur behavior) using Differential equations,but the problem you are trying to solve is 'given a certain behavior, how can it fit the model set forth by the differential equations?'. This is what is called an inverse problem and while a set of differential equations may give you a sentiment that you are modeling everything right, they generally are simplification of the real joint behavior and their interaction with the environment (soil,...). In short, to give a sense of realness, you have to go beyond a description with differential equations alone, for these reasons alone. For this reason, building a real roaming dinosaur need the type of undertaking mentioned above in this entry ]

## Wednesday, December 26, 2007

### Compressed sensing: CS for networked data, Diffusion wavelets and Analog CS measurements.

They also discuss the specific schemes one could implement with a particular attention to data from wireless networks and the analog production of the compressed sensing data through the use of the network. Very nice.

## Tuesday, December 25, 2007

### Compressed Sensing: How could Hardware implementations become disruptive technologies ?

"...Disruptive technologies do not destroy existing market leaders overnight. They do not get adopted by the entire market at the same time. They do not initially seem to be "better" products (in fact, in the early going, they are often distinctly "worse.") They are not initially a viable option for mainstream users. They do not win head-to-head feature tests. Initially, they do not even seem to be a threat.

Disruptive technologies take advantage of a new manufacturing/business process or technology to provide acheaper, more convenient, simpler solution that meets the needs of the low end of the market.Low-end users don't need all the features in the Incumbent's product, so they rapidly adopt the simpler solution. Meanwhile, the Incumbent canvasses its mainstream customers, reassures itself that they want the feature-rich products, and dismisses the Disruptor as a niche player in an undesirable market segment. The Incumbent may dabble with the new technology or process, but only in service of its existing business model.

Then the Disruptor improves its products, adding more features while keeping the convenience and low cost. Now the product appeals to more mainstream users, who adopt it not because it's "better" but because it's simpler and cheaper. Seeing this, the Incumbent continues adding ever more features and functionality to its core product to try to maintain its value proposition for higher end customers. And so on. Eventually, the Incumbent's product overshoots the needs of the mass market, the Disruptor grabs the mainstream customers, and, lo and behold, the technology has been "disrupted."

In a previous entry, I was wondering How could Compressed Sensing be seen as a disruptive technology ?

and used some guidance given by Todd Proebsting when he was giving a talk on innovation in programming languages at LL2.

First of all, Compressed Sensing is really a technique not a technology, so I will try to address the technologies that are beginning to use CS at and see if those can be considered disruptive technologies.

- In the case of MRI, Compressed Sensing should clearly not be labeled as disruptive as it has clearly been adopted by the large MRI companies to increase the throughput of their machines.
- In the Rice single pixel camera case, there is currently a clear advantage in the realm of imaging parts of the light spectrum where the development many sensitive pixels is clearly too expensive. The other interesting aspect of the single pixel imager is the ability to remove the compression scheme which takes most of the battery. But in an age where batteries are seen as the technology to improve, should an imaging hardware be developed just for the purpose of being less power hungry ?
- In the case of hyperspectral imagers, any current hyperspectral imager cost between 70K$ to 100K$. A low cost product while not being very efficient (accurate in certain bands) would clearly make the CS hyperspectral imager as a disruptive technology. One of the major underlying reason as to why CS hyperspectral imager have a reasonable possibility of creating a niche market is the ability to provide a large field of view and dynamic range compared to current technology.

For each of these technologies I will try to address the following in the future:

- Opportunity: what’s the issue?
- Current solutions: what’s done now
- Proposal: New disruptive technology
- Disadvantages why some (many?) will scoff
- Unmet needs benefits to adopters:

### Monday Morning Algorithm 8: An Example of Compressed Measurements Speeding Up Diffusion Maps

In the previous installment of the Monday Morning Algorithm, we featured the diffusion method. I also mentioned that one could use compressed sensing and obtain the same result: i.e the ability to sort several high dimensional images because only one parameter is being varied. When going through this exercise, one can note that of the most expensive operation is the inter-distance measured between all the elements of the set. If every picture is 10 MP, one has to evaluate the difference between elements made of ten million numbers. In Compressed Sensing, one learns that projecting a large image on the right kind of random projections, one can keep the amount of information in the original image. We are going to reuse our last algorithm but we will be projecting the original images of Buzz Lightyear on a smaller space. We will be using a Gaussian ensemble and will be projecting from an image of dimension 57600 down to 100 Compressed Measurements. Let us note that this not an efficient operation unless we have directly access to a compressed measurements or if one does extra operations with the compressed measurements.

It looks like 100 projections are enough to recover the same results as the original analysis. One can note that this process of figuring out what is the number of random projections needed would require the type of analysis evaluating the intrinsic dimensionality of the set as mentioned here by the folks at Rice.

clear

% Stephane Lafon, “Diffusion maps and geometric harmonics”,

% Ph.D. dissertation, Yale University (May

% 2004)

% written by Cable Kurwitz and Igor Carron

% This algorithm shows how random projections

% used in compressed sensing provides the same

% results as the original images.

%

% Number of Random Projections

number_RP =100;

% get all the files in jpg

d = dir('buzz*.jpg')

ntotal = size(d,1)

for i=1:ntotal

d(i).name

end

Img = imread(d(1).name);

s=size(Img,1);

w=size(Img,2);

hd=size(Img,3);

swhd=s*w*hd;

%reshape these images into one vector each

for ii=1:ntotal

Img = imread(d(ii).name);

b1(:,ii) = double(reshape(Img, swhd,1));

end

%

% Compressed sensing part

sa=randn(number_RP,swhd);

for i=1:swhd

sa(:,i)=sa(:,i)./norm(sa(:,i));

end

b=sa*b1;

%b=b1;

%

%

for i=1:ntotal

for j=1:ntotal

D1(i,j)=norm(b(:,i)-b(:,j),2);

end

end

%

n1=ntotal;

D_sum=0;

for l = 1:n1;

D_sum = D_sum + min(nonzeros(D1(l,:)));

end;

epsilon = D_sum/n1/10; %

%

% Step 1

%

kernel_1 = exp(-(D1./epsilon).^1);

%

% Step 2

%

one_V = ones(1,n1);

%

% Step 3

%

p = kernel_1*one_V';

kernel_2 = kernel_1./((p*p').^1);

%

% Step 4

%

v = sqrt(kernel_2*one_V');

%

% Step 5

%

K = kernel_2./(v*v');

%

% Step 6

%

%% Singular Value Decomposition

[U S V] = svd(K);

% Normalization

for i=1:ntotal-1

phi(:,i) = U(:,i)./U(:,1); %

end

eig_funct = phi(:,1:6);

%

% The eigenvalues of \delta are approximated by those of K, and its

% eigenfunctions Ái are approximated by phi(:,i) = U(:; i):=U(:; 1)

%

vv=eig(K);

% Initial order of Buzz images

figure(1)

for i=1:ntotal

subplot(3,4,i)

imshow(d(i).name)

end

X = vv(ntotal-1).*phi(:,2);

% sorting the Buzz images using the first

% eigenfunctions the Laplace Beltrami operator

[Y,I] = sort(X);

% d(I).name

%

figure(2)

for i=1:ntotal

subplot(3,4,i)

imshow(d(I(i)).name)

end

%

figure(3)

plot(vv(ntotal-1).*phi(:,2),'.')

title(' Buzz Lightyear')

xlabel('Index')

ylabel('Lambda_1 phi_1')

%

figure(4)

plot(vv(ntotal-1).*phi(:,2),vv(ntotal-2).*phi(:,3),'.')

title(' Buzz Lightyear')

xlabel('Lambda_1 phi_1')

ylabel('Lambda_2 phi_2')

%

figure(5)

plot(eig(K),'*')

xlabel('Index of eigenvalue')

ylabel('Value of eigenvalue')

title(' Eingenvalues of K')

vv=eig(K);

%

All the Monday Morning Algorithms (and attendant codes/files) are listed here.

## Sunday, December 23, 2007

### Compressed Sensing: Fast Compressive Sampling with Structurally Random Matrices

This paper presents a novel framework of fast and efficient compressive sampling based on the new concept of structurally random matrices. The proposed framework provides four important features. (i) It is universal with a variety of sparse signals. (ii) The number of measurements required for exact reconstruction is nearly optimal. (iii) It has very low complexity and fast computation based on block processing and linear filtering. (iv) It is developed on the provable mathematical model from which we are able to quantify trade-offs among streaming capability, computation/memory requirement and quality of reconstruction. All currently existing methods only have at most three out of these four highly desired features. Simulation results with several interesting structurally random matrices under various practical settings are also presented to verify the validity of the theory as well as to illustrate the promising potential of the proposed framework.

### Compressed Sensing: MCA news and Sublinear Recovery of Sparse Wavelet Signals

Morphological Component Analysis : an Adaptive Thresholding Strategy by Jerome Bobin, Jean-Luc Starck, Jalal Fadili, Yassir Moudden, and David Donoho, the abstract reads:

In a recent paper, a method called MCA (Morphological Component Analysis) has been proposed to separate the texture from the natural part in images. MCA relies on an iterative thresholding algorithm, using a threshold which decreases linearly towards zero along the iterations. This paper shows how the MCA convergence can be drastically improved using the mutual incoherence of the dictionaries associated to the different components. This modified MCA algorithm is then compared to Basis Pursuit, and experiments show that MCA and BP solutions are similar in terms of sparsity, as measured by l1 norm, but MCA is much faster and gives us the possibility to handle large scale data sets.The MOM/matched filtering and thresholding bears some similarity with another paper mentioned in this entry from some of the same co-authors. I haven't looked at it deeper yet. Always on the same topic of Morphological Component Analysis, Jort also mentions the release of the MCA 802 C++ toolbox from Jalal Fadili's webpage.

Toward a dedicated Compressed Sensing based on the signal signature? It looks like Ray Maleh and Anna Gilbert are beginning to do that in Sublinear recovery of sparse wavelet signals. The abstract reads:

There are two main classes of decoding algorithms for “compressed sensing,” those which run time time polynomial in the signal length and those which use sublinear resources. Most of the sublinear algorithms focus on signals which are compressible in either the Euclidean domain or the Fourier domain. Unfortunately, most practical signals are not sparse in either one of these domains. Instead, they are sparse (or nearly so) in the Haar wavelet system. We present a modified sublinear recovery algorithm which utilizes the recursive structure of Reed-Muller codes to recover a wavelet-sparse signal from a small set of pseudo-random measurements. We also discuss an implementation of the algorithm to illustrate proof-of-concept and empirical analysis.

## Saturday, December 22, 2007

### Compressed Sensing, Compressive Sampling: A Customized Search Engine

- see what their other interests are and,
- help Google customized search engine focus on the subject of Compressed Sensing.

## Friday, December 21, 2007

### Compressed Sensing: Intrinsic Dimensionality of a Dataset, StOMP for large scale problems and 2007 WD5 is aiming at Mars

We derive theoretical bounds on the performance of manifold learning algorithms, given access to a small number of random projections of the input dataset. We prove that with the number of projections only logarithmic in the size of the original space, we may reliably learn the structure of the nonlinear manifold, as compared to performing conventional manifold learning on the full dataset.

In the NIPS version of that technical report, the abstract reads:

We propose a novel method for linear dimensionality reduction of manifold modeled

data. First, we show that with a small number M of random projections of sample points in RN belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in K and logarithmic in N, meaning that K less than M much less than N. To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage and transmission costs.

David Donoho, Yaakov Tsaig, Iddo Drori, and Jean-Luc Starck, Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit, the abstract reads:

Finding the sparsest solution to underdetermined systems of linear equations y = x is NP-hard in general. We show here that for systems with ‘typical’/‘random’ , a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our proposal, Stagewise Orthogonal Matching Pursuit (StOMP), successively transforms the signal into a negligible residual. Starting with initial residual r0 = y, at the s-th stage it forms the ‘matched filter’ T rs−1, identifies all coordinates with amplitudes exceeding a specially-chosen threshold, solves a least-squares problem using the selected coordinates, and subtracts the least squares fit, producing a new residual. After a fixed number of stages (e.g. 10), it stops. In contrast to Orthogonal Matching Pursuit (OMP), many coefficients can enter the model at each stage in StOMP while only one enters per stage in OMP; and StOMPtakes a fixed number of stages (e.g. 10), while OMP can take many (e.g. n). StOMP runs much faster than competing proposals for sparse solutions, such as l1 minimization and OMP, and so is attractive for solving large-scale problems.

We use phase diagrams to compare algorithm performance. The problem of recovering a k-sparse vector x0 from (y, ) where is random n × N and y = x0 is represented by a point (n/N, k/n) in this diagram; here the interesting range is k less than n less than N. For n large, StOMP correctly recovers (an approximation to) the sparsest solution of y = x over a region of the sparsity/indeterminacy plane comparable to the region where l1 minimization is successful. In fact, StOMPoutperforms both l1 minimization and OMP for extremely underdetermined problems. We rigorously derive a conditioned Gaussian distribution for the matched filtering coefficients at each stage of the procedure and rigorously establish a large-system limit for the performance variables of StOMP. We precisely calculate large-sample phase transitions; these provide asymptotically precise limits on the number of samples needed for approximate recovery of a sparse vector by StOMP. We give numerical examples showing that StOMPrapidly and reliably finds sparse solutions in compressed sensing, decoding of error-correcting codes, and overcomplete representation.

Always pay attention when Dave Donoho says something. I also found the qualifying exams slide of Deanna Needell interesting, especially slide 116/145 an information I did not know. I am also impatient to see what Thong Do a student of Trac Tran is talking about in this seminar:

In this work, we propose a novel fast compressive sampling algorithm that is based on a new concept of structurally random ensembles. The new proposed algorithm not only enables the system to compressively sample signals very fast but also requires very low complexity with other highly desired features for practical applications such as: integer friendliness, streaming capability. Surprisingly, this novel algorithm is also proposed to solve a closely related problem of high dimensional reduction in theoretical computer sciences.

I mentioned earlier the Titan flyby of today, but guess what kids, this one trumps it: 2007 WD5, an asteroid is going to hit Mars where we currently have five robots ( Spirit and Opportunity, Odyssey, Mars Express and the Mars Reconnaissance Orbiter). Woohoo, where is my popcorn ?.

## Thursday, December 20, 2007

### Compressed Sensing: Compressive sampling vs conventional imaging, the Birth of the Everlasting Dataset

Another aspect of the image reconstruction, which I had not seen before, is also explicitly presented. Once you have retrieved random measurements/projections, Compressed Sensing techniques enable you to reconstruct an image with different bases. If, in 20 years, we have better bases, the same measurements can be used and will reproduce a better image than the one we can currently reconstruct. Case in point, if we had done compressed sensing in the mid-1990's, we would have reconstructed images with edges using wavelets bases. Then curvelets came up , using the same measurements we would have been able to get a clearer imageright when Emmanuel Candes discovered the basis!

The underlying reason being that curvelets provide better approximation properties to edges, hence providing a sparse representation of images (most natural images are made of edges).

The presentation (excerpted from reference [1]) then goes on to show a comparison between recovery using Compressed Sensing with different bases and a normal pixel sampling.

The wedgelets are coming from this implementation. The conclusions are nice summary:

"Compressive sampling techniques can yield accurate reconstructions even when the signal dimension greatly exceeds the number of samples, and even when the samples themselves are contaminated with significant levels of noise. CS can be advantageous in noisy imaging problems if the underlying image is highly compressible or if the SNR is sufficiently large"I can see how there is going to be similar activity in the field of data mining where the underlying data dimension is larger than 2 and for which there is not currently a reasonable basis to decompose datasets on. Images are 10 MB large at most and get reduced to about 1 MB jpeg (this gives an idea of the compression using the DCT). Compressed Sensing with current bases yield about 4 MB of random measurements a reduction of about 60 %. With this in mind, one wonders how many random measurements would be needed to eventually provide a good reconstruction on different types of datasets. Does taking 47 random measurements of Shakespeare's Hamlet provide enough information to reconstruct it ? Maybe not today, if not, when ?

Reference: [1] Jarvis Haupt and Robert Nowak, Compressive sampling vs conventional imaging

## Wednesday, December 19, 2007

### Compressed Sensing Thoughts, Diffusion maps methods for Brain Imaging and for Computer Generated Graphics

I also found items of interest and relevant to dimensionality reduction applied to different domains. The first one is an article on Recovering cerebral white matter structures with Spectral Clustering of Diffusion MRI Data by Demian Wassermann, Maxime Descoteaux and Rachid Deriche.

The abstract reads:

White matter fiber clustering allows to get insight about anatomical structures in order to generate atlases, perform clear visualizations and compute statistics across subjects, all important and current neuroimaging problems. In this work, we present a Diffusion Maps clustering method applied to diffusion MRI in order to cluster and segment complex white matter fiber bundles. It is well-known that Diffusion Tensor Imaging (DTI) is restricted in complex fiber regions with crossings and this is why recent High Angular Resolution Diffusion Imaging (HARDI) such has Q-Ball Imaging (QBI) have been introduced to overcome these limitations. QBI reconstructs the diffusion orientation distribution function (ODF), a spherical function that has its maxima agreeing with the underlying fiber populations. In this paper, we introduce the usage of the Diffusion Maps technique and show how it can be used to directly cluster set of fiber tracts, that could be obtained through a streamline tractography for instance, and how it can also help in segmenting fields of ODF images, obtained through a linear and regularized ODF estimation algorithm based on a spherical harmonics representation of the Q-Ball data. We first show the advantage of using Diffusion Maps clustering over classical methods such as N-Cuts and Laplacian Eigenmaps in both cases. In particular, our Diffusion Maps requires a smaller number of hypothesis from the input data, reduces the number of artifacts in fiber tract clustering and ODF image segmentation and automatically exhibits the number of clusters in both cases by using an adaptive scale-space parameter. We also show that our ODF Diffusion Maps clustering can reproduce published results using the diffusion tensor (DT) clustering with N-Cuts on simple synthetic images without crossings. On more complex data with crossings, we show that our ODF-based method succeeds to separate fiber bundles and crossing regions whereas the DT-based methods generate artifacts and exhibit wrong number of clusters. Finally, we illustrate the potential of our approach on a real brain dataset where we successfully segment well-known fiber bundles.It is pretty long but worth reading.

This course is an introduction to the computational theory of manifolds. Manifold models arise in various area of mathematics, image processing, data mining or computer science. Surfaces of arbitrary dimension can be used to model non-linear datasets that one encounters in modern data processing. Numerical methods allow to exploit this geometric non-linear prior in order to extract relevant information from the data. These methods include in particular local differential computations (related to the Laplacian operator and its variants) and global distance methods (related to geodesic computations). In this course, you will learn how to perform differential and geodesic computations on images, volumes, surfaces and high dimensional graphs.

The course includes a set of Matlab experiments. These experiments give an overview of various tasks in computer vision, image processing, learning theory and mesh processing. This includes computation of shortest paths, Voronoi segmentations, geodesic Delaunay triangulations, surface flattening, dimensionality reduction and mesh processing.

One can notice that the flattening of meshes has a parallel with methods like the MVU, except that in the MVU case, additional interdistance constraints have to be met.

Then again in CG, people are ok with "it's good enough" :-)

## Tuesday, December 18, 2007

### Monday Morning Algorithm 7: Diffusion Maps for Dimensionality Reduction and Manifold Parametrization.

What do we do here ? we first compute the inter distance between each frame, then plug this into a kernel which eventually provides an idea of how connected each frame is to each other. Using a thermal heat transfer analogy, a frame is a node and the new distance between any one node to another one is inversely proportional the number of nodes between the two nodes. The new metric induced by this new distance enables a new parametrization of the manifold. In our case, the manifold is made of one object rotating around one axis. Parametrization of the underlying manifold should yield a one parameter family since only one dimension is being varied between each frame (one rotation). In order to show this, we use the first eigenfunction of the Laplace Beltrami and sort each frame according to its projection on that eigenfunction. The sorting along that dimension permits one to sort Buzz Lightyear rotating continuously from one end to the other.

clearPlease note the ability to test the same algorithm by reducing first the frames through random projections (one needs to delete some % ). It is pretty obvious that one would need a robust metric in the first place instead of L2, one could foresee the use of a specifically designed metric (Improving Embeddings by Flexible Exploitation of Side Information) on top of the reduction operated with the random projections. The specifically designed metric would alleviate the need to come up with the ad-hoc epsilon of this method.

% Stephane Lafon, “Diffusion maps and geometric harmonics”,

% Ph.D. dissertation, Yale University (May

% 2004)

% written by Cable Kurwitz and Igor Carron

%

% get all the files in jpg

d = dir('buzz*.jpg')

ntotal = size(d,1)

for i=1:ntotal

d(i).name

end

Img = imread(d(1).name);

s=size(Img,1);

w=size(Img,2);

hd=size(Img,3);

swhd=s*w*hd;

%reshape these images into one vector each

for ii=1:ntotal

Img = imread(d(ii).name);

b1(:,ii) = double(reshape(Img, swhd,1));

end

%

% Compressed sensing part

%sa=randn(20,swhd);

%b=sa*b1;

b=b1;

%

%

for i=1:ntotal

for j=1:ntotal

D1(i,j)=norm(b(:,i)-b(:,j),2);

end

end

%

n1=ntotal;

D_sum=0;

for l = 1:n1;

D_sum = D_sum + min(nonzeros(D1(l,:)));

end;

epsilon = D_sum/n1/10; %

%

% Step 1

%

kernel_1 = exp(-(D1./epsilon).^1);

%

% Step 2

%

one_V = ones(1,n1);

%

% Step 3

%

p = kernel_1*one_V';

kernel_2 = kernel_1./((p*p').^1);

%

% Step 4

%

v = sqrt(kernel_2*one_V');

%

% Step 5

%

K = kernel_2./(v*v');

%

% Step 6

%

%% Singular Value Decomposition

[U S V] = svd(K);

% Normalization

for i=1:ntotal-1

phi(:,i) = U(:,i)./U(:,1); %

end

eig_funct = phi(:,1:6);

%

% The eigenvalues of \delta are approximated by those of K, and its

% eigenfunctions Ái are approximated by phi(:,i) = U(:; i):=U(:; 1)

%

vv=eig(K);

% Initial order of Buzz images

figure(1)

for i=1:ntotal

subplot(3,4,i)

imshow(d(i).name)

end

X = vv(ntotal-1).*phi(:,2);

% sorting the Buzz images using the first

% eigenfunctions the Laplace Beltrami operator

[Y,I] = sort(X);

% d(I).name

%

figure(2)

for i=1:ntotal

subplot(3,4,i)

imshow(d(I(i)).name)

end

%

figure(3)

plot(vv(ntotal-1).*phi(:,2),'.')

title(' Buzz Lightyear')

xlabel('Index')

ylabel('Lambda_1 phi_1')

%

figure(4)

plot(vv(ntotal-1).*phi(:,2),vv(ntotal-2).*phi(:,3),'.')

title(' Buzz Lightyear')

xlabel('Lambda_1 phi_1')

ylabel('Lambda_2 phi_2')

%

figure(5)

plot(eig(K),'*')

xlabel('Index of eigenvalue')

ylabel('Value of eigenvalue')

title(' Eingenvalues of K')

vv=eig(K);

%

The diffusion mapping has been taken to another level by Mauro Maggioni with diffusion wavelets and Sridhar Mahadevan with Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions.

Reference: [1] S. Lafon, “

*”, Ph.D. dissertation, Yale University (May 2004)*

**Diffusion maps and geometric harmonics**

## Thursday, December 13, 2007

### Compressed Sensing: Single-Pixel Imaging via Compressive Sampling, Stable Manifold Embedding

Humans are visual animals, and imaging sensors that extend our reach – cameras – have improved dramatically in recent times thanks to the introduction of CCD and CMOS digital technology. Consumer digital cameras in the mega-pixel range are now ubiquitous thanks to the happy coincidence that the semiconductor material of choice for large-scale electronics integration (silicon) also happens to readily convert photons at visual wavelengths into electrons. On the contrary, imaging at wavelengths where silicon is blind is considerably more complicated, bulky, and expensive. Thus, for comparable resolution, a $500 digital camera for the visible becomes a $50,000 camera for the infrared. In this paper, we present a new approach to building simpler, smaller, and cheaper digital cameras that can operate efficiently across a much broader spectral range than conventional silicon-based cameras. Our approach fuses a new camera architecture based on a digital micromirror device with the new mathematical theory and algorithms of compressive sampling . CS combines sampling and compression into a single nonadaptive linear measurement process [1–4]. Rather than measuring pixel samples of the scene under view, we measure inner products between the scene and a set of test functions. Interestingly, random test functions play a key role, making each measurement a random sum of pixel values taken across the entire image. When the scene under view is compressible by an algorithm like JPEG or JPEG2000, the CS theory enables us to stably reconstruct an image of the scene from fewer measurements than the number of reconstructed pixels. In this manner we achieve sub-Nyquist image acquisition. Our “single-pixel” CS camera architecture is basically an optical computer (comprising a DMD, two lenses, a single photon detector, and an analog-to-digital (A/D) converter) that computes random linear measurements of the scene under view. The image is then recovered or processed from the measurements by a digital computer. The camera design reduces the required size, complexity, and cost of the photon detector array down to a single unit, which enables the use of exotic detectors that would be impossible in a conventional digital camera. The random CS measurements also enable a tradeoff between space and time during image acquisition. Finally, since the camera compresses as it images, it has the capability to efficiently and scalably handle high-dimensional data sets from applications like video and hyperspectral imaging. This article is organized as follows. After describing the hardware, theory, and algorithms of the single-pixel camera in detail, we analyze its theoretical and practical performance and compare it to more conventional cameras based on pixel arrays and raster scanning. We also explain how the camera is information scalable in that its random measurements can be used to directly perform simple image processing tasks, such as target classification, without first reconstructing the underlying imagery. We conclude with a review of related camera architectures and a discussion of ongoing and future work.When I last explained the single pixel camera, I didn't realize the first option is also called the raster scan. The noteworthy result of the trade study compares different current scanning capabilities and that provided by compressed sensing.They look at dynamic range, quantization error, photon counting noise and remind us of the fact that with CS, one can obtain much more light on the sensitive detector (pixel) than conventional cameras and that it should help in low light conditions as presented in the IMA presentation (Multiscale Geometric Analysis).

X-rays could be implemented the same way if the DMD would be able to reflect at these wavelengths, for instance by using a well thought multilayer system for instance. They also give a good summary of the other hardware technologies implementing Compressed Sensing. It looks like I have not missed any (part 1, part 2, part 3, part 4)

In another good summary of the current investigations of Richard Baraniuk's group, I found the following slide most illuminating (as presented in Single-Pixel Imaging via Compressive Sampling,) explaining the geometric reason why L1 works

The same figure could be used to explained the relative better ability of non-convex Lp regularization techniques to do even better than L1. Because the Lp (for p less than 1) ball shape tend to be squashed compared to the L1 ball, it is easier to see that there are more randomly oriented N-M planes anchored on a K-face of the Lp ball which would not interesect with the ball. [Update: Muthu MuthuKrishnan just posted a nice picture of an Lp ball, the picture is due to Petros Boufounos ].

With regards to extension of compressed sensing or compressed sensing ideas, Richard Baraniuk and Mike Wakin [1] have these three illuminating slides where they shows how Random projections can reduce Manifolds of low dimension in high dimensional space or how the condition for which random projection can be used as dimensionality reduction techniques. It shows how CS provides better bounds than the older Whitney's embedding theorem.

The current drawback of this approach (random projections) is the inability to provide some type of parametrization on the manifold. Never say never.

Source: Rice Compressed Sensing Repository.

Reference : [1] R. G. Baraniuk and M. B. Wakin, Random Projections of Smooth Manifolds, to appear in

*Foundations of Computational Mathematics*.

## Tuesday, December 11, 2007

### Complex Systems that Matter: an unfortunate Nuclear reactor shutdown and the Space Shuttle ECO malfunction

*America the Powerless: Facing our Nuclear Energy Dilemma*he was mentioning the important issue on our dramatic reliance on Canadian and other foreign reactors to get medical isotopes ( isotopes of molybdenum, technetium ). Then I read this yesterday about a Nuclear plant shutdown brings hospital delays where it explains that because of a unduly conservative regulatory environment, the shutdown of one single Canadian ( Chalk River nuclear facility ) will delay significantly the wait for medical tests in the U.S. and other places. Let us not kid ourselves, it will have some dramatic consequences on some patients.

## Monday, December 10, 2007

### Compressed Sensing: Blog entries and website updates.

And remember kids, Cassini will flyby Titan on Dec. 20 or ten days from now. It will fly above Titan at an altitude of 970 km above it. Not the 1 million km of this photo:

Photo credit: NASA/JPL/Space Science Institute

### Monday Morning Algorithm Part 6: The Fast Johnson-Lindenstrauss Transform

clear

% Fast Johnson-Lindenstrauss Transform

% as described by Nir Ailon and Bernard Chazelle

% in their paper:

% "Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss

% Transform"

% The algorithm uses the PHD transform of paragraph 2

% The transform is a k x d projection

% for every set S of n points in R^d

% Let us define an example case

% All columns of G are sets of points

% i.e. we have 1000 points of dimension 128

n = 1000;

d = 256;

% let us examine a reduction from d = 256 to k = 150

%

k = 150;

G = 180 * rand(d,n);

% k = O(epsilon^(-2) * log(n))

epsilon = sqrt(log(n)/k)

%

% epsilon is really O( sqrt(log(n)/k) )

%

%

% Projection in dim k << d % let us take d = 2^w w = log2(d); % % % Defining P (k x d) % first define q % q = min((log(n))^2/d,1); % P = rand(k,d); P = (P < q); P1 = 1/q * randn(k,d); P = P1 .* P; % % Defining H (d x d) H = [1]; for i=1:w H = [ H H;H -H]; end H = 1/d^(0.5)* H; % % Defining D ( d x d) vec1=rand(d,1); v1 = ((vec1 > 0.5) - 1*(vec1<=0.5)); D = diag(v1); % % % FJLT = P * H * D; u = FJLT * G(:,5); v = FJLT * G(:,36); norm(G(:,5)-G(:,36))^2 * k * (1-epsilon) norm( u - v)^2 norm(G(:,5)-G(:,36))^2 * k * (1+epsilon)

## Thursday, December 06, 2007

### Compressed Sensing: Neighborly polytopes, Acoustic Tomography, Sampling in union of linear subspaces, ROMP stability

Recently, I mentioned a paper on Efficient and Stable Acoustic Tomography Using Sparse Reconstruction Methods by Ivan Jovanović, Ali Hormati, Luciano Sbaiz and Martin Vetterli. It is now here. They solve an inverse problem finding a temperature profile by an inference from the speed of sound. In effect they solve an integral equation like Richard Baraniuk and Philippe Steeghs in Compressive radar imaging and Lawrence Carin, Dehong Liu and Ya Xue in In Situ Compressive Sensing

From the Rice repository, two new items:

Thomas Blumensath and Mike Davies, Sampling theorems for signals from the union of linear subspaces.

The abstract reads

Compressed sensing is an emerging signal acquisition technique that enables signals to be sampled well below the Nyquist rate, given that the signal has a sparse representation in an orthonormal basis. In fact, sparsity in an orthonormal basis is only one possible signal model that allows for sampling strategies below the Nyquist rate. In this paper we consider a more general signal model and assume signals that live on or close to the union of linear subspaces of low dimension. We present sampling theorems for this model that are in the same spirit as the Nyquist-Shannon sampling theorem in that they connect the number of required samples to certain model parameters. Contrary to the Nyquist-Shannon sampling theorem, which gives a necessary and sufficient condition for the number of required samples as well as a simple linear algorithm for signal reconstruction, the model studied here is more complex. We therefore concentrate on two aspects of the signal model, the existence of one to one maps to lower dimensional observation spaces and the smoothness of the inverse map. We show that almost all linear maps are one to one when the observation space is at least twice the dimension of the largest subspace in the union of subspaces. However, we also show that in order for the inverse map to have certain smoothness properties such as a given finite Lipschitz constant, the required observation dimension necessarily depends on the number of subspaces in the signal model. These results are then applied to two examples, the standard compressed sensing signal model in which the signal has a sparse representation in an orthonormal basis and to a sparse signal model with additional tree structure.

and Nam H. Nguyen and Trac D. Tran, The stability of regularized orthogonal matching pursuit. The abstract reads:

This paper studies a fundamental problem that arises in sparse representation and compressed sensing community: can greedy algorithms give us a stable recovery from incomplete and contaminated observations ? Using the Regularized Orthogonal Matching Pursuit (ROMP) algorithm, a modified version of Orthogonal Matching Pursuit (OMP) [1], which was recently introduced by Deanna Needell and Roman Vershynin [ref: Signal recovery from incomplete and inaccurate measurements via Regularized Orthogonal Matching Pursuit ], we assert that ROMP is stable and guarantees approximate recovery of non-sparse signals, as good as the Basis Pursuit algorithm [16]. We also will set up criterions at which the algorithm halts, and the upper bounds for the reconstructed error. It will be proved in the paper that these upper bounds are proportional to the noise energy.Credit Photo: NASA/JPL/Space Science Institute, Saturn moon Tethys taken on October 29, 2007.

## Wednesday, December 05, 2007

### Compressed Sensing: Solving the Basis Pursuit problem with a Bregman Iterative Algorithm

We propose simple and extremely efficient methods for solving the Basis Pursuit problem

which is used in compressed sensing. Our methods are based on Bregman iterative regularization and they give a very accurate solution after solving only a very small number of instances of the unconstrained problem

for given matrix A and vector fk. We show analytically that this iterative approach yields exact solutions in a finite number of steps, and present numerical results that demonstrate that as few as two to six iterations are sufficient in most cases. Our approach is especially useful for many compressed sensing applications where matrix-vector operations involving A and A' can be computed by fast transforms. Utilizing a fast fixed-point continuation solver that is solely based on such operations for solving the above unconstrained sub-problem, we were able to solve huge instances of compressed sensing problems quickly on a standard PC.

They also seem to aim for very large reconstruction problems as well. The webpage for this effort is here and the code is here. The main difference would certainly be in how much time is spent on both algorithms to solve the same problem but also the fact that the TwIST would have the advantage of also dealing with non-convex regularization lp terms with p =0.7 or 0.8. If anybody has done the comparison between the two methods with p=1, I'd love to hear about it.

Source: Rice Compressed Sensing page.

### Compressed Sensing: A new TwIST

References that explain the algorithms are A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration and Two-step algorithms for linear inverse problems with non-quadratic regularization" both by José Bioucas-Dias, Mário Figueiredo. MATLAB code is available here.

Source: Rice Compressed Sensing page.

## Tuesday, December 04, 2007

### Compressed Sensing: Random Undersampling in Geophysics and ROMP, a stable greedy algorithm

In this paper, we present a new discrete undersampling scheme designed to favor wavefield reconstruction by sparsity-promoting inversion with transform elements that are localized in the Fourier domain. Our work is motivated by empirical observations in the seismic community, corroborated by recent results from compressive sampling, which indicate favorable (wavefield) reconstructions from random as opposed to regular undersampling. As predicted by theory, random undersampling renders coherent aliases into harmless incoherent random noise, effectively turning the interpolation problem into a much simpler denoising problem.

A practical requirement of wavefield reconstruction with localized sparsifying transforms is the control on the maximum gap size. Unfortunately, random undersampling does not provide such a control and the main purpose of this paper is to introduce a sampling scheme, coined jittered undersampling, that shares the benefits of random sampling, while offering control on the maximum gap size. Our contribution of jittered sub-Nyquist sampling proofs to be key in the formulation of a versatile wavefield sparsity-promoting recovery scheme that follows the principles of compressive sampling.

After studying the behavior of the jittered-undersampling scheme in the Fourier domain, its performance is studied for curvelet recovery by sparsity-promoting inversion (CRSI). Our findings on synthetic and real seismic data indicate an improvement of several decibels over recovery from regularly-undersampled data for the same amount of data collected.

Deanna Needell and Roman Vershynin just made available a paper entitled:Signal recovery from incomplete and inaccurate measurements via Regularized Orthogonal Matching Pursuit. Roman Vershynin writes the following commentary on it:

This is our second paper with my student Deanna Needell, where we continue to study alternatives to linear programming methods in Compressed Sensing. In our first paper, we developed a greedy algorihtm called ROMP, which can perfectly recover any sparse signal from a small number of linear non-adaptive measurements. In this paper, we prove the stability of ROMP. It seems to be the first greedy algorithm in Compressed Sensing that is provably stable in the sense that the recovery error is proportional to the level of the noise. In particular, the recovery is exact in the absence of noise. Also, one can use ROMP to accurately recover signals, not necessarily sparse. This kind of stability was known for the linear programming methods (a result of Candes, Romberg and Tao), but has been previously unknown for any greedy method (like OMP).

ROMP is available here. Let us note that like the reweighted Lp algorithm of Rick Chartrand and Wotao Yin, reconstruction is performed by a series of iterations solving a Least Squares problem.