Wednesday, September 24, 2008

CS: GPR, Primal Dual Pursuit, Dictionary building, Uncertainty Relations, Edge Detection, Minimum Sum of Distances Estimator, NIPS'08 workshop


Today we have a flurry of new papers coming out and each of them should be of interest to some of the readership of this blog:

Three Novel Edge Detection Methods for Incomplete and Noisy Spectral Data by Eitan Tadmor and Jing Zou. The abstract reads:
We propose three novel methods for recovering edges in piecewise smooth functions from their possibly incomplete and noisy spectral information. The proposed methods utilize three different approaches: #1. The randomly-based sparse Inverse Fast Fourier Transform (sIFT); #2. The Total Variation-based (TV) compressed sensing; and #3. The modified zero crossing. The different approaches share a common feature: edges are identified through separation of scales. To this end, we advocate here the use of concentration kernels (Tadmor, Acta Numer. 16:305–378, 2007), to convert the global spectral data into an approximate jump function which is localized in the immediate neighborhoods of the edges. Building on these concentration kernels, we show that the sIFT method, the TV-based compressed sensing and the zero crossing yield effective edge detectors, where finitely many jump discontinuities are accurately recovered. One- and two-dimensional numerical results are presented.
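
To get a feel for the separation-of-scales idea, here is a small numerical sketch of mine (a toy example, not the authors' code): a conjugate Fourier partial sum acts as a crude concentration kernel that turns global spectral data into an approximate jump function, which is then thresholded to locate the edges.

import numpy as np

# Toy sketch: a conjugate Fourier partial sum used as a crude concentration
# kernel. It converts global spectral data into an approximate jump function
# that is large only near the edges, which are then found by thresholding.

n = 256
x = 2 * np.pi * np.arange(n) / n                  # uniform grid on [0, 2*pi)
f = ((x > 1.0) & (x < 4.0)).astype(float)         # unit jumps at x = 1 and x = 4

fhat = np.fft.fft(f) / n                          # spectral data (Fourier coefficients)
k = np.fft.fftfreq(n, d=1.0 / n)                  # integer frequencies

# Conjugate partial sum, rescaled so that it approximates the jump function [f](x).
conj = np.real(np.fft.ifft(-1j * np.sign(k) * fhat) * n)
jump = -np.pi / np.log(n // 2) * conj

edges = x[np.abs(jump) > 0.5]                     # crude threshold at half the jump height
print(edges)                                      # small clusters around x = 1 and x = 4
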
Minimum Sum of Distances Estimator: Robustness and Stability by Yoav Sharon, John Wright, and Yi Ma. The abstract reads:

We consider the problem of estimating a state x from noisy and corrupted linear measurements y = Ax + z + e, where z is a dense vector of small-magnitude noise and e is a relatively sparse vector whose entries can be arbitrarily large. We study the behavior of the l1 estimator x^ = argmin_x ||y - Ax||_1, and analyze its breakdown point with respect to the number of corrupted measurements ||e||_0. We show that under mild conditions, the breakdown point does not depend on the noise level. We introduce a novel algorithm for computing the breakdown point for any given A, and provide a simple bound on the estimation error when the number of corrupted measurements is less than the breakdown point. We apply our algorithm to design a robust state estimator for an autonomous vehicle, and show how it can significantly improve performance over the Kalman filter.

This is an interesting application that would have served our entry in the DARPA Grand Challenge well.
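
For readers who want to try the estimator, the l1 minimization above is just a linear program. Here is a minimal sketch (mine, not the authors' code) using scipy on a hypothetical small problem with a few grossly corrupted measurements:

import numpy as np
from scipy.optimize import linprog

def l1_estimate(A, y):
    """Minimum sum of absolute residuals: argmin_x ||y - A x||_1, cast as an LP."""
    m, n = A.shape
    # variables: [x (free), t (>= 0)], minimize sum(t) subject to -t <= y - A x <= t
    c = np.concatenate([np.zeros(n), np.ones(m)])
    A_ub = np.block([[-A, -np.eye(m)], [A, -np.eye(m)]])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# Hypothetical example: a few measurements are grossly corrupted.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
x_true = rng.standard_normal(5)
y = A @ x_true + 0.01 * rng.standard_normal(30)   # small dense noise z
y[[3, 17, 25]] += 10.0                            # sparse gross corruptions e
print(np.linalg.norm(l1_estimate(A, y) - x_true))                     # small
print(np.linalg.norm(np.linalg.lstsq(A, y, rcond=None)[0] - x_true))  # much larger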

We also have two dissertations from Georgia Tech, both written under the supervision of Justin Romberg.

The first one, by Ali Cafer Gurbuz, is a dissertation on compressive sensing for ground penetrating radar (GPR) subsurface imaging. The abstract reads:

The problem of sensing a medium by several sensors and retrieving interesting features is a very general one. The basic framework of the problem is generally the same for applications from MRI, tomography, and radar SAR imaging to subsurface imaging, even though the data acquisition processes, sensing geometries and sensed properties are different. In this thesis we introduced a new perspective on the problem of remote sensing and information retrieval by studying the problem of subsurface imaging using GPR and seismic sensors. We have shown that if the sensed medium is sparse in some domain then it can be imaged using many fewer measurements than required by the standard methods. This leads to much lower data acquisition times and better images representing the medium. We have used the ideas from Compressive Sensing, which show that a small number of random measurements about a signal is sufficient to completely characterize it, if the signal is sparse or compressible in some domain. Although we have applied our ideas to the subsurface imaging problem, our results are general and can be extended to other remote sensing applications. A second objective in remote sensing is information retrieval, which involves searching for important features in the computed image of the medium. In this thesis we focus on detecting buried structures like pipes and tunnels in computed GPR or seismic images. The problem of finding these structures in high clutter and noise conditions, and finding them faster than standard shape detecting methods like the Hough transform, is analyzed. One of the most important contributions of this thesis is a single framework in which the sensing and the information retrieval stages are unified using compressive sensing. Instead of taking a large number of standard measurements to compute the image of the medium and then searching for the necessary information in the computed image, a much smaller number of measurements is taken, in the form of random projections. The data acquisition and information retrieval stages are unified by using a data model dictionary that connects the information to the sensor data.
This is a subject similar to that treated by Andriyan Suksmono.
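
The unification of acquisition and retrieval through a dictionary can be illustrated, in a very stripped-down way, with a generic compressive sensing toy (this is not from the thesis): random projections of the data are matched, through the measurement operator, against a dictionary in which the scene is sparse. Here Orthogonal Matching Pursuit is used as the recovery step, and the sizes and dictionary are purely hypothetical.

import numpy as np

def omp(Theta, y, k):
    """Greedy recovery (Orthogonal Matching Pursuit) of a k-sparse coefficient vector."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Theta.T @ residual))))
        coef, *_ = np.linalg.lstsq(Theta[:, support], y, rcond=None)
        residual = y - Theta[:, support] @ coef
    alpha = np.zeros(Theta.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(1)
n, m, k = 200, 40, 3                              # scene size, random measurements, sparsity
Psi = rng.standard_normal((n, n))                 # stand-in for a data model dictionary
Psi /= np.linalg.norm(Psi, axis=0)
alpha = np.zeros(n)
alpha[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # random projections of the sensor data
y = Phi @ (Psi @ alpha)                           # m << n compressive measurements
alpha_hat = omp(Phi @ Psi, y, k)
print(np.linalg.norm(alpha_hat - alpha))          # close to zero with high probability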

And then there is the dissertation of Muhammad Salman Asif, entitled Primal Dual Pursuit: A homotopy based algorithm for the Dantzig selector. The abstract reads:

Consider the following system model y = Ax + e, where x is an n-dimensional sparse signal, y is the measurement vector in a much lower dimension m, A is the measurement matrix and e is the error in our measurements. The Dantzig selector estimates x by solving the following optimization problem: minimize ||x||_1 subject to ||A'(Ax - y)||_∞ ≤ ε (DS). This is a convex program and can be recast into a linear program and solved using any modern optimization method, e.g., interior point methods. We propose a fast and efficient scheme for solving the Dantzig selector (DS), which we call "Primal-Dual pursuit". This algorithm can be thought of as a "primal-dual homotopy" approach to solving (DS). It computes the solution to (DS) for a range of successively relaxed problems, by starting with a large artificial ε and moving towards the desired value. Our algorithm iteratively updates the primal and dual supports as ε reduces to the desired value, which gives the final solution. The homotopy path that the solution of (DS) follows as ε varies is piecewise linear. At some critical values of ε along this path, either some new elements enter the support of the signal or some existing elements leave the support. We derive the optimality and feasibility conditions that are used to update the solutions at these critical points. We also present a detailed analysis of primal-dual pursuit for sparse signals in the noiseless case. We show that if our signal is S-sparse, then we can find all its S elements in exactly S steps using about "S² log n" random measurements, with very high probability.
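
Since the abstract notes that (DS) can be recast as a linear program, here is a minimal sketch of that recasting (a generic LP reformulation solved with scipy, not the homotopy algorithm of the thesis), with hypothetical problem sizes:

import numpy as np
from scipy.optimize import linprog

def dantzig_selector(A, y, eps):
    """Solve min ||x||_1 s.t. ||A'(Ax - y)||_inf <= eps by splitting x = u - v with u, v >= 0."""
    n = A.shape[1]
    G, b = A.T @ A, A.T @ y
    c = np.ones(2 * n)
    A_ub = np.block([[G, -G], [-G, G]])           # G(u - v) <= eps + b and -G(u - v) <= eps - b
    b_ub = np.concatenate([eps + b, eps - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")   # u, v >= 0 is the default bound
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(2)
m, n, s = 80, 256, 5                              # m measurements, n-dimensional signal, s-sparse
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = 1.0
y = A @ x + 0.01 * rng.standard_normal(m)
x_hat = dantzig_selector(A, y, eps=0.05)
print(np.max(np.abs(x_hat - x)))                  # small compared to the unit entries of x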

We also have dictionary building methods from two groups. In the context of Compressive Sensing, these constructions should help in reconstructing signals:

Supervised Dictionary Learning by Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. The abstract reads:

It is now well established that sparse signal models are well suited to restoration tasks and can effectively be learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and multiple class-decision functions. The linear variant of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks.
The second is Dictionary Learning for Sparse Approximations with the Majorization Method by Mehrdad Yaghoobi-Vaighan, Thomas Blumensath and Mike Davies. The abstract reads:

Sparse approximation methods can be used successfully where an appropriate generative model for compressible signals is known. If the model is unknown, it can be adapted by using a set of training samples. This paper presents a novel method for dictionary learning and extends the learning problem by introducing different constraints on the dictionary. The convergence of the proposed method to a fixed point, or the accumulation points forming a continuum, is guaranteed and holds for different sparsity measures. The majorization method is an optimization method that substitutes the original objective function with a surrogate function that is updated in each optimization step. This method has been used successfully in sparse approximation and statistical estimation (e.g. Expectation Maximization (EM)) problems. This paper shows that the majorization method can be used for the dictionary learning problem too. The proposed method is compared with other methods on both synthetic and real data and different constraints on the dictionary are compared. Simulations show the advantages of the proposed method over other currently available dictionary learning methods not only in terms of average performance but also in terms of computation time.
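
To make the majorization idea a bit more concrete, here is a stripped-down sketch of a dictionary update of this type (my own simplification under a bounded-norm column constraint, not the authors' code): with the sparse coefficients held fixed, the quadratic data-fit term is replaced by a surrogate whose minimizer is a simple gradient-like step followed by a projection onto the constraint set.

import numpy as np

def majorize_dict_update(D, X, S, steps=10):
    """With sparse codes S fixed, decrease ||X - D S||_F^2 by repeatedly minimizing a
    quadratic surrogate (majorizer), then projecting columns back onto the unit ball."""
    c = np.linalg.norm(S @ S.T, 2) * 1.01                    # majorization constant c > ||S S'||_2
    for _ in range(steps):
        D = D + (X - D @ S) @ S.T / c                        # minimizer of the surrogate
        D = D / np.maximum(np.linalg.norm(D, axis=0), 1.0)   # bounded-norm column constraint
    return D

# Hypothetical toy run: random data, codes obtained by a crude hard-thresholding stand-in.
rng = np.random.default_rng(3)
d, n, L = 16, 64, 500                             # signal dimension, dictionary size, samples
X = rng.standard_normal((d, L))
D = rng.standard_normal((d, n))
D /= np.linalg.norm(D, axis=0)
for it in range(20):                              # alternate sparse coding and dictionary update
    S = D.T @ X
    thresh = np.quantile(np.abs(S), 0.9, axis=0)
    S = np.where(np.abs(S) >= thresh, S, 0.0)     # keep roughly the top 10% of atoms per sample
    D = majorize_dict_update(D, X, S)
    print(it, np.linalg.norm(X - D @ S))          # residual of the current factorization
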
In a similar vein, when one wants to decompose a signal over dictionaries of mutually incoherent functions, Yonina Eldar just released Uncertainty Relations for Analog Signals on ArXiv. The abstract reads:
In the past several years there has been a surge of research investigating various aspects of sparse representations and compressed sensing. Most of this work has focused on the finite-dimensional setting in which the goal is to decompose a finite-length vector into a given finite dictionary. Underlying many of these results is the conceptual notion of an uncertainty principle: a signal cannot be sparsely represented in two different bases. Here, we extend these ideas and results to the analog, infinite-dimensional setting by considering signals that lie in a finitely-generated shift-invariant (SI) space. This class of signals is rich enough to include many interesting special cases such as multiband signals and splines. By adapting the notion of coherence defined for finite dictionaries to infinite SI representations, we develop an uncertainty principle similar in spirit to its finite counterpart. We demonstrate tightness of our bound by considering a bandlimited low-pass comb that achieves the uncertainty principle. Building upon these results and similar work in the finite setting, we show how to find a sparse decomposition in an overcomplete dictionary by solving a convex optimization problem. The distinguishing feature of our approach is the fact that even though the problem is defined over an infinite domain with infinitely many variables and constraints, under certain conditions on the dictionary spectrum our algorithm can find the sparsest representation by solving a finite dimensional problem.


Finally, here is a workshop where l1 techniques in Machine Learning will be featured; it has been added to the CS calendar.

Optimization for Machine Learning
NIPS*2008 Workshop
December 12-13, 2008, Whistler, Canada

URL: http://opt2008.kyb.tuebingen.mpg.de/


  • Deadline for submission of papers: 17th October 2008
  • Notification of acceptance: 7th November 2008
  • Final version of submission: 20th November 2008
  • Workshop date: 12th or 13th December 2008



Abstract
--------

Classical optimization techniques have found widespread use in machine learning. Convex optimization has occupied the center-stage and significant effort continues to be devoted to it. New problems constantly emerge in machine learning, e.g., structured learning and semi-supervised learning, while at the same time fundamental problems such as clustering and classification continue to be better understood. Moreover, machine learning is now very important for real-world problems with massive datasets, streaming inputs, the need for distributed computation, and complex models. These challenging characteristics of modern problems and datasets indicate that we must go beyond the "traditional optimization" approaches common in machine learning.

What is needed is optimization "tuned" for machine learning tasks. For example, techniques such as non-convex optimization (for semi-supervised learning, sparsity constraints), combinatorial optimization and relaxations (structured learning), stochastic optimization (massive datasets), decomposition techniques (parallel and distributed computation), and online learning (streaming inputs) are relevant in this setting. These techniques naturally draw inspiration from other fields, such as operations research, polyhedral combinatorics, theoretical computer science, and the optimization community.

Motivated by these concerns, we would like to address these issues in the framework of this workshop.

Background and Objectives
-------------------------

The closest in spirit to our workshop are the previously held workshops on 'Mathematical Programming in Machine Learning / Data Mining' from 2005--2007.
These workshops were quite extensive and provided a solid platform for encouraging exchange between machine learners and optimization researchers. Another relevant workshop was the BigML NIPS*2007 workshop, which focused on the algorithmic challenges faced in large-scale machine learning tasks, in particular parallelization and online learning.

Our workshop addresses the following major issues, some of which have not been previously tackled as a combined optimization and machine learning effort. In particular, the aim of the workshop is to:

+ Bring together experts from machine learning, optimization, operations research, and statistics

+ Focus on problems of interest to the NIPS audience (some basic examples are given below)

+ Identify a set of important open problems and issues that lie at the intersection of both machine learning and optimization

Call for Participation
----------------------

We invite high quality submissions for presentation as talks or poster presentations during the workshop. We are especially interested in participants who can contribute in the following areas:

* Non-Convex Optimization, example problems in ML include
- Problems with sparsity constraints
- Sparse PCA
- Non-negative matrix and tensor approximation
- Non-convex quadratic programming

* Combinatorial Optimization, example problems in ML include
- Estimating MAP solutions to discrete random fields
- Clustering and graph-partitioning
- Semi-supervised and multiple-instance learning
- Feature and subspace selection

* Stochastic, Parallel and Online Optimization, example problems in ML include
- Massive data sets
- Distributed learning algorithms

* Algorithms and Techniques, especially with a focus on an underlying application
- Polyhedral combinatorics, polytopes and strong valid inequalities
- Linear and higher-order relaxations
- Semidefinite programming relaxations
- Decomposition for large-scale, message-passing and online learning
- Global and Lipschitz optimization
- Algorithms for non-smooth optimization
- Approximation Algorithms

*Note: Generic methods such as neural networks, simulated annealing, and swarm-optimization methods (ant-colony optimization, genetic algorithms) lie outside the scope of this workshop.
Credit: NASA/JPL/Space Science Institute, Rhea's Roughness, September 22, 2008
