Nuit Blanche: Optimization with multiple non-standard regularizers

Monday, December 05, 2011

Optimization with multiple non-standard regularizers

Following up on yesterday's entry on the potential relationship between multiple regularizers and other than sparse solutions, here is an instructive conversation that occurred on the LinkedIn Compressive Sensing discussion group a while back. Before featuring it, let me say that I look forward to any type of experience (including bad ones) anybody has with different and multiple regularizers. Without further due:

Son Hua asked the following question:

Optimization with multiple non-standard regularizers

I'm solving the problem | y - Ax |_2^2 + a | Cx |_1 + b | Dx |_1, where C and D may not be orthogonal. Currently I used NESTA, which can handle the form | y - Ax |_2^2 + lambda | Wx |_1 (by setting lambda to 1 and stack a*C and b*D together).

I wonder if there exists any other solvers that can handle my problem? I know that block coordinate descent (alternating optimization) would be able to solve my problem, but I guess block coordinate descent approach is far more slower than NESTA. Can FISTA be modified to solve my problem? Thanks.

Mateusz Malinowski first responded with:

You can use either Augmented Lagrangian or modified TV-Fista. I used both algorithms in my setting which was similar to yours and both methods works well. You can also modified your regularizers to be |s|_e = sqrt( s^2 + e), and use Nonlinear Conjugate Gradient method. If you are interested I can send you the code and my thesis where I did some experiments with these three methods.

Jerome Gilles then added:

You can use an Alternating Method (AM) + Bregman Iteration to solve this problem. See the papers of Jian-Feng Cai (available on the CAM reports of UCLA - Department of Mathematics). They solve exactly the same problem where D is a framelet decomposition operator and C is the curvelet decomposition operator.

Stephen Becker also replied:

yet another option is TFOCS (http://tfocs.stanford.edu)

Bo Zhang added:

Or you can also consider using proximal splitting:http://www.ann.jussieu.fr/~plc/inv2.pdf
Very similar to Alternating Method.

Son Hua then provided additional information:

I ended up using split Bregman iteration to solve my problem (a different name is augmented Lagrangian I think). New variables c = Cx and d = Dx are introduced and the problem is turned into | y - Ax |_2^2 + a | c |_1 + b | d |_1 + lambda | c - Cx |^2 + theta | d - Dx |^2. Now split Bregman method can be applied to find the optimal x.

I found that it is quite tricky to set the "lambda" and "theta" parameter which controls how strict we want to enforce the original constraint | Cx |_1 and | Dx |_1. For my problem I set lambda proportional to a and theta proportional to b.

Since these parameters are introduced by the split Bregman method, I believe there should have a systematic way to set their values. Does anyone have experience or has a better way in setting these parameters? Thanks.

Jerome Gilles

Split Bregman is not only an augmented Lagrangian. In fact if the equation you gave are the one you really use, it is NOT a split Bregman implementation! A true split Bregman is
| y - Ax |_2^2 + a | c |_1 + b | d |_1 + lambda | c - Cx-bc |^2 + theta | d - Dx -bd|^2 where the terms bc and bd are also updated in the loop. These terms are very important see the paper of Goldstein and Osher.
Now concerning the parameters lambda and theta they goal is to ensure that |c-Cx-bc| and |d-Dx-bd| are very small. Now as you minimize this functional, the choice of lambda and theta counter-balance the behavior of these terms and consequently they may not be small. In my experience, setting 1<=lambda=theta<=10 generally work.
If you are interested in Split Bregman, you can take a look in my Matlab toolbox called "the Bregman Cookbook" available at http://www.math.ucla.edu/~jegilles....

Son Hua then replied:

Thanks Jerome. You are right; my formula above is not split Bregman. But I did implement split Bregman in my code.

What is your motivation when choosing 1 <= lambda, theta <= 10? What happens if we choose them to be smaller or larger? Does it depend on the problem you solve?

Jerome Gilles finally responded:

It's a kind of tradeoff between having c=Cx+bc and d=Dx+bd and the speed of convergence. These values just came from my own experience on several images.

Thanks Son, Jerome, Bo, Stephen and Mateusz.

Image Credit: NASA/JPL/Space Science Institute, N00178555.jpg was taken on December 03, 2011 and received on Earth December 04, 2011. The camera was pointing toward TETHYS at approximately 2,219,642 kilometers away, and the image was taken using the CL1 and CL2 filters. This image has not been validated or calibrated.