Thursday, November 07, 2013

The Block-Coordinate Update method for Advanced Matrix Factorization - implementation -

Here are some important implementations from Yangyang Xu and Wotao Yin that I should have featured a long time ago. From the page:

The block-coordinate update method

Yangyang Xu and Wotao Yin

Rice University CAAM Technical Report TR12-15

Background

The block-coordinate update (BCU) method is a generalization of the following classic methods:
  • alternating minimization (of a function of the form $f(x_1,x_2)$),
  • alternating projection (to find a point in the intersection of two convex sets $\mathcal{C}_1$ and $\mathcal{C}_2$ by alternately projecting onto $\mathcal{C}_1$ and $\mathcal{C}_2$),
  • (block) coordinate minimization (of a function of the form $f(x_1,x_2,\ldots,x_s)$),
  • (block) coordinate descent (of a function of the form $f(x_1,x_2,\ldots,x_s)$).
BCU solves problems of the form
$$\mathop{\mathrm{minimize}}_{\mathbf{x}_1,\ldots,\mathbf{x}_s} \; F(\mathbf{x}_1,\ldots,\mathbf{x}_s) \quad \text{subject to} \quad (\mathbf{x}_1,\ldots,\mathbf{x}_s)\in\mathcal{X}$$
by updating just one or a few blocks of variables at a time, rather than updating all the blocks together (the batch update). The order of update can be deterministic or stochastic. The deterministic orders can be either cyclic or greedy according to a certain rank.
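
To make the update scheme concrete, here is a minimal sketch (in Python/NumPy rather than the authors' Matlab, and not their code) of cyclic BCU with exact per-block minimization on a toy two-block least-squares objective $f(\mathbf{x}_1,\mathbf{x}_2)=\|A_1\mathbf{x}_1+A_2\mathbf{x}_2-b\|^2$; the problem data and the two-block split are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n1, n2 = 50, 10, 10
A1 = rng.standard_normal((m, n1))
A2 = rng.standard_normal((m, n2))
b = rng.standard_normal(m)

x1, x2 = np.zeros(n1), np.zeros(n2)
for _ in range(100):
    # Cyclic BCU with exact per-block minimization: each block update
    # is a small least-squares solve with the other block held fixed.
    x1 = np.linalg.lstsq(A1, b - A2 @ x2, rcond=None)[0]
    x2 = np.linalg.lstsq(A2, b - A1 @ x1, rcond=None)[0]

print("residual:", np.linalg.norm(A1 @ x1 + A2 @ x2 - b))
```

Each sweep touches one block at a time, so the per-update cost is that of an $m \times n_i$ solve rather than an $m \times (n_1+n_2)$ batch solve.
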
The main advantage is that updating one or just a few blocks of variables is computationally much cheaper than the batch update. On the other hand, convergence requires more stringent conditions and typically takes more iterations.
The update applied to each block can be exact minimization over that block, or it can take various inexact forms such as
  • one or a few gradient descent steps,
  • one or a few projected gradient descent steps,
  • one or a few (preconditioned) CG steps,
  • a prox-linear update (see the sketch below),
  • more…
There is a tradeoff between the per-update complexity and the progress of overall minimization.
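
As one concrete instance (a sketch under assumptions, not the paper's implementation): for a block constrained to be nonnegative, the prox-linear update reduces to a projected gradient step of length $1/L_i$, where $L_i$ is a Lipschitz constant of the block gradient. The function name and test problem below are illustrative:

```python
import numpy as np

def prox_linear_step(x_i, grad_i, L_i):
    # Prox-linear update for a nonnegativity-constrained block:
    #   argmin_x  <grad_i, x - x_i> + (L_i/2) ||x - x_i||^2   s.t. x >= 0,
    # whose closed form is a projected gradient step of length 1/L_i.
    return np.maximum(0.0, x_i - grad_i / L_i)

# One block update for f(x) = 0.5 * ||A x - b||^2 over x >= 0.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
x = np.ones(8)
L = np.linalg.norm(A.T @ A, 2)  # spectral norm bounds the gradient's Lipschitz constant
x = prox_linear_step(x, A.T @ (A @ x - b), L)
```

A single such step is far cheaper than exact minimization over the block, which is exactly the per-update complexity versus progress tradeoff mentioned above.
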

Motivation and the Proposed Method

It is challenging to establish the global convergence of BCU for optimization problems that are nonconvex and/or nonsmooth. In general, either nonconvexity or nonsmoothness can cause BCU to stagnate at a non-stationary point.
To establish global convergence, we assume a block multi-convex structure, namely, the feasible set and objective function are convex in each block of variables (but not jointly convex in general). Nonsmoothness is permitted within each block. Many interesting applications have this structure; see below. We propose a BCU algorithm with three different block-update schemes; the choice for each block is independent of the others. Under certain conditions, we show that any limit point satisfies the Nash equilibrium conditions (a generalization of stationarity). Furthermore, global convergence and an asymptotic convergence rate are established for problems obeying the Kurdyka-Łojasiewicz inequality.

Applications

  • Nonnegative matrix/tensor factorization (see the sketch after this list)
  • Nonnegative matrix/tensor completion (reconstruction from incomplete observations)
  • Hyperspectral data analysis
  • Sparse dictionary learning
  • Blind source separation
  • Any multi-convex problem, where the feasible set and objective function are generally non-convex but convex in each block of variables.
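
As an illustration of the first application and of the block multi-convex structure, here is a minimal sketch (an assumed two-block BCU with prox-linear updates, not the authors' Matlab package) for NMF, $\min_{X,Y\ge 0} \tfrac12\|M-XY\|_F^2$:

```python
import numpy as np

def nmf_bcd(M, r, iters=200, seed=0):
    # Two-block BCU for 0.5 * ||M - X @ Y||_F^2 with X >= 0, Y >= 0:
    # one projected gradient (prox-linear) step per block per sweep.
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X, Y = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        LX = np.linalg.norm(Y @ Y.T, 2) + 1e-12  # Lipschitz const. of the X-gradient
        X = np.maximum(0.0, X - ((X @ Y - M) @ Y.T) / LX)
        LY = np.linalg.norm(X.T @ X, 2) + 1e-12  # Lipschitz const. of the Y-gradient
        Y = np.maximum(0.0, Y - (X.T @ (X @ Y - M)) / LY)
    return X, Y

# Usage: recover a rank-5 nonnegative factorization.
rng = np.random.default_rng(2)
M = rng.random((40, 5)) @ rng.random((5, 30))
X, Y = nmf_bcd(M, r=5)
print("relative error:", np.linalg.norm(M - X @ Y) / np.linalg.norm(M))
```

The objective is convex in $X$ with $Y$ fixed and in $Y$ with $X$ fixed, but not jointly convex, which is precisely the block multi-convex structure the convergence analysis above exploits.
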

Tested problem sets

  • Synthetic nonnegative matrices (factorization / completion)
  • Synthetic nonnegative tensors (factorization / completion)
  • CBCL and ORL image databases
  • Hyperspectral data

Citation

Y. Xu and W. Yin, A Block Coordinate Descent Method for Multi-Convex Optimization with Applications to Nonnegative Tensor Factorization and Completion, Rice University CAAM Technical Report TR12-15, 2012. To appear in SIAM Journal on Imaging Sciences.

Matlab codes and demos

