Nonsmooth Nonnegative Matrix Factorization (nsNMF)
Alberto Pascual-Montano, Member, IEEE, J. M. Carazo, Senior Member, IEEE, Kieko Kochi, Dietrich Lehmann, and Roberto D. Pascual-Marqui
2006, IEEE
Presenter: 張庭豪

Outline
• INTRODUCTION
• REVIEW OF NMF AND ITS SPARSE VARIANTS
• NONSMOOTH NMF (nsNMF)
• EXPERIMENTS
• CONCLUSIONS AND DISCUSSION

INTRODUCTION
Nonnegative matrix factorization (NMF) has been introduced as a matrix factorization technique that produces a useful decomposition for data analysis. It yields a reduced representation of the original data that can be seen either as a feature extraction or as a dimensionality reduction technique. More importantly, NMF can be interpreted as a parts-based representation of the data because only additive, not subtractive, combinations are allowed.

In fact, a closer look at the basis and encoding vectors produced by NMF reveals a high degree of overlap among the basis vectors, which contradicts the intuitive notion of "parts". A matrix factorization technique capable of producing more localized, less overlapping feature representations of the data is therefore highly desirable in many applications. The new method, referred to here as Nonsmooth Nonnegative Matrix Factorization (nsNMF), differs from the original in its use of an extra smoothness matrix for imposing sparseness. The goal of nsNMF is to find sparse structures in the basis functions that explain the data set.
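As a minimal sketch of what a smoothness matrix does, assume it takes the form S = (1 - θ)I + (θ/q) 1 1^T for q factors and a smoothness parameter 0 ≤ θ ≤ 1 (the form used in the nsNMF paper). The helper name `smoothing_matrix` is hypothetical. Multiplying a vector by S blends each entry toward the mean of all entries, so a sparse vector becomes smoother.

```python
import numpy as np

def smoothing_matrix(q: int, theta: float) -> np.ndarray:
    """Assumed form: S = (1 - theta) * I + (theta / q) * ones((q, q)), 0 <= theta <= 1."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return (1.0 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))

# theta = 0 leaves a vector unchanged; theta = 1 replaces every entry by the mean.
x = np.array([1.0, 0.0, 0.0])        # a maximally sparse vector
print(smoothing_matrix(3, 0.5) @ x)  # entries pulled toward the mean 1/3
```

Because every column of S sums to one, applying S redistributes mass among the q entries without changing their total.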

REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS

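Taking the derivative of the divergence $D(V\|WH) = \sum_{i,\mu}\left(V_{i\mu}\ln\frac{V_{i\mu}}{(WH)_{i\mu}} - V_{i\mu} + (WH)_{i\mu}\right)$ with respect to H gives:
$$\frac{\partial D}{\partial H_{a\mu}} = \sum_i W_{ia} - \sum_i W_{ia}\frac{V_{i\mu}}{(WH)_{i\mu}}$$
The gradient algorithm then states:
$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu}\left[\sum_i W_{ia}\frac{V_{i\mu}}{(WH)_{i\mu}} - \sum_i W_{ia}\right]$$
for some step size $\eta_{a\mu}$.

Forcing $\eta_{a\mu} = H_{a\mu} / \sum_i W_{ia}$ gives the multiplicative rule:
$$H_{a\mu} \leftarrow H_{a\mu}\,\frac{\sum_i W_{ia} V_{i\mu}/(WH)_{i\mu}}{\sum_k W_{ka}}$$

Taking the derivative with respect to W gives:
$$\frac{\partial D}{\partial W_{ia}} = \sum_\mu H_{a\mu} - \sum_\mu H_{a\mu}\frac{V_{i\mu}}{(WH)_{i\mu}}$$
The gradient algorithm then states:
$$W_{ia} \leftarrow W_{ia} + \eta_{ia}\left[\sum_\mu H_{a\mu}\frac{V_{i\mu}}{(WH)_{i\mu}} - \sum_\mu H_{a\mu}\right]$$
Forcing the step size $\eta_{ia} = W_{ia} / \sum_\mu H_{a\mu}$ gives:
$$W_{ia} \leftarrow W_{ia}\,\frac{\sum_\mu H_{a\mu} V_{i\mu}/(WH)_{i\mu}}{\sum_\nu H_{a\nu}}$$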
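The multiplicative update rules for H and W can be sketched in NumPy as follows. This is a minimal illustration assuming the divergence cost D(V‖WH); the names `kl_divergence` and `nmf_kl`, the uniform random initialization, and the small `eps` added to denominators are illustrative choices, not part of the slides.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) = sum(V * ln(V / WH) - V + WH)."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)

def nmf_kl(V, k, n_iter=200, seed=0, eps=1e-12):
    """Alternate the multiplicative H and W rules for k factors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, k)) + eps
    H = rng.random((k, n_cols)) + eps
    history = []
    for _ in range(n_iter):
        # H step: H_am <- H_am * sum_i W_ia V_im / (WH)_im / sum_i W_ia
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
        # W step: W_ia <- W_ia * sum_m H_am V_im / (WH)_im / sum_m H_am
        ratio = V / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history

rng = np.random.default_rng(42)
V = rng.random((6, 25)) + 0.1
W, H, history = nmf_kl(V, k=3)
print(history[0], history[-1])
```

Each update keeps W and H nonnegative because it only multiplies by nonnegative ratios, and Lee and Seung showed that neither update increases D, which the returned `history` lets you check.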

Nonnegative Sparse Coding (NNSC)
Similar to the LNMF algorithm, the Nonnegative Sparse Coding (NNSC) method is intended to decompose multivariate data into a set of positive sparse components. Combining a small reconstruction error with a sparseness criterion, the objective function is:
$$C(W,H) = \frac{1}{2}\sum_{i,\mu}\left(V_{i\mu} - (WH)_{i\mu}\right)^2 + \lambda\sum_{a,\mu} f(H_{a\mu})$$
where the form of f defines how sparseness on H is measured and $\lambda$ controls the trade-off between sparseness and the accuracy of the reconstruction.
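A minimal NumPy sketch of the NNSC objective and its alternating updates, assuming the linear penalty f(h) = h (an L1 penalty on the nonnegative H, the choice made in Hoyer's original NNSC formulation). It also assumes a multiplicative step on H and a projected gradient step on W followed by rescaling W's columns to unit L2 norm. The function names and the W step size are hypothetical.

```python
import numpy as np

def nnsc_objective(V, W, H, lam):
    """0.5 * ||V - WH||_F^2 + lam * sum(H), i.e. f(h) = h on nonnegative H (assumed)."""
    residual = V - W @ H
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(H)

def nnsc_update_h(V, W, H, lam, eps=1e-12):
    """Multiplicative H step for the L1-penalized least-squares objective, W held fixed."""
    return H * (W.T @ V) / (W.T @ W @ H + lam + eps)

def nnsc_update_w(V, W, H, step=1e-3):
    """Projected gradient step on W (step size is a hypothetical choice), then unit-norm columns."""
    W = W - step * (W @ H - V) @ H.T
    W = np.maximum(W, 0.0)               # project back onto the nonnegative orthant
    norms = np.linalg.norm(W, axis=0)
    norms[norms == 0] = 1.0              # avoid dividing an all-zero column by zero
    return W / norms

rng = np.random.default_rng(0)
V = rng.random((5, 20))
W = rng.random((5, 3))
W = W / np.linalg.norm(W, axis=0)
H = rng.random((3, 20))
for _ in range(100):
    H = nnsc_update_h(V, W, H, lam=0.1)
    W = nnsc_update_w(V, W, H)
print(nnsc_objective(V, W, H, lam=0.1))
```

With W fixed, the multiplicative H step does not increase the penalized objective. The column rescaling of W is needed because otherwise the penalty on H could be driven toward zero simply by scaling W up and H down.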

OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF)
Because of the multiplicative nature of the model, i.e., "basis" multiplied by "encoding," sparseness in one of the factors will almost certainly force "nonsparseness," or smoothness, in the other. On the other hand, forcing sparseness constraints on both the basis and the encoding vectors will deteriorate the model's goodness of fit to the data. Therefore, imposing sparseness directly on the factors is, from the outset, doomed to fail at achieving both generalized sparseness and a satisfactory goodness of fit.
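A minimal NumPy sketch of how a smoothness matrix can be placed between the two factors. It assumes the model V ≈ WSH with S = (1 - θ)I + (θ/q) 1 1^T and reuses the divergence-based multiplicative rules, replacing W by the effective basis WS in the H step and H by the effective encoding SH in the W step. Function names, initialization, and the `eps` guard are hypothetical choices, not taken from the slides.

```python
import numpy as np

def smoothing_matrix(q, theta):
    """Assumed form: S = (1 - theta) * I + (theta / q) * ones((q, q))."""
    return (1.0 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))

def divergence(V, R, eps=1e-12):
    """Generalized KL divergence D(V || R)."""
    return np.sum(V * np.log((V + eps) / (R + eps)) - V + R)

def nsnmf(V, q, theta, n_iter=200, seed=0, eps=1e-12):
    """Sketch of V ~ W S H with a fixed smoothing matrix S (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], q)) + eps
    H = rng.random((q, V.shape[1])) + eps
    S = smoothing_matrix(q, theta)
    history = []
    for _ in range(n_iter):
        # H step: divergence rule with the effective basis WS in place of W.
        Ws = W @ S
        ratio = V / (Ws @ H + eps)
        H *= (Ws.T @ ratio) / (Ws.sum(axis=0)[:, None] + eps)
        # W step: divergence rule with the effective encoding SH in place of H.
        Hs = S @ H
        ratio = V / (W @ Hs + eps)
        W *= (ratio @ Hs.T) / (Hs.sum(axis=1)[None, :] + eps)
        history.append(divergence(V, W @ S @ H))
    return W, H, history

rng = np.random.default_rng(7)
V = rng.random((8, 30)) + 0.1
W, H, history = nsnmf(V, q=3, theta=0.5)
print(history[-1])
```

With θ = 0, S is the identity and the loop reduces to plain divergence-based NMF. Because S is held fixed, each step is an ordinary NMF update on a modified factor, so the divergence should not increase from one iteration to the next.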

EXPERIMENTS
As mentioned in the previous section, the multiplicative nature of the sparse variants of the NMF model produces a paradoxical effect: imposing sparseness in one of the factors will almost certainly force smoothness in the other as the model tries to reproduce the data as well as possible. Additionally, forcing sparseness constraints on both the basis and the encoding vectors decreases the variance of the data explained by the model. Table 1 shows the results when using exactly three factors in all cases. Different NMF-type methods were applied to the same randomly generated positive data set (5 variables, 20 items, rank = 3).
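The synthetic setup described above (5 variables, 20 items, rank 3) and two fit and sparseness measures can be sketched as follows. Three things here are assumptions rather than details from the slides: the data are built as the product of two uniform random nonnegative factors, explained variance is taken as 1 - ‖V - R‖²_F / ‖V - mean(V)‖²_F, and sparseness uses Hoyer's measure. All function names are hypothetical.

```python
import numpy as np

def make_rank3_data(n_vars=5, n_items=20, rank=3, seed=0):
    """Assumed generator: positive V = W0 @ H0 with uniform random factors."""
    rng = np.random.default_rng(seed)
    W0 = rng.random((n_vars, rank))
    H0 = rng.random((rank, n_items))
    return W0 @ H0, W0, H0

def explained_variance(V, R):
    """Assumed definition: 1 - ||V - R||_F^2 / ||V - mean(V)||_F^2."""
    return 1.0 - np.sum((V - R) ** 2) / np.sum((V - V.mean()) ** 2)

def hoyer_sparseness(x):
    """Hoyer's measure: 1 for a single nonzero entry, 0 for a constant vector."""
    x = np.ravel(np.asarray(x, dtype=float))
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt(np.sum(x ** 2))
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

V, W0, H0 = make_rank3_data()
print(np.linalg.matrix_rank(V))                  # the generator guarantees rank <= 3
print(explained_variance(V, W0 @ H0))            # exact factors reproduce V
print([hoyer_sparseness(W0[:, a]) for a in range(3)])
```

Any factorization R = WH of this V can then be scored with both functions, which is one way to see the trade-off between sparseness and explained variance.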

"Swimmer" data set
[Figure: basis images]
• NMF failed to extract the 16 limbs and the torso, while nsNMF successfully explained the data using one factor for each independent part.
• NMF extracts parts of the data in a more holistic manner, while nsNMF represents the same reality sparsely.

CBCL faces data set
[Figure: 49 NMF basis images; not sparse]

NMF Results
Fig. 3(a) shows the results of the Lee and Seung algorithm applied to the facial database with 49 factors. Although the factor images give an intuitive notion of a parts-based representation of the original faces, the factorization is not sparse enough to represent unique parts of an average face. In other words, the NMF algorithm allows some undesirable overlapping of parts, especially in areas common to most of the faces in the input data.

nsNMF Results
θ (smoothness)    Explained variance (%)
0.5               83.84
0.6               80.69
0.7               78.17
0.8               76.44

[Figure: nsNMF basis images for θ = 0.5, 0.6, 0.7, 0.8]

CONCLUSIONS AND DISCUSSION
The approach presented here attempts to improve the ability of the classical NMF algorithm to find parts-based representations by producing truly sparse components of the data structure. The experimental results on both synthetic and real data sets have shown that the nsNMF algorithm outperformed the existing sparse NMF variants in parts-based representation of the data while maintaining goodness of fit.