The Group Lasso for Logistic Regression
Lukas Meier, Sara van de Geer and Peter Bühlmann
Presenter: Lu Ren, ECE Dept., Duke University, Sept. 19, 2008

Outline • From lasso to group lasso • logistic group lasso • Algorithms for the logistic group lasso • Logistic group lasso-ridge hybrid • Simulation and application to splice site detection • Discussion

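Lasso
A popular model selection and shrinkage estimation method. In a linear regression set-up:
• $Y \in \mathbb{R}^n$: continuous response
• $X$: $n \times p$ design matrix
• $\beta \in \mathbb{R}^p$: parameter vector
The lasso estimator is then defined as:
$$\hat\beta_\lambda = \arg\min_\beta \left( \|Y - X\beta\|_2^2 + \lambda \sum_{j=1}^p |\beta_j| \right),$$
where $\|u\|_2^2 = \sum_{i=1}^n u_i^2$, and a larger $\lambda$ sets some $\hat\beta_j$ exactly to 0.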
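
To see why a larger penalty produces exact zeros, here is a minimal sketch for the special case of an orthonormal design ($X^T X = I$, an assumption made only for this illustration). In that case the lasso objective separates by coordinate and the solution is a soft-thresholding of $z = X^T Y$. The function names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward 0 by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_orthonormal(z, lam):
    """Lasso solution for an orthonormal design, where z = X^T Y.

    Minimizes ||Y - X b||_2^2 + lam * sum_j |b_j| coordinate by coordinate.
    The threshold is lam / 2 because the squared loss has no factor 1/2.
    """
    return [soft_threshold(zj, lam / 2.0) for zj in z]


z = [3.0, -0.4, 1.2, 0.1]
for lam in (0.0, 1.0, 3.0):
    # Larger lam sets more coefficients exactly to zero.
    print(lam, lasso_orthonormal(z, lam))
```

For a general design matrix no closed form exists and an iterative solver is needed; the thresholding step above is only the building block.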

Group Lasso
In some cases not only continuous but also categorical predictors (factors) are present. The lasso solution is then unsatisfactory, because it selects individual dummy variables rather than whole factors. Extending the lasso penalty, the group lasso estimator is:
$$\hat\beta_\lambda = \arg\min_\beta \left( \|Y - X\beta\|_2^2 + \lambda \sum_{g=1}^G \|\beta_{I_g}\|_2 \right)$$
$I_g$: the index set belonging to the $g$th group of variables.
The penalty does the variable selection at the group level and is intermediate between the $\ell_1$- and $\ell_2$-type penalties. It encourages that either $\hat\beta_g = 0$ or $\hat\beta_{g,j} \neq 0$ for all $j \in \{1, \dots, df_g\}$.
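
The "all or nothing" behaviour within a group can be illustrated with a small sketch, again under the simplifying assumption of an orthonormal design ($X^T X = I$), which is not part of the slides. Each block of $z = X^T Y$ is then either set to zero as a whole or scaled by a common factor, so no single coefficient inside an active group becomes zero. Function names are hypothetical.

```python
import math


def group_soft_threshold(z_g, t):
    """Scale the whole block z_g toward 0; all zeros when ||z_g||_2 <= t."""
    norm = math.sqrt(sum(v * v for v in z_g))
    if norm <= t:
        return [0.0] * len(z_g)
    scale = 1.0 - t / norm
    return [scale * v for v in z_g]


def group_lasso_orthonormal(z, groups, lam):
    """Group lasso solution for an orthonormal design, where z = X^T Y.

    groups is a list of index lists I_g. The objective
    ||Y - X b||_2^2 + lam * sum_g ||b_{I_g}||_2 separates by group,
    and each block is thresholded at lam / 2.
    """
    beta = [0.0] * len(z)
    for idx in groups:
        block = group_soft_threshold([z[j] for j in idx], lam / 2.0)
        for j, b in zip(idx, block):
            beta[j] = b
    return beta


z = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]
# First group (norm 5) is shrunk but kept; second group (norm 0.5) is dropped.
print(group_lasso_orthonormal(z, groups, lam=2.0))
```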

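Connection
Consider a case with two factors, $\beta_1 = (\beta_{11}, \beta_{12})^T$ and a scalar $\beta_2$. Observe the contours of the penalty functions: the $\ell_1$-penalty treats the three co-ordinate directions differently from other directions, which encourages sparsity in individual coefficients, while the $\ell_2$-penalty treats all directions equally and does not encourage sparsity. The group lasso penalty lies in between and encourages sparsity at the factor level.
Ref: Ming Yuan and Yi Lin, Model selection and estimation in regression with grouped variables, J. R. Statist. Soc. B, 2006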
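
The difference between the three penalties can be checked numerically. The sketch below evaluates them at three unit-length points: one on a co-ordinate axis, one rotated inside the two-dimensional factor, and one rotated across the two factors. The parametrization ($\beta_{11}, \beta_{12}$ forming one factor, $\beta_2$ the other) is an assumption taken from the two-factor example, and the function names are hypothetical.

```python
import math


def l1_penalty(b11, b12, b2):
    return abs(b11) + abs(b12) + abs(b2)


def group_penalty(b11, b12, b2):
    # Euclidean norm inside the factor (b11, b12), absolute value for b2.
    return math.hypot(b11, b12) + abs(b2)


def l2_penalty(b11, b12, b2):
    return math.sqrt(b11 * b11 + b12 * b12 + b2 * b2)


c = math.sqrt(0.5)
axis = (1.0, 0.0, 0.0)    # on a co-ordinate axis
within = (c, c, 0.0)      # rotated inside the first factor
across = (c, 0.0, c)      # rotated across the two factors

for name, point in (("axis", axis), ("within", within), ("across", across)):
    print(name, l1_penalty(*point), group_penalty(*point), l2_penalty(*point))
```

The $\ell_1$ value grows for both rotations, the $\ell_2$ value never changes, and the group penalty grows only for the rotation across factors. That is the sense in which the group penalty is indifferent to directions inside a factor but favours sparsity between factors.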

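Logistic Group Lasso
Independent and identically distributed observations $(x_i, y_i)$, $i = 1, \dots, n$:
• $x_i = (x_{i,1}^T, \dots, x_{i,G}^T)^T$: a $p$-dimensional vector of $G$ predictors
• $y_i \in \{0, 1\}$: a binary response variable
• $df_g$: the degrees of freedom of the $g$th predictor
The conditional probability $p_\beta(x_i) = P_\beta(Y = 1 \mid x_i)$ is modelled as
$$\log\left\{ \frac{p_\beta(x_i)}{1 - p_\beta(x_i)} \right\} = \eta_\beta(x_i) \quad \text{with} \quad \eta_\beta(x_i) = \beta_0 + \sum_{g=1}^G x_{i,g}^T \beta_g .$$
The estimator $\hat\beta_\lambda$ is given by the minimizer of the convex function
$$S_\lambda(\beta) = -l(\beta) + \lambda \sum_{g=1}^G s(df_g) \|\beta_g\|_2, \qquad l(\beta) = \sum_{i=1}^n \left[ y_i \eta_\beta(x_i) - \log\{1 + \exp(\eta_\beta(x_i))\} \right].$$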

Logistic Group Lasso
• $\lambda \ge 0$ controls the amount of penalization
• $s(df_g)$ rescales the penalty with respect to the dimensionality of $\beta_g$, e.g. $s(df_g) = df_g^{1/2}$
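
As a concrete reading of the criterion, the following sketch evaluates the penalized negative log-likelihood, i.e. the logistic negative log-likelihood plus $\lambda \sum_g s(df_g)\|\beta_g\|_2$. It assumes $s(df_g) = \sqrt{df_g}$ and a data layout in which every observation is a list of per-group blocks; both the layout and the function names are illustrative assumptions.

```python
import math


def linear_predictor(beta0, beta_groups, x_groups):
    """eta = beta0 + sum_g x_g^T beta_g for one observation."""
    return beta0 + sum(
        b * v for bg, xg in zip(beta_groups, x_groups) for b, v in zip(bg, xg)
    )


def neg_log_likelihood(beta0, beta_groups, X, y):
    """-l(beta) = sum_i [log(1 + exp(eta_i)) - y_i * eta_i]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = linear_predictor(beta0, beta_groups, x_i)
        # Stable log(1 + exp(eta)) = max(eta, 0) + log1p(exp(-|eta|)).
        total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - y_i * eta
    return total


def group_penalty(beta_groups, lam):
    """lam * sum_g sqrt(df_g) * ||beta_g||_2 (assumes s(df) = sqrt(df))."""
    return lam * sum(
        math.sqrt(len(bg)) * math.sqrt(sum(b * b for b in bg))
        for bg in beta_groups
    )


def objective(beta0, beta_groups, X, y, lam):
    return neg_log_likelihood(beta0, beta_groups, X, y) + group_penalty(
        beta_groups, lam
    )


# Two observations, two groups (df = 2 and df = 1); the intercept is unpenalized.
X = [[[1.0, 0.0], [0.5]], [[0.0, 1.0], [-0.5]]]
y = [1, 0]
print(objective(0.0, [[0.0, 0.0], [0.0]], X, y, lam=1.0))
print(objective(0.1, [[0.5, -0.5], [0.0]], X, y, lam=1.0))
```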

Optimization Algorithms
1. Block co-ordinate descent: cycle through the parameter groups and minimize the objective function $S_\lambda$, keeping all except the current group fixed.

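Optimization Algorithms
• $\beta_{-g}$: the parameter vector $\beta$ with $\beta_g$ set to $0$ while all other components remain unchanged
• $\beta^{(t)}$: the parameter vector after $t$ block updates; it can be shown that every limit point of the sequence $\{\beta^{(t)}\}_{t \ge 0}$ is a minimum point of $S_\lambda$
• Blockwise minimizations of the active groups must be performed numerically; this is sufficiently fast for small group sizes and dimensions.
2. Block co-ordinate gradient descent: combine a quadratic approximation of the log-likelihood with an additional line search:
$$M_\lambda(d) = -\left\{ l(\beta) + d^T \nabla l(\beta) + \tfrac{1}{2} d^T H d \right\} + \lambda \sum_{g=1}^G s(df_g) \|\beta_g + d_g\|_2$$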
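
Before minimizing over a group numerically, one can test whether the group is zero at the optimum. The sketch below uses the subgradient (KKT) condition of the penalized criterion: with the group's coefficients set to zero, the group stays at zero if $\|X_g^T (y - p)\|_2 \le \lambda\, s(df_g)$, where $p$ holds the fitted probabilities. This check follows from convex analysis rather than from the text above, and $s(df_g) = \sqrt{df_g}$ and all names are assumptions.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def group_stays_zero(X_g, y, eta_without_g, lam):
    """Return True if the group is zero at the blockwise minimum.

    X_g            n rows with df_g columns (design block of the group)
    eta_without_g  linear predictor of each observation with beta_g = 0
    """
    df_g = len(X_g[0])
    resid = [yi - sigmoid(e) for yi, e in zip(y, eta_without_g)]
    # Gradient of the log-likelihood with respect to beta_g at beta_g = 0.
    grad = [sum(row[j] * r for row, r in zip(X_g, resid)) for j in range(df_g)]
    return math.sqrt(sum(v * v for v in grad)) <= lam * math.sqrt(df_g)


X_g = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, 0, 1, 0]
eta0 = [0.0, 0.0, 0.0, 0.0]  # all other coefficients zero as well
print(group_stays_zero(X_g, y, eta0, lam=1.0))
print(group_stays_zero(X_g, y, eta0, lam=0.5))
```

Only groups that fail this test need a numerical blockwise minimization, which keeps the cost per cycle low when most groups are inactive.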

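Optimization Algorithms
Armijo rule: an inexact line search. Let $\alpha^{(t)}$ be the largest value in $\{\alpha_0 \delta^l\}_{l \ge 0}$ so that
$$S_\lambda(\beta^{(t)} + \alpha^{(t)} d^{(t)}) - S_\lambda(\beta^{(t)}) \le \alpha^{(t)} \sigma \Delta^{(t)},$$
where $0 < \delta < 1$, $0 < \sigma < 1$, $\alpha_0 > 0$, and $\Delta^{(t)}$ is the improvement in the objective function predicted by the linearized log-likelihood:
$$\Delta^{(t)} = -d^{(t)T} \nabla l(\beta^{(t)}) + \lambda s(df_g) \|\beta_g^{(t)} + d_g^{(t)}\|_2 - \lambda s(df_g) \|\beta_g^{(t)}\|_2 .$$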
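
A generic backtracking implementation of the Armijo rule might look as follows. It is a sketch under stated assumptions: `S` is any objective function, `delta_t` is the predicted decrease supplied by the caller, and the default constants (`alpha0=1.0`, `delta=0.5`, `sigma=0.1`) and the function name are illustrative choices, not values taken from the slides.

```python
def armijo_step(S, beta, d, delta_t, alpha0=1.0, delta=0.5, sigma=0.1,
                max_backtracks=50):
    """Largest alpha in {alpha0 * delta**l, l = 0, 1, ...} such that
    S(beta + alpha * d) - S(beta) <= alpha * sigma * delta_t.

    delta_t must be negative for a descent direction.
    """
    s0 = S(beta)
    alpha = alpha0
    for _ in range(max_backtracks):
        trial = [b + alpha * di for b, di in zip(beta, d)]
        if S(trial) - s0 <= alpha * sigma * delta_t:
            return alpha
        alpha *= delta  # shrink the step and try again
    raise RuntimeError("no admissible step size found")


# Toy example: S(b) = b^2, start at 1, deliberately overshooting direction -4.
S = lambda b: b[0] ** 2
beta = [1.0]
d = [-4.0]
delta_t = 2.0 * beta[0] * d[0]  # gradient times direction for a smooth S
alpha = armijo_step(S, beta, d, delta_t)
print(alpha, [b + alpha * di for b, di in zip(beta, d)])
```

In the toy run the full step and the half step are both rejected because they do not decrease the objective enough, and the quarter step is accepted.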

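Optimization Algorithms
• Minimization with respect to the $g$th parameter group depends only on $H_{gg}^{(t)}$; here define $H_{gg}^{(t)} = h_g^{(t)} I_{df_g}$. A proper choice is $h_g^{(t)} = -\max[\mathrm{diag}\{-\nabla^2 l(\beta^{(t)})_{gg}\}, c_*]$, where $c_* > 0$ is a lower bound to ensure convergence.
• To calculate $\hat\beta_\lambda$ on a grid of the penalty parameter $\lambda$, we can start at
$$\lambda_{\max} = \max_{g \in \{1, \dots, G\}} \frac{1}{s(df_g)} \|X_g^T (y - \bar{y})\|_2 ,$$
for which all penalized groups are zero. We use $\hat\beta_{\lambda_k}$ as a starting value for $\hat\beta_{\lambda_{k+1}}$ and proceed iteratively until $\lambda$ is equal or close to 0.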
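
The path computation can be sketched in two parts: the largest useful penalty, at which the intercept-only model is optimal (so the fitted probability is $\bar y$ and every group satisfies $\|X_g^T (y - \bar y)\|_2 \le \lambda\, s(df_g)$), and a warm-started loop over a decreasing grid. The geometric grid, $s(df_g) = \sqrt{df_g}$, and the `fit` callable standing in for any solver are assumptions for illustration.

```python
import math


def lambda_max(X_groups, y):
    """Smallest penalty for which all penalized groups are zero.

    X_groups is a list over groups; each entry has n rows and df_g columns.
    """
    ybar = sum(y) / len(y)
    resid = [yi - ybar for yi in y]
    best = 0.0
    for X_g in X_groups:
        df_g = len(X_g[0])
        grad = [sum(row[j] * r for row, r in zip(X_g, resid)) for j in range(df_g)]
        best = max(best, math.sqrt(sum(v * v for v in grad)) / math.sqrt(df_g))
    return best


def lambda_grid(lam_max, n_points=5, ratio=0.5):
    """Decreasing geometric grid lam_max, lam_max * ratio, ..."""
    return [lam_max * ratio ** k for k in range(n_points)]


def solution_path(fit, lambdas, beta_start):
    """Warm starts: the solution at one grid point initializes the next.

    fit(lam, beta_init) is a hypothetical solver returning the estimate.
    """
    path, beta = [], beta_start
    for lam in lambdas:
        beta = fit(lam, beta)
        path.append(beta)
    return path


X_groups = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],  # df = 2
    [[1.0], [-1.0], [1.0], [-1.0]],                     # df = 1
]
y = [1, 0, 1, 0]
lam_max = lambda_max(X_groups, y)
print(lam_max, lambda_grid(lam_max, n_points=4))
```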

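Hybrid Methods
• Logistic group lasso-ridge hybrid
The models selected by the group lasso are large compared with the underlying true models. The ordinary lasso can obtain good prediction with smaller models by using the lasso with relaxation.
Define $\hat I_\lambda$ as the index set of predictors selected by the group lasso with penalty parameter $\lambda$, and $\hat{M}_\lambda = \{\beta : \beta_g = 0 \text{ for } g \notin \hat I_\lambda\}$ as the set of possible parameter vectors of the corresponding submodel. The group lasso-ridge hybrid estimator:
$$\hat\beta_{\lambda,\kappa} = \arg\min_{\beta \in \hat{M}_\lambda} \left\{ -l(\beta) + \kappa \sum_{g=1}^G \frac{s(df_g)}{\sqrt{df_g}} \|\beta_g\|_2^2 \right\}, \quad \kappa \ge 0 .$$
$\kappa = 0$ is a special case called the group lasso-MLE hybrid.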
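
The two-stage idea can be sketched as follows: read the selected groups off a group lasso fit, then evaluate a ridge-penalized criterion restricted to that submodel. The sketch assumes a plain squared-norm ridge penalty $\kappa \sum_g \|\beta_g\|_2^2$ over the selected groups (which corresponds to $s(df_g) = \sqrt{df_g}$), a list-of-blocks data layout, and hypothetical function names. It only evaluates the criterion; any smooth optimizer could then minimize it.

```python
import math


def neg_log_likelihood(beta0, beta_groups, X, y):
    """-l(beta) for logistic regression; X[i] is a list of per-group blocks."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta0 + sum(
            b * v for bg, xg in zip(beta_groups, x_i) for b, v in zip(bg, xg)
        )
        total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - y_i * eta
    return total


def selected_groups(beta_groups):
    """Indices of groups with at least one non-zero coefficient."""
    return [g for g, bg in enumerate(beta_groups) if any(v != 0.0 for v in bg)]


def hybrid_objective(beta0, beta_groups, X, y, kappa, selected):
    """Ridge-penalized criterion on the submodel given by `selected`."""
    for g, bg in enumerate(beta_groups):
        if g not in selected and any(v != 0.0 for v in bg):
            raise ValueError("beta lies outside the selected submodel")
    ridge = sum(sum(v * v for v in beta_groups[g]) for g in selected)
    return neg_log_likelihood(beta0, beta_groups, X, y) + kappa * ridge


X = [[[1.0, 0.0], [0.5]], [[0.0, 1.0], [-0.5]]]
y = [1, 0]
lasso_fit = [[0.8, -0.3], [0.0]]       # pretend output of a group lasso fit
active = selected_groups(lasso_fit)    # only the first group is active
print(active, hybrid_objective(0.0, lasso_fit, X, y, kappa=0.5, selected=active))
```

With `kappa=0` the criterion reduces to the unpenalized negative log-likelihood on the selected groups, which is the MLE variant of the hybrid.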

Simulation
First sample $n_{\text{train}}$ instances of a nine-dimensional multivariate normal distribution with mean 0 and covariance matrix $\Sigma$. Each component is transformed into a four-valued categorical variable by using the quartiles of the standard normal. The coefficients are simulated as independent standard normal variables. Four different cases are studied.
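
The data-generating step can be sketched with the standard library. The covariance structure is not recoverable from the text above, so the sketch assumes unit variances and $\mathrm{Cov}(T_j, T_k) = \rho^{|j-k|}$, generated by an autoregressive recursion; the value of `rho`, the sample size and the function names are illustrative assumptions as well.

```python
import math
import random
from statistics import NormalDist


def sample_correlated_normals(n, dim, rho, rng):
    """n draws of a dim-variate normal with mean 0, unit variances and
    Cov(T_j, T_k) = rho ** |j - k| (assumed structure)."""
    rows = []
    for _ in range(n):
        t = [rng.gauss(0.0, 1.0)]
        for _ in range(dim - 1):
            t.append(rho * t[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        rows.append(t)
    return rows


# Quartiles of the standard normal distribution.
QUARTILES = [NormalDist().inv_cdf(p) for p in (0.25, 0.5, 0.75)]


def to_quartile_level(t):
    """Map a standard normal value to one of four levels 0, 1, 2, 3."""
    return sum(t > q for q in QUARTILES)


rng = random.Random(0)
normals = sample_correlated_normals(n=4000, dim=9, rho=0.5, rng=rng)
factors = [[to_quartile_level(t) for t in row] for row in normals]

# Each level should occur with probability close to 1/4.
flat = [level for row in factors for level in row]
print([round(flat.count(k) / len(flat), 3) for k in range(4)])
```

Because every margin is standard normal, cutting at the standard normal quartiles gives four roughly equally frequent levels per factor.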

Observations:
• The group lasso seems to select unnecessarily large models with many noise variables.
• The group lasso-MLE hybrid is very conservative in selecting terms.
• The group lasso-ridge hybrid seems to be the best compromise and has the best prediction performance in terms of the log-likelihood score.

Application Experiment
Splice sites: the regions between coding (exons) and noncoding (introns) DNA segments.
Two training data sets:
• 5610 true and 5610 false donor sites
• 2805 true and 59804 false donor sites
Test set: 4208 true and 89717 false donor sites.
For a threshold $\tau \in (0, 1)$ we assign observation $i$ to class 1 if $p_{\hat\beta}(x_i) > \tau$ and to class 0 otherwise.
$\rho_\tau$: the Pearson correlation between true class membership and the predicted class membership.
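
The evaluation measure can be written down in a few lines. The sketch thresholds the predicted probabilities at $\tau$, computes the Pearson correlation between true and predicted class labels, and, as an additional assumption not stated above, takes the maximum over a grid of thresholds. Returning 0 when all predictions fall into one class is a convention chosen for this sketch, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # convention: undefined correlation counted as 0
    return cov / math.sqrt(var_a * var_b)


def rho_tau(y_true, probs, tau):
    """Correlation between true classes and classes predicted at threshold tau."""
    predicted = [1 if p > tau else 0 for p in probs]
    return pearson(y_true, predicted)


def rho_max(y_true, probs, taus):
    """Best correlation over a grid of thresholds (assumed summary measure)."""
    return max(rho_tau(y_true, probs, tau) for tau in taus)


y_true = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.2]
print(rho_tau(y_true, probs, 0.5), rho_tau(y_true, probs, 0.7))
print(rho_max(y_true, probs, [0.1, 0.3, 0.5, 0.7, 0.9]))
```

A correlation-based score is less sensitive to the strong class imbalance of the test data than the raw classification accuracy.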

The best model with respect to the log-likelihood score on the validation set is the group lasso estimator. The corresponding values of and on the test set are , respectively. Whereas the group lasso solution has some active three-way interactions, the group lasso-ridge hybrid and the group lasso-MLE hybrid contain only two-way interactions. The three-way interactions of the group lasso solution seem to be very weak.

Conclusions • Study the group lasso for logistic regression • Present efficient algorithm (automatic and much faster) • Propose the group lasso-ridge hybrid method • Apply to short DNA motif modelling and splice site detection