Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Statistics and Modeling for Complex Data

We consider the problems of estimating and selecting parameters endowed with a known group structure, when the groups are assumed to be sign-coherent, that is, to gather either non-negative, non-positive or null parameters. To tackle this problem, we propose a new penalty that we call the cooperative-Lasso penalty. We derive the optimality conditions defining the cooperative-Lasso estimate for generalized linear models and propose an efficient active set algorithm suited to high-dimensional problems. We study the asymptotic consistency of the estimator in the linear regression setup and derive its irrepresentable conditions, which are milder than those of the group-Lasso regarding the matching of groups with the sparsity pattern of the true parameters. We also address the problem of model selection in linear regression by deriving an approximation of the degrees of freedom of the cooperative-Lasso estimator. Simulations comparing the proposed estimator to the group-Lasso agree with our theoretical results, showing consistent improvements in support recovery for sign-coherent groups. We finally propose an approach widely applicable to the processing of genomic data, where the set of differentially expressed probes is enriched by incorporating all the probes of the microarray that are related to the corresponding genes. In an application to estimating the pathologic response to chemotherapy in breast cancer, the cooperative-Lasso performs much better than its competitors.
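The abstract does not write the penalty out. As a minimal sketch, assuming the cooperative-Lasso penalty sums, over groups, the Euclidean norms of the positive part and of the negative part of each group's coefficients (with per-group weights, set to 1 here by assumption), the pure-Python code below contrasts it with the group-Lasso penalty. It also sketches the proximal step for a single group under that assumed form, which splits the group by the sign of its input and group-soft-thresholds each part. All function names are hypothetical, and this is not the authors' active set algorithm.

```python
import math


def l2(values):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in values))


def group_lasso_penalty(beta, groups, weights=None):
    """Sum over groups of w_k * ||beta_{G_k}||_2."""
    total = 0.0
    for k, group in enumerate(groups):
        w = 1.0 if weights is None else weights[k]
        total += w * l2([beta[j] for j in group])
    return total


def coop_lasso_penalty(beta, groups, weights=None):
    """Assumed form: sum over groups of w_k * (||beta_{G_k}^+|| + ||beta_{G_k}^-||),
    where x^+ = max(x, 0) and x^- = max(-x, 0) act coordinate-wise."""
    total = 0.0
    for k, group in enumerate(groups):
        w = 1.0 if weights is None else weights[k]
        pos = [max(beta[j], 0.0) for j in group]
        neg = [max(-beta[j], 0.0) for j in group]
        total += w * (l2(pos) + l2(neg))
    return total


def coop_prox_group(v, lam):
    """Hypothetical proximal step for one group under the assumed penalty:
    argmin_x 0.5 * ||x - v||^2 + lam * (||x^+|| + ||x^-||).
    Coordinates with positive and negative v are shrunk separately by
    group soft-thresholding; zero entries of v stay at zero."""
    x = [0.0] * len(v)
    for sign in (1.0, -1.0):
        idx = [j for j, vj in enumerate(v) if sign * vj > 0]
        norm = l2([v[j] for j in idx])
        if norm > lam:
            scale = 1.0 - lam / norm
            for j in idx:
                x[j] = scale * v[j]
    return x


if __name__ == "__main__":
    groups = [[0, 1]]
    # Sign-coherent group: both penalties coincide.
    print(coop_lasso_penalty([1.0, 1.0], groups), group_lasso_penalty([1.0, 1.0], groups))
    # Sign-incoherent group: the cooperative penalty is larger.
    print(coop_lasso_penalty([1.0, -1.0], groups), group_lasso_penalty([1.0, -1.0], groups))
    # Positive part is shrunk; the small negative part is set to zero.
    print(coop_prox_group([3.0, 4.0, -1.0], 1.0))
```

Under this assumed form, the two penalties agree on sign-coherent groups, while a group mixing signs pays the sum of two norms instead of one, which is what pushes the estimate toward sign-coherent groups.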