Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons

Statistics and Modeling for Complex Data

Consider the standard Gaussian linear regression model Y = Xθ + ε, where Y ∈ R^n is a response vector and X ∈ R^(n×p) is a design matrix.
Numerous works have been devoted to building efficient estimators of θ when p is much larger than n. In such a situation, a classical approach amounts to assuming that θ is approximately sparse. In this talk, I study the minimax risks of estimation and testing over classes of k-sparse vectors θ. These bounds shed light on the limitations due to high dimensionality.
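
As a concrete illustration of the setting, the following minimal sketch simulates one draw from the model with a k-sparse θ. It is only an example under stated assumptions: the i.i.d. standard Gaussian design, the Gaussian nonzero coefficients, the noise level sigma and the function name simulate_sparse_regression are all hypothetical choices, not part of the talk.

```python
import numpy as np

def simulate_sparse_regression(n, p, k, sigma=1.0, seed=0):
    """Draw (Y, X, theta) from Y = X theta + eps with a k-sparse theta.

    Assumptions (illustrative only): X has i.i.d. N(0, 1) entries,
    the k nonzero coefficients are N(0, 1), and eps ~ N(0, sigma^2 I_n).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))                  # design matrix, n x p
    theta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)   # positions of nonzero entries
    theta[support] = rng.standard_normal(k)          # at most k nonzero coefficients
    eps = sigma * rng.standard_normal(n)             # Gaussian noise
    Y = X @ theta + eps                              # response vector in R^n
    return Y, X, theta

# Example with p much larger than n
Y, X, theta = simulate_sparse_regression(n=50, p=1000, k=5)
```

Here prediction corresponds to recovering X @ theta, the inverse problem to recovering theta itself, and testing to deciding whether theta is the zero vector.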

The results encompass the problem of prediction (estimation of Xθ), the inverse problem (estimation of θ) and linear testing (testing θ=0). Interestingly, an elbow effect occurs when the quantity k log(p) becomes large compared to n: the minimax risks and hypothesis separation distances blow up in this ultra-high-dimensional setting. In fact, even dimension reduction techniques cannot provide satisfactory results in this regime.
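
To get a feel for the two regimes, one can compare k log(p) with n for a few parameter choices. The sketch below is purely illustrative: the helper name sparsity_ratio and the example values of n, p and k are assumptions, and the ratio is only a rough indicator, since the abstract does not state a precise threshold.

```python
import math

def sparsity_ratio(n, p, k):
    """Return k * log(p) / n, a rough indicator of the regime."""
    return k * math.log(p) / n

# Hypothetical settings: same n and p, increasing sparsity level k
for n, p, k in [(100, 1000, 2), (100, 1000, 50)]:
    print(f"n={n}, p={p}, k={k}: k log(p) / n = {sparsity_ratio(n, p, k):.2f}")
```

In the first setting the ratio is well below 1, while in the second it exceeds 1, which is the kind of situation the abstract calls ultra-high dimensional.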