Consider the standard Gaussian linear regression model $Y = X\theta + \varepsilon$, where $Y \in \mathbb{R}^n$ is a response vector, $X \in \mathbb{R}^{n \times p}$ is a design matrix and $\varepsilon$ is a Gaussian noise vector.
Much work has been devoted to building efficient estimators of $\theta$ when $p$ is much larger than $n$. In this situation, a classical approach is to assume that $\theta$ is approximately sparse. In this talk, I study the minimax risks of estimation and testing over classes of $k$-sparse vectors $\theta$. The resulting bounds shed light on the limitations inherent to high dimensionality.
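To make the setting concrete, the following minimal sketch draws data from the model with a $k$-sparse $\theta$ and $p \gg n$. The Gaussian design with i.i.d. $N(0,1)$ entries, the unit nonzero coefficients and the function name `simulate_sparse_regression` are illustrative assumptions, not choices made in the talk.

```python
import numpy as np


def simulate_sparse_regression(n, p, k, sigma=1.0, seed=0):
    """Draw (Y, X, theta) from Y = X theta + eps with a k-sparse theta.

    Illustrative assumptions: X has i.i.d. N(0, 1) entries, the k nonzero
    coefficients of theta are all equal to 1, and eps ~ N(0, sigma^2 I_n).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    theta = np.zeros(p)
    # Choose k distinct coordinates as the support of theta.
    support = rng.choice(p, size=k, replace=False)
    theta[support] = 1.0
    eps = sigma * rng.standard_normal(n)
    Y = X @ theta + eps
    return Y, X, theta


# High-dimensional regime: p much larger than n.
Y, X, theta = simulate_sparse_regression(n=50, p=1000, k=5)
print(Y.shape, X.shape, np.count_nonzero(theta))  # (50,) (50, 1000) 5
```

Only $n$ noisy linear measurements are available for $p$ unknowns here, which is why some structural assumption such as sparsity is needed at all.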
The results cover prediction (estimation of $X\theta$), the inverse problem (estimation of $\theta$) and linear testing (testing $\theta = 0$). Interestingly, an elbow effect occurs when the quantity $k\log(p)$ becomes large compared to $n$: in this ultra-high dimensional setting, the minimax risks and the minimax separation distances for testing blow up, and even dimension reduction techniques cannot provide satisfactory results.
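As a rough numerical illustration of which regime a given problem falls into, the sketch below computes the ratio $k\log(p)/n$. The natural logarithm and the comparison of the ratio with 1 are assumptions made here for illustration: the abstract states no precise constant, so the ratio only indicates orders of magnitude. The helper name `sparsity_ratio` is hypothetical.

```python
import math


def sparsity_ratio(n, p, k):
    """Return k * log(p) / n (natural log assumed).

    Values well below 1 suggest the usual high-dimensional regime; values
    of order 1 or more suggest the ultra-high dimensional regime. This is
    only an order-of-magnitude heuristic, with no precise constant.
    """
    return k * math.log(p) / n


n, p = 100, 10_000
for k in (2, 5, 20, 50):
    print(k, round(sparsity_ratio(n, p, k), 2))
# 2 0.18
# 5 0.46
# 20 1.84
# 50 4.61
```

With $n = 100$ and $p = 10^4$, moving from $k = 5$ to $k = 20$ already pushes $k\log(p)$ past $n$, so a modest increase in sparsity can move a problem across the elbow.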