The local geometry of mixtures

Statistics and Modeling for Complex Data
GASSIAT Élisabeth

Mixture modeling of distributions uses a set $\mathcal{F}$ of probability densities and, for each positive integer $q$, the set $\mathcal{M}_q$ of convex combinations (mixtures) of $q$ elements of $\mathcal{F}$. To derive statistical results for estimation procedures, one needs a quantitative description of the complexity of the sequence of models $(\mathcal{M}_q)_{q\in\mathbb{N}}$. Mixture models possess a notoriously complicated geometric structure. In this talk, we will give results on the local entropy, in Hellinger distance, of the square roots of likelihood ratios (scores). We will explain how these results may be derived from the global entropy of the set of normalized scores.

As an application, we will give the precise rate of the penalties that lead to almost sure identification of the model order: the minimal penalty that yields strong consistency in the absence of an a priori upper bound on the model order is of order $\eta(q)\log\log n$, where $\eta(q)$ is a dimensional quantity. The proof is based on a general, precise characterization of the pathwise fluctuations of the generalized likelihood ratio statistic in terms of the geometry of the underlying model.
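
For concreteness, the order estimator in question can be sketched as follows. The notation $X_1,\dots,X_n$, $\ell_n$, $\hat q_n$, $\mathrm{pen}$ and the constant $C$ is not fixed in this abstract and is assumed here purely for illustration. Given observations $X_1,\dots,X_n$, a penalized likelihood order estimator would typically take the form

\[
\hat q_n \in \operatorname*{argmax}_{q\ge 1}\Big\{ \sup_{f\in\mathcal{M}_q} \ell_n(f) - \mathrm{pen}(n,q) \Big\},
\qquad
\ell_n(f) = \sum_{i=1}^{n} \log f(X_i),
\]

with the maximum taken over all $q\ge 1$ rather than over a bounded range. Under this reading, the result above concerns penalties of the form $\mathrm{pen}(n,q) = C\,\eta(q)\log\log n$.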

This talk is based on joint work with Ramon van Handel, Princeton University.