This article studies exponential families $\mathcal{E}$ on finite sets such
that the information divergence $D(P\|\mathcal{E})$ of an arbitrary probability
distribution from $\mathcal{E}$ is bounded by some constant $D>0$. A particular
class of low-dimensional exponential families that have low values of $D$ can
be obtained from partitions of the state space. The main results concern
optimality properties of these partition exponential families. Exponential
families where $D=\log(2)$ are studied in detail.
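As a minimal sketch of the kind of bound discussed above, the following Python snippet computes the divergence from a partition model on a four-element state space. It assumes that the partition exponential family consists of the distributions that are uniform within each block, so that the closest point of the model to $P$ spreads the mass of each block evenly; the function name `divergence_from_partition_model` and the example distributions are illustrative and not taken from the article.

```python
import math

def divergence_from_partition_model(p, blocks):
    """Return D(P || E) in nats for the assumed partition model E.

    E is taken to be the set of distributions that are uniform within
    each block, so the minimizing element of E assigns mass
    P(block) / |block| to every state of a block.
    """
    total = 0.0
    for block in blocks:
        mass = sum(p[x] for x in block)
        for x in block:
            if p[x] > 0:  # terms with p[x] == 0 contribute nothing
                total += p[x] * math.log(p[x] * len(block) / mass)
    return total

# Partition of {0, 1, 2, 3} into two blocks of size two.
blocks = [(0, 1), (2, 3)]

point_mass = [1.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.4, 0.1, 0.3, 0.2]

d_point = divergence_from_partition_model(point_mass, blocks)
d_uniform = divergence_from_partition_model(uniform, blocks)
d_skewed = divergence_from_partition_model(skewed, blocks)
```

Under these assumptions the point mass attains $\log(2)$, the uniform distribution lies in the model and has divergence zero, and the skewed distribution falls strictly in between.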

We study notions of robustness of Markov kernels and probability distributions
of a system that is described by $n$ input random variables and one output
random variable. Markov kernels can be expanded in a series of potentials that
make it possible to describe the system's behaviour after knockouts. Robustness imposes
structural constraints on these potentials. Robustness of probability
distributions is defined via conditional independence statements. These
statements can be studied algebraically. The corresponding conditional
independence ideals are related to binary edge ideals.

The closure of a discrete exponential family is described by a finite set of
equations corresponding to the circuits of an underlying oriented matroid.
These equations are similar to the equations used in algebraic statistics,
although they need not be polynomial in the general case. This description
allows for a combinatorial study of the possible support sets in the closure of
an exponential family. If two exponential families induce the same oriented
matroid, then their closures have the same support sets.
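To illustrate what such a circuit equation looks like in the simplest polynomial case, the following sketch uses the independence model of two binary variables. It assumes a uniform reference measure and the state ordering $(0,0), (0,1), (1,0), (1,1)$; the matrix `A`, the helper names and the test distributions are illustrative choices, not taken from the text. The kernel of `A` is one-dimensional, so up to sign there is a single circuit, and its equation reads $p_{00}\,p_{11} = p_{01}\,p_{10}$.

```python
import math

# Sufficient statistics of the independence model of two binary variables,
# with states ordered as (0,0), (0,1), (1,0), (1,1).
A = [
    [1, 1, 1, 1],  # constant function
    [0, 0, 1, 1],  # indicator of X1 = 1
    [0, 1, 0, 1],  # indicator of X2 = 1
]

# The kernel of A is spanned by this vector, which is therefore the
# only circuit up to sign.
circuit = [1, -1, -1, 1]

def in_kernel(matrix, vector):
    """Check that matrix * vector is the zero vector (integer arithmetic)."""
    return all(sum(a * v for a, v in zip(row, vector)) == 0 for row in matrix)

def circuit_residual(p, c):
    """Left-hand side minus right-hand side of the circuit equation for c."""
    positive = math.prod(p[i] ** c[i] for i in range(len(c)) if c[i] > 0)
    negative = math.prod(p[i] ** (-c[i]) for i in range(len(c)) if c[i] < 0)
    return positive - negative

def product_distribution(a, b):
    """Joint distribution of independent bits with P(X1=1)=a, P(X2=1)=b."""
    return [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]

inside = product_distribution(0.3, 0.6)    # full support, in the family
boundary = product_distribution(1.0, 0.6)  # support {(1,0), (1,1)}
outside = [0.5, 0.0, 0.0, 0.5]             # perfectly correlated bits

res_inside = circuit_residual(inside, circuit)
res_boundary = circuit_residual(boundary, circuit)
res_outside = circuit_residual(outside, circuit)
```

In this sketch the equation holds for the product distributions, including the one with reduced support, and fails for the perfectly correlated distribution, whose support is therefore not realized in the closure of this model.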

This paper investigates maximizers of the information divergence from an
exponential family $E$. It is shown that the $rI$-projection of a maximizer $P$
to $E$ is a convex combination of $P$ and a probability measure $P_-$ with
disjoint support and the same value of the sufficient statistics $A$. This
observation can be used to transform the original problem of maximizing
$D(\cdot\|E)$ over the set of all probability measures into the maximization of
a function $\overline{D}$ over a convex subset of $\ker A$. The global maximizers of
both problems correspond to each other.
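
The convex-combination property can be checked by hand on a toy model. The sketch below assumes a family given by a partition of $\{0,1,2,3\}$ into two blocks of size two, with the block masses as sufficient statistics and block averaging as the projection; the maximizer is taken to be a point mass, which attains the divergence $\log(2)$ in this model. The helper names and the way the mixture weight is chosen (the largest weight that keeps $P_-$ nonnegative) are assumptions of this illustration, not the construction of the paper.

```python
blocks = [(0, 1), (2, 3)]

def sufficient_statistics(p):
    """Block masses, playing the role of A applied to p in this toy model."""
    return [sum(p[x] for x in block) for block in blocks]

def project(p):
    """Assumed projection onto the model: average p within each block."""
    q = [0.0] * len(p)
    for block in blocks:
        mass = sum(p[x] for x in block)
        for x in block:
            q[x] = mass / len(block)
    return q

# A point mass inside a two-element block.
P = [1.0, 0.0, 0.0, 0.0]
P_E = project(P)

# Largest weight on P for which the remainder is still nonnegative.
weight = min(q / p for p, q in zip(P, P_E) if p > 0)

# Solve P_E = weight * P + (1 - weight) * P_minus for P_minus.
P_minus = [(q - weight * p) / (1 - weight) for p, q in zip(P, P_E)]

support_P = {x for x, v in enumerate(P) if v > 0}
support_P_minus = {x for x, v in enumerate(P_minus) if v > 0}
```

Here the projection is the uniform distribution on the first block, `P_minus` is the point mass on the other state of that block, the two supports are disjoint, and both measures give the same block masses.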