Common methods of causal inference generate directed acyclic graphs (DAGs)
that formalize causal relations between n variables. Given the joint
distribution of all these variables, the DAG contains all information about how
intervening on one variable would change the distribution of the other n-1
variables. Quantifying the causal influence of one variable on another,
however, remains a non-trivial question.
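
To illustrate how a DAG together with the joint distribution determines the effect of an intervention, the following minimal sketch computes an interventional distribution by truncated factorization. The three-node graph (Z -> X, Z -> Y, X -> Y), the probability tables, and all function names are hypothetical choices made for this example only.

```python
from itertools import product

# Hypothetical conditional probability tables for a three-node DAG:
# Z -> X, Z -> Y, X -> Y (Z confounds the effect of X on Y).
P_Z = {0: 0.5, 1: 0.5}                      # P(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y=1 | X=x, Z=z)
                 (1, 0): 0.4, (1, 1): 0.9}


def bernoulli(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(z, x, y):
    """Observational joint P(z, x, y), factorized along the DAG."""
    return (P_Z[z]
            * bernoulli(P_X1_GIVEN_Z[z], x)
            * bernoulli(P_Y1_GIVEN_XZ[(x, z)], y))


def p_y1_given_x(x):
    """Observational conditional P(Y=1 | X=x)."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator


def p_y1_do_x(x):
    """Interventional P(Y=1 | do(X=x)): drop the factor P(x | z) and fix X."""
    return sum(P_Z[z] * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


print(p_y1_given_x(1), p_y1_do_x(1))
```

With these tables, conditioning on X=1 and intervening to set X=1 give different distributions over Y, because Z influences both X and Y. The sketch only evaluates interventional distributions; it does not by itself supply a measure of causal strength.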
We information-theoretically reformulate two measures of capacity from
statistical learning theory: empirical VC-entropy and empirical Rademacher
complexity. We show these capacity measures count the number of hypotheses
about a dataset that a learning algorithm falsifies when it finds the
classifier in its repertoire that minimizes empirical risk. It then follows
that the future performance of predictors on unseen data is controlled in part
by how many hypotheses the learner falsifies.
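
For concreteness, the two capacity measures can be computed exactly on a tiny sample. The sketch below assumes the standard textbook definitions (empirical VC-entropy as the base-2 logarithm of the number of distinct labelings a class induces on the sample, and empirical Rademacher complexity as the expected best correlation with uniformly random signs); it does not implement the information-theoretic reformulation itself. The threshold classifier family and all names are hypothetical illustrations.

```python
from itertools import product
from math import log2


def threshold_labelings(xs):
    """Distinct labelings of xs by f_t(x) = +1 if x >= t else -1."""
    cuts = sorted(set(xs)) + [float("inf")]
    return {tuple(1 if x >= t else -1 for x in xs) for t in cuts}


def empirical_vc_entropy(labelings):
    """log2 of the number of distinct labelings realized on the sample."""
    return log2(len(labelings))


def empirical_rademacher(labelings):
    """Average over all sign vectors of the best normalized correlation."""
    n = len(next(iter(labelings)))
    total = 0.0
    for sigma in product((-1, 1), repeat=n):
        best = max(sum(s * f for s, f in zip(sigma, lab)) for lab in labelings)
        total += best / n
    return total / 2 ** n


sample = [0.1, 0.4, 0.7]
labelings = threshold_labelings(sample)
print(empirical_vc_entropy(labelings), empirical_rademacher(labelings))
```

Enumerating all 2^n sign vectors is feasible only for very small samples; for larger n the expectation would normally be estimated by sampling random sign vectors.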
Broadly speaking, there are two approaches to quantifying information. The
first, Shannon information, takes events as belonging to ensembles and
quantifies the information resulting from observing the given event in terms of
the number of alternate events that have been ruled out. The second,
algorithmic information or Kolmogorov complexity, takes events as strings and,
given a universal Turing machine, quantifies the information content of a
string as the length of the shortest program producing it.
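
The contrast between the two approaches can be sketched numerically. Shannon information is computed from the probability of an event within its ensemble, whereas Kolmogorov complexity is uncomputable, so the example below substitutes the length of a zlib-compressed encoding as a crude upper-bound proxy. That substitution, the helper names, and the test strings are assumptions made for illustration.

```python
import random
import zlib
from math import log2


def surprisal_bits(p):
    """Shannon information, in bits, of observing an event of probability p."""
    return -log2(p)


def compressed_bits(data):
    """Bit length of the zlib-compressed bytes: a rough upper-bound proxy
    for the algorithmic information content, not the true value."""
    return 8 * len(zlib.compress(data, 9))


# One event out of 8 equally likely alternatives rules out 7 others.
uniform_event_bits = surprisal_bits(1 / 8)

# A highly regular string versus a pseudo-random one of the same length.
rng = random.Random(0)
regular = b"ab" * 500
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

print(uniform_event_bits, compressed_bits(regular), compressed_bits(noisy))
```

The regular string compresses to far fewer bits than the pseudo-random one, mirroring the idea that a short program suffices to produce it. The compressed length depends on the compressor and is only an upper bound up to an additive constant.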