Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the
generalization capacity of learning algorithms. However, apart from a few
special cases, it is hard or impossible to calculate analytically. Vapnik et
al. [10] proposed a technique for estimating the VC dimension empirically.
While their approach behaves well in simulations, it could not be used to bound
the generalization risk of classifiers, because there were no bounds for the
estimation error of the VC dimension itself.
The growing availability of network data and of scientific interest in
distributed systems has led to the rapid development of statistical models of
network structure. Typically, however, these are models for the entire network,
while the data consists only of a sampled sub-network. The parameters of
interest, those of the whole network, are estimated by applying the model to
the sub-network. This assumes that the model is consistent under sampling, or,
in terms of the theory of stochastic processes, that it defines a projective
family.
When dealing with time series with complex and uncertain non-stationarities,
low retrospective regret on individual realizations is in general a more
appropriate goal than low prospective risk in expectation.
We derive generalization error bounds for stationary univariate
autoregressive (AR) models. We show that the stationarity assumption alone lets
us treat the estimation of AR models as a regularized kernel regression without
the need to further regularize the model arbitrarily. We thereby bound the
Rademacher complexity of AR models and apply existing Rademacher complexity
results to characterize the predictive risk of AR models. We demonstrate our
methods by predicting interest rate movements.
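
As an illustration of the model class only, an AR($p$) model can be fit by ordinary least squares and then checked for stationarity. This is a minimal sketch, not the estimator or the bounds derived in the paper; the function names `fit_ar_least_squares` and `is_stationary` are hypothetical, and plain unregularized least squares is an assumption made for brevity.

```python
import numpy as np

def fit_ar_least_squares(x, p):
    """Fit x_t = c + sum_k a_k * x_{t-k} + noise by ordinary least squares.

    Returns (c, a) where a[k-1] is the coefficient on lag k.
    Illustrative only: no regularization and no stationarity constraint.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k holds x_{t-k} for t = p, ..., n-1
    lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1:]

def is_stationary(ar_coefs):
    """True when every root of z^p - a_1 z^(p-1) - ... - a_p has modulus < 1."""
    poly = np.concatenate([[1.0], -np.asarray(ar_coefs, dtype=float)])
    return bool(np.all(np.abs(np.roots(poly)) < 1.0))
```

Calling `fit_ar_least_squares(series, p)` and passing the returned coefficients to `is_stationary` indicates whether the fitted model falls in the stationary class that the bounds concern.
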
The literature on statistical learning for time series assumes the asymptotic
independence or ``mixing'' of the data-generating process. These mixing
assumptions are never tested, nor are there methods for estimating mixing rates
from data. We give an estimator for the $\beta$-mixing rate based on a single
stationary sample path and show it is $L_1$-risk consistent.
In several recent publications, Bettencourt, West and collaborators claim
that properties of cities such as gross economic production, personal income,
number of patents filed, number of crimes committed, etc., show super-linear
power-law scaling with total population, while measures of resource use show
sub-linear power-law scaling.
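
A power-law scaling claim of this kind has the form $Y = c N^b$, with $b > 1$ super-linear and $b < 1$ sub-linear, where $N$ is population. One common way to estimate $b$ is a least-squares fit on log-log axes. The sketch below assumes that approach; it is not necessarily the procedure used in the publications discussed, and the function name `fit_scaling_exponent` is hypothetical.

```python
import numpy as np

def fit_scaling_exponent(population, quantity):
    """Estimate b and c in quantity = c * population**b.

    Uses ordinary least squares of log(quantity) on log(population).
    Inputs must be strictly positive.
    """
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(quantity, dtype=float))
    # polyfit returns coefficients from highest degree down: [slope, intercept]
    b, log_c = np.polyfit(log_n, log_y, deg=1)
    return b, np.exp(log_c)
```

A fitted exponent alone does not establish that a power law describes the data well; comparing the fit against alternative functional forms is a separate step.
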
A substantial school in the philosophy of science identifies Bayesian
inference with inductive inference and even rationality as such, and seems to
be strengthened by the rise and practical success of Bayesian statistics. We
argue that the most successful forms of Bayesian statistics do not actually
support that particular philosophy but rather accord much better with
sophisticated forms of hypothetico-deductivism.
We consider processes on social networks that can potentially involve three
phenomena: homophily, or the formation of social ties due to matching
individual traits; social contagion, also known as social influence; and the
causal effect of an individual's covariates on their behavior or other
measurable responses. We show that, generically, all of these are confounded
with each other. Distinguishing them from one another requires strong
assumptions on the parametrization of the social process or on the adequacy of
the covariates used (or both).
State-space models provide an important body of techniques for analyzing
time-series, but their use requires estimating unobserved states. The optimal
estimate of the state is its conditional expectation given the observation
histories, and computing this expectation is hard when there are
nonlinearities. Existing filtering methods, including sequential Monte Carlo,
tend to be either inaccurate or slow.
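
For reference, the sequential Monte Carlo approach mentioned above can be sketched as a bootstrap particle filter. This illustrates an existing method, not the one proposed here. The model form (additive Gaussian process noise, direct observation of the state with Gaussian noise, a standard normal initial state) and the name `bootstrap_particle_filter` are assumptions chosen for brevity.

```python
import numpy as np

def bootstrap_particle_filter(observations, transition, obs_std, proc_std,
                              n_particles=1000, seed=0):
    """Approximate E[x_t | y_1..y_t] for the assumed model
        x_t = transition(x_{t-1}) + N(0, proc_std^2)
        y_t = x_t + N(0, obs_std^2)
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, size=n_particles)  # assumed initial law
    estimates = np.empty(len(observations))
    for t, y in enumerate(observations):
        # Propagate every particle through the state dynamics
        noise = rng.normal(0.0, proc_std, size=n_particles)
        particles = transition(particles) + noise
        # Weight by the Gaussian observation likelihood (log scale for stability)
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # The weighted mean approximates the conditional expectation
        estimates[t] = np.sum(w * particles)
        # Multinomial resampling to counter weight degeneracy
        particles = rng.choice(particles, size=n_particles, p=w)
    return estimates
```

The accuracy and speed trade-off is visible in `n_particles`: fewer particles run faster but give noisier estimates of the conditional expectation.
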
Recent work on Bayesian updating in infinite-dimensional parameter spaces has
established conditions under which the posterior distribution will concentrate
on the truth, if the latter has a perfect representation within the support of
the prior, subject to dynamical restrictions such as independent or Markovian
data.