Cosma Rohilla Shalizi

  1. Estimated VC dimension for risk bounds.

    Authors: Cosma Rohilla Shalizi, Daniel J. McDonald, Mark Schervish
    Subjects: Machine Learning
    Abstract

    Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the
    generalization capacity of learning algorithms. However, apart from a few
    special cases, it is hard or impossible to calculate analytically. Vapnik et
    al. [10] proposed a technique for estimating the VC dimension empirically.
    While their approach behaves well in simulations, it could not be used to bound
    the generalization risk of classifiers, because there were no bounds for the
    estimation error of the VC dimension itself.

  2. Consistency under Sampling of Exponential Random Graph Models.

    Authors: Alessandro Rinaldo, Cosma Rohilla Shalizi
    Subjects: Statistics
    Abstract

    The growing availability of network data and of scientific interest in
    distributed systems has led to the rapid development of statistical models of
    network structure. Typically, however, these are models for the entire network,
    while the data consists only of a sampled sub-network. Parameters for the whole
    network, which is what is of interest, are estimated by applying the model to
    the sub-network. This assumes that the model is consistent under sampling, or,
    in terms of the theory of stochastic processes, that it defines a projective
    family.

  3. Adapting to Non-stationarity with Growing Expert Ensembles.

    Authors: Cosma Rohilla Shalizi, Abigail Z. Jacobs, Aaron Clauset
    Subjects: Machine Learning
    Abstract

    When dealing with time series with complex and uncertain non-stationarities,
    low retrospective regret on individual realizations is in general a more
    appropriate goal than low prospective risk in expectation.

  4. Generalization error bounds for stationary autoregressive models.

    Authors: Cosma Rohilla Shalizi, Daniel J. McDonald, Mark Schervish
    Subjects: Machine Learning
    Abstract

    We derive generalization error bounds for stationary univariate
    autoregressive (AR) models. We show that the stationarity assumption alone lets
    us treat the estimation of AR models as a regularized kernel regression without
    the need to further regularize the model arbitrarily. We thereby bound the
    Rademacher complexity of AR models and apply existing Rademacher complexity
    results to characterize the predictive risk of AR models. We demonstrate our
    methods by predicting interest rate movements.
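
    As a point of reference for what "estimating an AR model" involves, the
    following is a minimal sketch of an ordinary least-squares fit of an AR(p)
    model with NumPy. It is not the kernel-regression formulation described in
    the abstract, and the function names, the lag order and the simulated series
    are assumptions made for illustration only.

```python
import numpy as np


def fit_ar_least_squares(x, p):
    """Fit x[t] = c + phi[0]*x[t-1] + ... + phi[p-1]*x[t-p] + noise by OLS.

    Hypothetical helper for illustration; returns (intercept, phi).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n <= p:
        raise ValueError("need more observations than lags")
    # Row for time t holds [1, x[t-1], ..., x[t-p]] for t = p, ..., n-1.
    lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1:]


def predict_next(x, intercept, phi):
    """One-step-ahead prediction from the last len(phi) observations."""
    x = np.asarray(x, dtype=float)
    p = len(phi)
    # Most recent value first, matching the column order of the design matrix.
    recent = x[-1 : -p - 1 : -1]
    return intercept + float(np.dot(phi, recent))


if __name__ == "__main__":
    # Simulate a stationary AR(1) series (coefficient 0.6 is an arbitrary choice).
    rng = np.random.default_rng(0)
    series = np.zeros(5000)
    for t in range(1, len(series)):
        series[t] = 0.6 * series[t - 1] + rng.normal()
    c_hat, phi_hat = fit_ar_least_squares(series, p=1)
    print("intercept:", c_hat, "phi:", phi_hat)
    print("next value forecast:", predict_next(series, c_hat, phi_hat))
```

    On a stationary series the fitted coefficients should land close to the
    values used in the simulation; how far such estimates can be trusted for
    prediction is the question the risk bounds in the abstract address.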

  5. Estimating $\beta$-mixing coefficients.

    Authors: Cosma Rohilla Shalizi, Daniel J. McDonald, Mark Schervish
    Subjects: Machine Learning
    Abstract

    The literature on statistical learning for time series assumes the asymptotic
    independence or ``mixing'' of the data-generating process. These mixing
    assumptions are never tested, nor are there methods for estimating mixing rates
    from data. We give an estimator for the $\beta$-mixing rate based on a single
    stationary sample path and show it is $L_1$-risk consistent.
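
    For readers unfamiliar with the term, the standard definition of the
    $\beta$-mixing coefficient of a stationary process at lag $a$ can be written
    as below. The notation ($P_{-\infty:0}$ for the law of the past,
    $P_{a:\infty}$ for the law of the future from time $a$ on) is an assumption
    chosen for this note and is not taken from the abstract.

```latex
\beta(a) \;=\; \left\| P_{-\infty:0,\; a:\infty} \;-\; P_{-\infty:0} \otimes P_{a:\infty} \right\|_{TV}
```

    Here $\|\cdot\|_{TV}$ is the total variation distance between the joint law
    of past and future and the product of their marginals. The process is
    $\beta$-mixing when $\beta(a) \to 0$ as $a \to \infty$, and the mixing rate
    refers to how quickly that decay happens.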

  6. Scaling and Hierarchy in Urban Economies.

    Authors: Cosma Rohilla Shalizi
    Subjects: Applications
    Abstract

    In several recent publications, Bettencourt, West and collaborators claim
    that properties of cities such as gross economic production, personal income,
    numbers of patents filed, number of crimes committed, etc., show super-linear
    power-law scaling with total population, while measures of resource use show
    sub-linear power-law scaling.
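
    The scaling claim has the form Y = c * N^b, where N is population and Y is
    the city-level quantity; b > 1 is super-linear and b < 1 is sub-linear. A
    minimal sketch of the usual way such an exponent is estimated, by least
    squares on log-transformed data, is given below. The function name and the
    synthetic numbers are assumptions for illustration and are not data or
    results from the paper.

```python
import numpy as np


def fit_power_law_scaling(population, quantity):
    """Fit quantity = c * population**b by least squares on the log-log scale.

    Hypothetical helper for illustration; returns (c, b).
    """
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(quantity, dtype=float))
    # polyfit returns the highest-degree coefficient first: slope, then intercept.
    exponent, log_prefactor = np.polyfit(log_n, log_y, 1)
    return float(np.exp(log_prefactor)), float(exponent)


if __name__ == "__main__":
    # Synthetic, noise-free example with an arbitrary exponent of 1.15.
    pop = np.array([1e4, 1e5, 1e6, 1e7])
    y = 2.0 * pop**1.15
    c_hat, b_hat = fit_power_law_scaling(pop, y)
    label = "super-linear" if b_hat > 1 else "sub-linear or linear"
    print("prefactor:", c_hat, "exponent:", b_hat, "->", label)
```

    A fitted exponent on its own does not show that a power law describes the
    data well, so such fits are normally checked against alternative functional
    forms.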

  7. Philosophy and the practice of Bayesian statistics.

    Authors: Cosma Rohilla Shalizi, Andrew Gelman
    Subjects: Statistics
    Abstract

    A substantial school in the philosophy of science identifies Bayesian
    inference with inductive inference and even rationality as such, and seems to
    be strengthened by the rise and practical success of Bayesian statistics. We
    argue that the most successful forms of Bayesian statistics do not actually
    support that particular philosophy but rather accord much better with
    sophisticated forms of hypothetico-deductivism.

  8. Homophily and Contagion Are Generically Confounded in Observational Social Network Studies.

    Authors: Cosma Rohilla Shalizi, Andrew C. Thomas
    Subjects: Applications
    Abstract

    We consider processes on social networks that can potentially involve three
    phenomena: homophily, or the formation of social ties due to matching
    individual traits; social contagion, also known as social influence; and the
    causal effect of an individual's covariates on their behavior or other
    measurable responses. We show that, generically, all of these are confounded
    with each other. Distinguishing them from one another requires strong
    assumptions on the parametrization of the social process or on the adequacy of
    the covariates used (or both).

  9. Approximate Methods for State-Space Models.

    Authors: Cosma Rohilla Shalizi, Shinsuke Koyama, Lucia Castellanos Pérez-Bolde, Robert E. Kass
    Subjects: Methodology
    Abstract

    State-space models provide an important body of techniques for analyzing
    time series, but their use requires estimating unobserved states. The optimal
    estimate of the state is its conditional expectation given the observation
    histories, and computing this expectation is hard when there are
    nonlinearities. Existing filtering methods, including sequential Monte Carlo,
    tend to be either inaccurate or slow.

  10. Dynamics of Bayesian Updating with Dependent Data and Misspecified Models.

    Authors: Cosma Rohilla Shalizi
    Subjects: Statistics
    Abstract

    Recent work on Bayesian updating in infinite-dimensional parameter spaces has
    established conditions under which the posterior distribution will concentrate
    on the truth, if the latter has a perfect representation within the support of
    the prior, subject to dynamical restrictions such as independent or Markovian
    data.
