From observational data alone, a causal DAG is in general only identifiable
up to Markov equivalence. Interventional data generally improves
identifiability; however, the gain of an intervention strongly depends on the
intervention target, i.e., the intervened variables. We present active learning
strategies calculating optimal interventions for two different learning goals.
The first one is a greedy approach using single-vertex interventions that
maximizes the number of edges that can be oriented after each intervention.
We consider structural equation models (SEMs) in which variables can be
written as a function of their parents and noise terms (the latter are assumed
to be jointly independent). Corresponding to each SEM, there is a directed
acyclic graph (DAG) G_0 describing the relationships between the variables. In
Gaussian SEMs with linear functions, the graph can be identified from the joint
distribution only up to Markov equivalence classes (assuming faithfulness). It
has been shown, however, that this constitutes an exceptional case.
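As a minimal sketch of the model class described above, the following simulates a linear Gaussian SEM in which each variable is a linear function of its parents plus an independent noise term. The function name `simulate_linear_sem`, the coefficient matrix `B`, and the three-variable chain are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_linear_sem(B, noise_sd, n, rng):
    """Draw n samples from X = B X + eps, where B[i, j] != 0 means j -> i.

    Assumes the variables are indexed in a topological order of the DAG,
    so that B is strictly lower triangular.
    """
    p = B.shape[0]
    if not np.allclose(B, np.tril(B, k=-1)):
        raise ValueError("B must be strictly lower triangular")
    # Jointly independent Gaussian noise, one column per variable
    eps = rng.normal(scale=noise_sd, size=(n, p))
    X = np.zeros((n, p))
    for i in range(p):
        # Each variable is a linear function of its parents plus its own noise
        X[:, i] = X[:, :i] @ B[i, :i] + eps[:, i]
    return X

# Hypothetical chain X0 -> X1 -> X2
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.5, 0.0]])
rng = np.random.default_rng(0)
X = simulate_linear_sem(B, noise_sd=np.ones(3), n=5000, rng=rng)
```

In this linear Gaussian setting the chain X0 -> X1 -> X2 and its reversal imply the same conditional independences, which is why the joint distribution identifies the graph only up to Markov equivalence.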
Test statistics are often strongly dependent in large-scale multiple testing
applications. Most corrections for multiplicity are unduly conservative for
correlated test statistics, resulting in a loss of power to detect true
positives. We show that the Westfall--Young permutation method has
asymptotically optimal power for a broad class of testing problems with a
block-dependence and sparsity structure among the tests, when the number of
tests tends to infinity.
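The permutation idea can be sketched as follows. This is one common variant (single-step maxT with two-sample t-type statistics); the abstract does not state which variant or test statistic is analyzed, so the function `westfall_young_maxt` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def westfall_young_maxt(X, labels, n_perm, rng):
    """Single-step maxT adjusted p-values for m two-sample tests.

    X: (n, m) data matrix, one column per hypothesis.
    labels: (n,) array of 0/1 group labels.
    """
    def abs_t(lab):
        a, b = X[lab == 1], X[lab == 0]
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                     + b.var(axis=0, ddof=1) / len(b))
        return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

    t_obs = abs_t(labels)
    # Permuting the labels keeps the dependence among the m statistics intact
    max_null = np.array([abs_t(rng.permutation(labels)).max()
                         for _ in range(n_perm)])
    # Adjusted p-value: share of permutations whose maximum statistic
    # reaches the observed statistic (with the usual +1 correction)
    exceed = (max_null[:, None] >= t_obs[None, :]).sum(axis=0)
    return (1 + exceed) / (n_perm + 1)

# Hypothetical data: 50 tests, only the first one has a true group difference
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[labels == 1, 0] += 3.0
p_adj = westfall_young_maxt(X, labels, n_perm=200, rng=rng)
```

Because the null distribution of the maximum is estimated from the data, the correction adapts to correlation among the test statistics instead of assuming the worst case.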
The current Special Issue of The Annals of Statistics contains three invited
articles. Javier Rojo discusses Erich's scientific achievements and provides
complete lists of his scientific writings and his former Ph.D. students.
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models.
We propose an L1-penalized algorithm for fitting high-dimensional generalized
linear mixed models. Generalized linear mixed models (GLMMs) can be viewed as
an extension of generalized linear models for clustered observations. This
Lasso-type approach for GLMMs should mainly be used as a variable screening
method to reduce the number of variables below the sample size. We then suggest
a refitting by maximum likelihood based on the selected variables only. This is
an effective correction to overcome problems stemming from the variable
screening procedure which are more severe with GLMMs.
A conditional independence graph is a concise representation of pairwise
conditional independence among many variables. We propose Graphical Random
Forests (GRaFo) for estimating pairwise conditional independence relationships
among mixed-type, i.e. continuous and discrete, variables. The number of edges
is a tuning parameter in any graphical model estimator and there is no obvious
number that constitutes a good choice. Stability Selection helps to choose this
parameter with respect to a bound on the expected number of false positives
(error control).
The investigation of directed acyclic graphs (DAGs) encoding the same Markov
property, that is the same conditional independence relations of multivariate
observational distributions, has a long tradition; many algorithms exist for
model selection and structure learning in Markov equivalence classes. In this
paper, we extend the notion of Markov equivalence of DAGs to the case of
interventional distributions arising from multiple intervention experiments.
In 1994, I came to Berkeley and was fortunate to stay there three years,
first as a postdoctoral researcher and then as Neyman Visiting Assistant
Professor. For me, this period was a unique opportunity to see other aspects
and learn many more things about statistics: the Department of Statistics at
Berkeley was much bigger and hence broader than my home at ETH Zürich and I
enjoyed very much that the science was perhaps a bit more speculative.
We propose an l1-regularized likelihood method for estimating the inverse
covariance matrix in the high-dimensional multivariate normal model in the
presence of missing data. Our method is based on the assumption that the data
are missing at random (MAR), which also covers the missing completely at random
case. The implementation of the method is non-trivial as the observed negative
log-likelihood generally is a complicated and non-convex function. We propose
an efficient EM-algorithm for optimization with provable numerical convergence
properties.
We propose an $\ell_1$-penalized estimation procedure for high-dimensional
linear mixed-effects models. The models are useful whenever there is a grouping
structure among high-dimensional observations, i.e. for clustered data. We
prove a consistency and an oracle optimality result and we develop an algorithm
with provable numerical convergence. Furthermore, we demonstrate the
performance of the method on simulated data and on a real high-dimensional
dataset.
We propose a new sparsity-smoothness penalty for high-dimensional generalized
additive models. The combination of sparsity and smoothness is crucial for
mathematical theory as well as performance for finite-sample data. We present a
computationally efficient algorithm, with provable numerical convergence
properties, for optimizing the penalized likelihood. Furthermore, we provide
oracle results which yield asymptotic optimality of our estimator for high
dimensional but sparse additive models.
Large contingency tables summarizing categorical variables arise in many
areas, for example in biology, when a large number of biomarkers are
cross-tabulated according to their discrete expression levels. Interactions of
the variables are generally studied with log-linear models and the structure of
a log-linear model can be visually represented by a graph from which the
conditional independence structure can then be read off.
We present a graph-based technique for estimating sparse covariance matrices
and their inverse from high-dimensional data. The method is based on learning a
directed acyclic graph (DAG) and estimating parameters of a multivariate
Gaussian distribution based on a DAG. For inferring the underlying DAG we use
the PC-algorithm and for estimating the DAG-based covariance matrix and its
inverse, we use a Cholesky decomposition approach which provides a positive
(semi-)definite sparse estimate.
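A minimal sketch of the second step, assuming the DAG is already known (the PC-algorithm step is omitted here): each node is regressed on its parents, and the coefficients B and residual variances D give the precision matrix (I - B)^T D^{-1} (I - B) and its inverse as the covariance. The function `dag_covariance` and the simulated chain are hypothetical and not the authors' implementation.

```python
import numpy as np

def dag_covariance(X, parents):
    """Covariance and precision estimates implied by a given DAG.

    parents[i] lists the parent indices of node i; the DAG is assumed known.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    B = np.zeros((p, p))   # B[i, j] is the regression weight of parent j on node i
    d = np.zeros(p)        # residual variances
    for i in range(p):
        pa = list(parents[i])
        resid = Xc[:, i]
        if pa:
            coef, *_ = np.linalg.lstsq(Xc[:, pa], Xc[:, i], rcond=None)
            B[i, pa] = coef
            resid = Xc[:, i] - Xc[:, pa] @ coef
        d[i] = resid @ resid / n
    A = np.eye(p) - B
    # Cholesky-type factorization of the precision matrix along the DAG
    precision = A.T @ np.diag(1.0 / d) @ A
    A_inv = np.linalg.inv(A)
    covariance = A_inv @ np.diag(d) @ A_inv.T
    return covariance, precision

# Hypothetical chain X0 -> X1 -> X2
rng = np.random.default_rng(1)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = -0.5 * x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
cov_hat, prec_hat = dag_covariance(X, parents={0: [], 1: [0], 2: [1]})
```

Sparsity in the estimated precision matrix follows from the sparsity of the DAG: in the chain example, the entry linking X0 and X2 is zero by construction.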
We consider variable selection in high-dimensional linear models where the
number of covariates greatly exceeds the sample size. We introduce the new
concept of partial faithfulness and use it to infer associations between the
covariates and the response.
Oracle inequalities and variable selection properties for the Lasso in linear
models have been established under a variety of different assumptions on the
design matrix. We show in this paper how the different conditions and concepts
relate to each other. The restricted eigenvalue condition (Bickel et al., 2009)
or the slightly weaker compatibility condition (van de Geer, 2007) are
sufficient for oracle results. We argue that both these conditions allow for a
fairly general class of design matrices.
We assume that we have observational data generated from an unknown
underlying directed acyclic graph (DAG) model. A DAG is typically not
identifiable from observational data, but it is possible to consistently
estimate the equivalence class of a DAG. Moreover, for any given DAG, causal
effects can be estimated using intervention calculus. In this paper, we combine
these two parts. For each DAG in the estimated equivalence class, we use
intervention calculus to estimate the causal effects of the covariates on the
response.
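The combination of the two parts can be sketched as follows, under the additional assumption of a linear Gaussian model, where the causal effect of a covariate on the response equals its coefficient in a regression of the response on that covariate and its parents. The helper names and the candidate parent sets below are hypothetical; in practice the parent sets would come from the DAGs in the estimated equivalence class.

```python
import numpy as np

def causal_effect(X, y, j, parents_of_j):
    """Coefficient of X[:, j] when y is regressed on X_j and the parents of X_j."""
    cols = [j] + list(parents_of_j)
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1]  # index 0 is the intercept

def effects_over_class(X, y, j, candidate_parent_sets):
    """One estimated effect per DAG in the class, via its parent set of X_j."""
    return [causal_effect(X, y, j, pa) for pa in candidate_parent_sets]

# Hypothetical data: X0 -> X1, X0 -> Y, X1 -> Y with true effect of X1 on Y = 1.0
rng = np.random.default_rng(2)
n = 5000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x0 + rng.normal(size=n)
X = np.column_stack([x0, x1])

# Assumed candidate parent sets of X1: {X0} in one DAG, empty in another
effects = effects_over_class(X, y, j=1, candidate_parent_sets=[[0], []])
```

The result is a collection of possible effects rather than a single number, since the observational data cannot distinguish between the DAGs in the equivalence class.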