Genetic investigations often involve the testing of vast numbers of related
hypotheses simultaneously. To control the overall error rate, a substantial
penalty is required, making it difficult to detect signals of moderate
strength. To improve the power in this setting, a number of authors have
considered using weighted $p$-values, with the motivation often based upon the
scientific plausibility of the hypotheses. We review this literature, derive
optimal weights and show that the power is remarkably robust to
misspecification of these weights.
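
As a rough illustration of what a weighted $p$-value procedure looks like, the sketch below applies a weighted Bonferroni rule: hypothesis $i$ is rejected when $p_i \le w_i \alpha / m$, with non-negative weights rescaled to average one. This particular rule, the function name `weighted_bonferroni`, and the example weights are illustrative assumptions rather than the specific procedure or optimal weights derived in the paper. The weights are assumed to be chosen in advance, independently of the $p$-values.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return indices rejected by a weighted Bonferroni rule (illustrative sketch).

    Assumption: weights are fixed in advance from prior plausibility and do
    not depend on the p-values themselves.
    """
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("need exactly one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_w = sum(weights) / m
    if mean_w <= 0:
        raise ValueError("at least one weight must be positive")
    # Rescale so the weights average to one; the thresholds then sum to alpha.
    scaled = [w / mean_w for w in weights]
    # Reject H_i when p_i <= w_i * alpha / m.
    return [i for i, (p, w) in enumerate(zip(p_values, scaled))
            if p <= w * alpha / m]


# Hypothetical p-values; the first hypothesis is judged most plausible a priori.
p_values = [0.02, 0.03, 0.2, 0.6]
up_weighted = weighted_bonferroni(p_values, [2.0, 1.0, 0.5, 0.5])
unweighted = weighted_bonferroni(p_values, [1.0, 1.0, 1.0, 1.0])
```

In this toy example the first hypothesis clears its weighted threshold of 0.025 but not the unweighted Bonferroni threshold of 0.0125, which is the kind of power gain that up-weighting plausible hypotheses is meant to deliver.
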

A challenging problem in estimating high-dimensional graphical models is to
choose the regularization parameter in a data-dependent way. The standard
techniques include $K$-fold cross-validation ($K$-CV), the Akaike information
criterion (AIC), and the Bayesian information criterion (BIC). Though these
methods work well for low-dimensional problems, they are not suitable in
high-dimensional settings. In this paper, we present StARS: a new
stability-based method for choosing the regularization parameter in
high-dimensional inference for undirected graphs.

We introduce a new version of forward stepwise regression. Our modification
finds solutions to regression problems where the selected predictors appear in
a structured pattern, with respect to a predefined distance measure over the
candidate predictors. Our method is motivated by the problem of predicting
HIV-1 drug resistance from protein sequences. We find that our method improves
the interpretability of drug resistance while achieving predictive accuracy
comparable to that of standard methods.

This paper explores the following question: what kind of statistical
guarantees can be given when doing variable selection in high-dimensional
models? In particular, we look at the error rates and power of some multi-stage
regression methods. In the first stage, we fit a set of candidate models. In
the second stage, we select one model by cross-validation. In the third stage,
we use hypothesis testing to eliminate some variables.
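
The three stages above can be sketched in a few lines of NumPy. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `screen_and_clean`, the split of the data into three disjoint parts, the use of marginal-correlation ranking to build nested candidate models, a single hold-out split standing in for cross-validation, and Bonferroni-corrected normal-approximation tests in the final stage.

```python
import numpy as np
from statistics import NormalDist


def screen_and_clean(X, y, max_size=10, alpha=0.05, seed=0):
    """Three-stage variable selection sketch (all design choices are assumptions)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumption: one disjoint third of the data per stage.
    d1, d2, d3 = np.array_split(rng.permutation(n), 3)
    if len(d3) <= max_size + 1:
        raise ValueError("too few observations for the testing stage")

    # Stage 1: candidate models are nested sets of the top-k predictors,
    # ranked by absolute marginal correlation with y on the first part.
    Xc = X[d1] - X[d1].mean(axis=0)
    yc = y[d1] - y[d1].mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    order = np.argsort(-score)
    candidates = [order[:k] for k in range(1, min(max_size, p) + 1)]

    # Stage 2: choose the candidate with the smallest prediction error on the
    # second part (a single hold-out split standing in for cross-validation).
    def holdout_error(subset):
        A = np.column_stack([np.ones(len(d1)), X[d1][:, subset]])
        beta, *_ = np.linalg.lstsq(A, y[d1], rcond=None)
        B = np.column_stack([np.ones(len(d2)), X[d2][:, subset]])
        return np.mean((y[d2] - B @ beta) ** 2)

    best = min(candidates, key=holdout_error)

    # Stage 3: refit by least squares on the third part and keep only the
    # variables whose coefficients pass a Bonferroni-corrected two-sided test
    # (normal approximation to the t distribution).
    A = np.column_stack([np.ones(len(d3)), X[d3][:, best]])
    beta, *_ = np.linalg.lstsq(A, y[d3], rcond=None)
    resid = y[d3] - A @ beta
    sigma2 = resid @ resid / (len(d3) - A.shape[1])
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    z = beta[1:] / std_err[1:]
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * len(best)))
    return sorted(int(j) for j, zj in zip(best, z) if abs(zj) > cutoff)


# Synthetic example: only predictors 0 and 5 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.standard_normal(300)
selected = screen_and_clean(X, y)
```

Using separate data for each stage is one simple way to keep the final tests from being biased by the earlier selection steps. With a signal this strong, predictors 0 and 5 should be among the variables that survive the third stage.
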