For variable selection to succeed, sparse model structure has become a basic
assumption of all existing methods. However, this assumption is questionable:
it is unlikely to hold in most cases, and none of the existing methods can
provide consistent estimation and accurate model prediction in nonsparse
scenarios.

Similar to variable selection in the linear regression model, selecting
significant components in the popular additive regression model is of great
interest. However, these components are unknown smooth functions of the
independent variables and cannot be observed directly, so they must be
approximated. In this paper, we suggest a combination of penalized regression
spline approximation and group variable selection, called the lasso-type spline
method (LSM), to handle this component selection problem with a diverging number
of strongly correlated variables in each group.
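
The component-selection idea can be illustrated with a minimal sketch, assuming
a cubic truncated-power spline basis with three fixed knots per covariate and a
plain group lasso on the spline coefficients (one group per covariate). This is
not the LSM estimator itself: the paper's roughness penalty, knot choice and
tuning rule are not reproduced, and the helper names (`spline_basis`,
`orthonormal_groups`, `group_lasso`), the toy data and `lam=0.08` are
hypothetical. Orthonormalizing each group makes the block coordinate-descent
update a closed-form group soft-threshold, so a component is kept or removed as
a whole.

```python
import numpy as np


def spline_basis(x, knots=(0.25, 0.5, 0.75)):
    """Cubic truncated-power spline basis for one covariate (intercept excluded).
    The knots are an illustrative assumption."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def orthonormal_groups(X):
    """One group per covariate: centred spline basis rescaled so Z_j^T Z_j / n = I."""
    n = X.shape[0]
    groups = []
    for j in range(X.shape[1]):
        B = spline_basis(X[:, j])
        B = B - B.mean(axis=0)          # centring removes the intercept from each group
        Q, _ = np.linalg.qr(B)          # Q^T Q = I
        groups.append(np.sqrt(n) * Q)
    return groups


def group_lasso(groups, y, lam, n_iter=200):
    """Block coordinate descent for
    (1/2n) ||y - mean(y) - sum_j Z_j b_j||^2 + lam * sum_j sqrt(d_j) ||b_j||_2."""
    n = y.shape[0]
    r = y - y.mean()                    # residual; every group starts at zero
    betas = [np.zeros(Z.shape[1]) for Z in groups]
    for _ in range(n_iter):
        for j, Z in enumerate(groups):
            r = r + Z @ betas[j]        # partial residual with group j removed
            s = Z.T @ r / n
            thresh = lam * np.sqrt(Z.shape[1])
            norm_s = np.linalg.norm(s)
            # Group soft-thresholding: the whole component is kept or dropped.
            betas[j] = (1.0 - thresh / norm_s) * s if norm_s > thresh else np.zeros_like(s)
            r = r - Z @ betas[j]
    return betas


# Toy additive model (hypothetical): only the first two of six components matter.
rng = np.random.default_rng(0)
n, p = 400, 6
X = rng.uniform(0.0, 1.0, size=(n, p))
y = np.sin(2 * np.pi * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2 + 0.3 * rng.standard_normal(n)

betas = group_lasso(orthonormal_groups(X), y, lam=0.08)
selected = [j for j, b in enumerate(betas) if np.linalg.norm(b) > 0]
print("selected components:", selected)
```

A component counts as selected when its whole coefficient block is non-zero; in
this toy setting the four null components are expected to be dropped.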

Variable selection for high-dimensional models has become an important topic in
modern statistics, especially in the setting where the number of predictors $p$
is much larger than the number of observations $n$. In this paper, we propose a
novel method, rank correlation screening (RCS), for ultra-high dimensional data.
We show that the proposed procedure possesses the sure independence screening
property even under exponential dimensionality, that is, when the number of
predictors grows exponentially with the sample size.
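
A minimal sketch of marginal rank-correlation screening follows. It assumes
Kendall's tau between each predictor and the response as the screening statistic
and keeps the top $d = \lfloor n/\log n \rfloor$ predictors; both choices are
assumptions for illustration, not necessarily the exact RCS statistic or
threshold, and the function names and toy data are hypothetical. Because only
ranks enter the statistic, it is unchanged by monotone transformations of the
response.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a from all pairwise sign agreements (O(n^2) time and memory)."""
    n = x.shape[0]
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    return (dx * dy).sum() / (n * (n - 1))


def rank_correlation_screening(X, y, d):
    """Return the sorted indices of the d predictors with the largest |tau(X_k, y)|."""
    omega = np.array([kendall_tau(X[:, k], y) for k in range(X.shape[1])])
    keep = np.argsort(-np.abs(omega))[:d]
    return np.sort(keep), omega


# Ultra-high dimensional toy data (hypothetical): p >> n, only X_0 and X_1 matter,
# and y is a monotone (exponential) transform of a linear model.
rng = np.random.default_rng(1)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = np.exp(2 * X[:, 0] + 2 * X[:, 1] + rng.standard_normal(n))

d = int(n / np.log(n))          # assumed screening size d = floor(n / log n)
keep, omega = rank_correlation_screening(X, y, d)
print(d, keep[:5])
```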

For consistency (and even the oracle properties) of estimation and model
prediction, almost all existing methods of variable/feature selection depend
critically on the sparsity of the model. However, for ``large $p$, small $n$''
models, the sparsity assumption is hard to check. In particular, when it is
violated, consistent estimation is usually impossible with existing methods,
because the working models selected by methods such as the LASSO and the
Dantzig selector are usually biased. To address this problem, we propose in
this paper adaptive post-Dantzig estimation and model prediction.
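
The basic post-Dantzig idea (select with the Dantzig selector, then refit) can be
sketched as below, assuming an ordinary least-squares refit on the selected
support, as in the two-stage Gauss-Dantzig selector of Candès and Tao. The
adaptive step of the proposed method is not reproduced; the function names, the
toy sparse design and the choice $\lambda = \sigma\sqrt{2n\log p}$ are
assumptions. The Dantzig selector, $\min \|b\|_1$ subject to
$\|X^\top(y - Xb)\|_\infty \le \lambda$, is written as a linear program and
solved with SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, lam):
    """min ||b||_1 subject to ||X^T (y - X b)||_inf <= lam, solved as a linear program.

    Write b = u - v with u, v >= 0; at the optimum sum(u) + sum(v) = ||b||_1.
    """
    p = X.shape[1]
    G = X.T @ X
    c = X.T @ y
    A_ub = np.vstack([np.hstack([G, -G]),      #  X^T X b <= X^T y + lam
                      np.hstack([-G, G])])     # -X^T X b <= lam - X^T y
    b_ub = np.concatenate([c + lam, lam - c])
    res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


def post_dantzig(X, y, lam, tol=1e-6):
    """Dantzig selector for the support, then an ordinary least-squares refit on it."""
    b_ds = dantzig_selector(X, y, lam)
    support = np.flatnonzero(np.abs(b_ds) > tol)
    beta = np.zeros(X.shape[1])
    if support.size > 0:
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta, b_ds, support


# Hypothetical sparse toy problem with three non-zero coefficients.
rng = np.random.default_rng(2)
n, p, sigma = 100, 50, 1.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + sigma * rng.standard_normal(n)

lam = sigma * np.sqrt(2 * n * np.log(p))     # assumed theoretical scale for lam
beta_post, beta_ds, support = post_dantzig(X, y, lam)
print("support:", support)
print("Dantzig:", np.round(beta_ds[:3], 2), "refit:", np.round(beta_post[:3], 2))
```

Comparing `beta_ds[:3]` with `beta_post[:3]` typically shows the $\ell_1$
shrinkage that the refit removes on the selected support.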

Generalized linear models are widely used in regression analysis.
Asymptotically, the inverse of the quasi-Fisher information has been proved to
be the lower bound on the covariance of any estimator obtained from linear
estimating equations based on the quasi-likelihood. Thus, the quasi-Fisher
information has long been the benchmark against which the asymptotic efficiency
of any new estimator is compared.
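
As a worked example of this benchmark, recall the standard form of the
quasi-Fisher information for a GLM with mean $\mu_i = g^{-1}(x_i^\top\beta)$ and
variance $\phi V(\mu_i)$: $I(\beta) = \phi^{-1} X^\top W X$ with
$W_{ii} = (\partial\mu_i/\partial\eta_i)^2 / V(\mu_i)$. The sketch below, with a
hypothetical function name and toy setting, computes it for an intercept-only
Poisson log-linear model, where $I = n\mu$ and the bound is $1/(n\mu)$, and
checks the bound by simulation.

```python
import numpy as np


def quasi_information(X, beta, inv_link, dinv_link, variance, phi=1.0):
    """Quasi-Fisher information X^T W X / phi with W_ii = (dmu_i/deta_i)^2 / V(mu_i)."""
    eta = X @ beta
    mu = inv_link(eta)
    w = dinv_link(eta) ** 2 / variance(mu)
    return (X.T * w) @ X / phi


# Intercept-only Poisson log-linear model: mu = exp(beta0) = 4 for every i.
n = 50
X = np.ones((n, 1))
beta = np.array([np.log(4.0)])
info = quasi_information(X, beta, np.exp, np.exp, lambda mu: mu)
bound = np.linalg.inv(info)          # asymptotic covariance lower bound, 1 / (n * mu)
print(info[0, 0], bound[0, 0])

# Monte Carlo check: here the quasi-likelihood estimator of beta0 is log(ybar),
# whose sampling variance should be close to the bound.
rng = np.random.default_rng(3)
ybar = rng.poisson(4.0, size=(4000, n)).mean(axis=1)
mc_var = np.var(np.log(ybar))
print(mc_var)
```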

In this paper, we propose a covariate-adjusted nonlinear regression model in
which both the response and the predictors can be observed only after being
distorted by multiplicative factors that depend on an observable distorting
covariate. Because of the nonlinearity, existing methods for the linear setting
cannot be applied directly. To address this problem, we propose estimating the
distorting functions by nonparametrically regressing the predictors and the
response on the distorting covariate; nonlinear least squares estimators of
the parameters are then obtained from the estimated response and predictors.
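
The two-step procedure can be sketched as follows, under assumptions made here
for illustration: the distortions are $\tilde Y = \psi(U)Y$ and
$\tilde X = \phi(U)X$ with $E[\psi(U)] = E[\phi(U)] = 1$ and $U$ independent of
$(X, Y)$; the smoother is a Nadaraya-Watson estimator with a Gaussian kernel and
fixed bandwidth $h = 0.05$; and the mean function $a e^{bx}$ and the simulated
data are hypothetical. Under these assumptions $E[\tilde Y \mid U] =
\psi(U)E[Y]$ and $E[\tilde Y] = E[Y]$, so $\psi$ can be estimated by the ratio of
a nonparametric regression to a sample mean, and likewise for $\phi$.

```python
import numpy as np
from scipy.optimize import curve_fit


def nw_smooth(u, v, h):
    """Nadaraya-Watson estimate of E[v | u] at every observed u (Gaussian kernel)."""
    K = np.exp(-0.5 * ((u[:, None] - u[None, :]) / h) ** 2)
    return K @ v / K.sum(axis=1)


def model(x, a, b):
    """Hypothetical nonlinear mean function used for illustration."""
    return a * np.exp(b * x)


rng = np.random.default_rng(4)
n = 1000
U = rng.uniform(0.0, 1.0, n)                             # distorting covariate
X = rng.uniform(1.0, 2.0, n)                             # unobserved predictor
Y = model(X, 2.0, 0.5) + 0.1 * rng.standard_normal(n)    # unobserved response
psi = 1.0 + 0.3 * np.cos(2 * np.pi * U)                  # E[psi(U)] = 1
phi = 1.0 + 0.2 * np.sin(2 * np.pi * U)                  # E[phi(U)] = 1
Y_obs, X_obs = psi * Y, phi * X                          # only these and U are observed

# Step 1: estimate the distorting functions by smoothing on U, then undo them.
h = 0.05
psi_hat = nw_smooth(U, Y_obs, h) / Y_obs.mean()
phi_hat = nw_smooth(U, X_obs, h) / X_obs.mean()
Y_hat, X_hat = Y_obs / psi_hat, X_obs / phi_hat

# Step 2: nonlinear least squares on the adjusted data, started from a log-linear fit.
slope, intercept = np.polyfit(X_hat, np.log(Y_hat), 1)
theta_hat, _ = curve_fit(model, X_hat, Y_hat, p0=(np.exp(intercept), slope))
print("adjusted estimate (a, b):", np.round(theta_hat, 3))
```

Running the same `curve_fit` call on `X_obs` and `Y_obs` directly gives a naive
fit that ignores the distortion, which makes a convenient comparison.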