The case-cohort design, an outcome-dependent sampling design for censored
survival data, is increasingly used in biomedical research. In the current
literature, asymptotic theory for case-cohort designs relies primarily on
counting process stochastic integrals. This approach is limited, however, and
lacks theoretical justification for outcome-dependent weighted methods because
the weights are not predictable.
We consider the finite-sample properties of lasso-regularized high-dimensional
Cox regression. The existing literature focuses on linear models or
generalized linear models with Lipschitz loss functions, where the empirical
risk functions are sums of independent and identically distributed
(iid) losses. The summands in the negative log partial likelihood function for
censored survival data, however, are neither iid nor Lipschitz.
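
To make the non-iid structure concrete, the lasso-penalized negative log partial likelihood can be written as below. The notation is assumed for illustration and does not come from the text: $Y_i$ is the observed (possibly censored) time, $\Delta_i$ the event indicator, $Z_i$ the covariate vector, and $\lambda$ the penalty level; the $1/n$ scaling is one common convention.

```latex
\ell_n(\beta)
  = -\frac{1}{n}\sum_{i=1}^{n} \Delta_i
    \left[ \beta^\top Z_i
      - \log\!\left( \sum_{j=1}^{n} 1\{Y_j \ge Y_i\}\, e^{\beta^\top Z_j} \right)
    \right],
\qquad
\hat\beta = \arg\min_{\beta}\ \bigl\{ \ell_n(\beta) + \lambda \lVert \beta \rVert_1 \bigr\}.
```

The $i$-th summand involves every subject still at risk at time $Y_i$, so the summands share data and are not independent; the log-sum-exp term is also not globally Lipschitz in $\beta$ when the covariates are unbounded.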
We propose a computationally intensive method, the random lasso method, for
variable selection in linear models. The method consists of two major steps. In
step 1, the lasso is applied to many bootstrap samples, each using a set of
randomly selected covariates. This step yields an importance measure for each
covariate. In step 2, a similar procedure is carried out, except that for each
bootstrap sample a subset of covariates is randomly selected with unequal
selection probabilities determined by the covariates' importance.
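
The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the importance measure is taken to be the absolute value of the bootstrap-averaged coefficient, the final estimate is the step-2 average, the penalty `lam` is fixed rather than tuned, and a small constant keeps every selection probability positive. The names `lasso_cd`, `_bootstrap_pass`, `random_lasso`, `q1`, and `q2` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes the columns of X and the response y are already centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                    # constant column: leave at zero
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def _bootstrap_pass(X, y, n_boot, q, lam, probs, rng):
    """Average lasso coefficients over bootstrap samples, each fitted on a
    random subset of q covariates drawn with probabilities `probs`."""
    n, p = X.shape
    coef_sum = np.zeros(p)
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)                     # bootstrap rows
        cols = rng.choice(p, size=q, replace=False, p=probs)  # covariate subset
        Xb = X[rows][:, cols]
        yb = y[rows]
        Xb = Xb - Xb.mean(axis=0)
        yb = yb - yb.mean()
        coef_sum[cols] += lasso_cd(Xb, yb, lam)   # unselected covariates add 0
    return coef_sum / n_boot


def random_lasso(X, y, n_boot=50, q1=None, q2=None, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    q1 = q1 or max(1, p // 2)
    q2 = q2 or max(1, p // 2)
    # Step 1: uniformly random covariate subsets give an importance measure
    # (assumed here: absolute value of the averaged coefficient).
    importance = np.abs(_bootstrap_pass(X, y, n_boot, q1, lam, None, rng))
    # Step 2: subsets drawn with probability proportional to importance.
    # The small constant is an assumption that keeps all probabilities positive.
    probs = importance + 1e-8
    probs = probs / probs.sum()
    coef = _bootstrap_pass(X, y, n_boot, q2, lam, probs, rng)
    return importance, coef
```

Calling `random_lasso(X, y)` on an `(n, p)` design matrix and a length-`n` response returns the step-1 importance vector and the step-2 averaged coefficients. In practice the penalty and the subset sizes would be chosen by cross-validation rather than fixed as they are here.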
We consider a class of doubly weighted rank-based estimating methods for the
transformation (or accelerated failure time) model with missing data, as arise,
for example, in case-cohort studies. The weights considered may not be
predictable, as a martingale stochastic process formulation requires. We
treat the general problem as a semiparametric estimating equation problem and
prove asymptotic properties of the weighted estimators, with either true or
estimated weights, using empirical process theory, which remains applicable
where martingale theory may fail.
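
As an illustration of what "doubly weighted" can mean, one common log-rank-type form of a weighted rank estimating function is sketched below. It is an assumed example, not necessarily the exact class the authors study. The notation is introduced here: $e_i(\theta) = \log Y_i - \theta^\top Z_i$ is the residual on the log-time scale, $\Delta_i$ is the event indicator, and $w_i = \xi_i / \pi_i$ is an inverse-probability weight, where $\xi_i$ indicates that subject $i$ is fully observed and $\pi_i$ is its true or estimated selection probability.

```latex
\Psi_n(\theta)
  = \frac{1}{n} \sum_{i=1}^{n} w_i\, \Delta_i
    \left\{ Z_i
      - \frac{\sum_{j=1}^{n} w_j\, Z_j\, 1\{ e_j(\theta) \ge e_i(\theta) \}}
             {\sum_{j=1}^{n} w_j\, 1\{ e_j(\theta) \ge e_i(\theta) \}}
    \right\}
```

The weights enter twice, once in the outer sum and once in the risk-set averages. In a case-cohort study $w_j$ depends on whether subject $j$ eventually fails, so it is not determined by the history up to a given residual time, which is why predictability can fail.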