Jianqing Fan

  1. Adaptive Robust Variable Selection.

    Authors: Jianqing Fan, Yingying Fan, Emre Barut
    Subjects: Statistics
    Abstract

    Heavy-tailed high-dimensional data are commonly encountered in various
    scientific fields and pose great challenges to modern statistical analysis. A
    natural procedure to address this problem is to use the penalized least absolute
    deviation (LAD) method with a weighted $L_1$-penalty, called weighted robust
    Lasso (WR-Lasso), in which weights are introduced to ameliorate the bias
    problem induced by the $L_1$-penalty.
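
    To make the penalized LAD criterion concrete, here is a minimal Python
    sketch that evaluates a weighted-$L_1$ LAD objective. The function name,
    the argument names (`lam`, `weights`) and the `1/n` scaling of the loss are
    illustrative assumptions, not the paper's notation. The sketch only
    evaluates the objective; it does not minimize it or choose the weights.

```python
def wr_lasso_objective(X, y, beta, lam, weights):
    """Evaluate a weighted-L1 penalized LAD objective (illustrative scaling).

    X: list of rows (n x p), y: list of n responses,
    beta: list of p coefficients, lam: penalty level (>= 0),
    weights: list of p nonnegative penalty weights.
    """
    n = len(y)
    # Least absolute deviation loss: mean of |y_i - x_i' beta|.
    lad = sum(
        abs(yi - sum(xij * bj for xij, bj in zip(row, beta)))
        for row, yi in zip(X, y)
    ) / n
    # Weighted L1 penalty: lam * sum_j w_j * |beta_j|.
    penalty = lam * sum(w * abs(b) for w, b in zip(weights, beta))
    return lad + penalty


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 0.0, 2.0]
beta = [2.0, 0.0]
# All residuals are zero here, so only the penalty on beta[0] remains.
print(wr_lasso_objective(X, y, beta, lam=0.5, weights=[1.0, 1.0]))
```

    Shrinking the weight on a coefficient lowers its penalty, which is the
    mechanism by which the weights can reduce the bias of the plain
    $L_1$-penalty.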

  2. Endogeneity in Ultrahigh Dimension.

    Authors: Jianqing Fan, Yuan Liao
    Subjects: Statistics
    Abstract

    Most papers on high-dimensional statistics are based on the assumption that
    none of the regressors are correlated with the regression error, namely, they
    are exogenous. Yet, endogeneity arises easily in high-dimensional regression
    due to a large pool of regressors and this causes the inconsistency of the
    penalized least-squares methods and possible false scientific discoveries. A
    necessary condition for model selection of a very general class of penalized
    regression methods is given, which allows us to prove formally the
    inconsistency claim.

  3. High Dimensional Covariance Matrix Estimation in Approximate Factor Models.

    Authors: Jianqing Fan, Yuan Liao, Martina Mincheva
    Subjects: Methodology
    Abstract

    The variance-covariance matrix plays a central role in the inferential
    theories of high dimensional factor models in finance and economics. Popular
    regularization methods that directly exploit sparsity are not readily
    applicable to many financial problems. Classical methods of estimating the
    covariance matrices are based on the strict factor models, assuming independent
    idiosyncratic components. This assumption, however, is restrictive in practical
    applications.
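
    As a sketch of the setting, a factor model and the covariance
    decomposition it implies can be written in generic notation as follows.
    The symbols ($y_t$ for the observed vector, $B$ for the loading matrix,
    $f_t$ for the common factors, $u_t$ for the idiosyncratic component) are
    assumed here for illustration and do not come from the abstract; the
    decomposition also assumes that $f_t$ and $u_t$ are uncorrelated.

```latex
y_t = B f_t + u_t, \qquad
\Sigma = B \,\mathrm{cov}(f_t)\, B^\top + \Sigma_u .
```

    A strict factor model takes $\Sigma_u$ to be diagonal; an approximate
    factor model relaxes that restriction and allows nonzero off-diagonal
    entries.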

  4. Multiple testing via $FDR_L$ for large-scale imaging data.

    Authors: Jianqing Fan, Chunming Zhang, Tao Yu
    Subjects: Statistics
    Abstract

    The multiple testing procedure plays an important role in detecting the
    presence of spatial signals for large-scale imaging data. Typically, the
    spatial signals are sparse but clustered.

  5. Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression.

    Authors: Jianqing Fan, Shaojun Guo, Ning Hao
    Subjects: Methodology
    Abstract

    Variance estimation is a fundamental problem in statistical modeling. In
    ultrahigh dimensional linear regressions where the dimensionality is much
    larger than sample size, traditional variance estimation techniques are not
    applicable. Recent advances on variable selection in ultrahigh dimensional
    linear regressions make this problem accessible. One of the major problems in
    ultrahigh dimensional regression is the high spurious correlation between the
    unobserved realized noise and some of the predictors.
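
    The spurious-correlation effect can be seen in a small simulation. The
    following Python sketch draws noise and predictors independently and
    reports the largest absolute sample correlation between the noise and any
    predictor. The sample size, dimensions, seed and function names are
    arbitrary choices made for this illustration.

```python
import math
import random


def sample_corr(u, v):
    """Pearson sample correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def max_spurious_corr(noise, predictors):
    """Largest absolute correlation between the noise and any predictor."""
    return max(abs(sample_corr(noise, x)) for x in predictors)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n, p = 50, 1000
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Predictors are generated independently of the noise.
predictors = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]

small = max_spurious_corr(noise, predictors[:10])  # first 10 predictors only
large = max_spurious_corr(noise, predictors)       # all 1000 predictors
print(small, large)
```

    Because the maximum is taken over a larger set, `large` can never be
    smaller than `small`. A predictor selected for its fit to the data can
    therefore absorb part of the noise, which biases a naive residual-based
    variance estimate downward.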

  6. Control of the False Discovery Rate Under Arbitrary Covariance Dependence.

    Authors: Jianqing Fan, Xu Han, Weijie Gu
    Subjects: Methodology
    Abstract

    Multiple hypothesis testing is a fundamental problem in high dimensional
    inference, with wide applications in many scientific fields. In genome-wide
    association studies, tens of thousands of tests are performed simultaneously to
    find if any genes are associated with some traits and those tests are
    correlated. When test statistics are correlated, false discovery control
    becomes very challenging under arbitrary dependence.
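
    For reference, the classical Benjamini-Hochberg step-up rule is sketched
    below in Python. It is a standard baseline, not the procedure developed in
    this paper, and its usual guarantee relies on independence or positive
    dependence among the test statistics rather than arbitrary dependence. The
    function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))
```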

  7. A ROAD to Classification in High Dimensional Space.

    Authors: Jianqing Fan, Yang Feng, Xin Tong
    Subjects: Machine Learning
    Abstract

    For high-dimensional classification, it is well known that naively performing
    the Fisher discriminant rule leads to poor results due to diverging spectra and
    noise accumulation. Therefore, researchers proposed independence rules to
    circumvent the diverse spectra, and sparse independence rules to mitigate the
    issue of noise accumulation. However, in biological applications, there is
    often a group of correlated genes responsible for clinical outcomes, and the
    use of the covariance information can significantly reduce misclassification
    rates.

  8. Nonparametric tests of the Markov hypothesis in continuous-time models.

    Authors: Yacine Aït-Sahalia, Jianqing Fan, Jiancheng Jiang
    Subjects: Statistics
    Abstract

    We propose several statistics to test the Markov hypothesis for
    $\beta$-mixing stationary processes sampled at discrete time intervals. Our
    tests are based on the Chapman--Kolmogorov equation. We establish the
    asymptotic null distributions of the proposed test statistics, showing that
    Wilks's phenomenon holds. We compute the power of the test and provide
    simulations to investigate the finite sample performance of the test statistics
    when the null model is a diffusion process, with alternatives consisting of
    models with a stochastic mean reversion level, stochastic volatility and jumps.
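
    The Chapman--Kolmogorov equation has a simple finite-state analogue: for
    a time-homogeneous Markov chain, the two-step transition matrix equals
    the product of the one-step matrix with itself. The Python sketch below
    checks this for a hypothetical two-state chain. It only illustrates the
    identity; the paper works with continuous-state processes and estimated
    transition densities.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# One-step transition matrix of a hypothetical two-state chain.
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Chapman-Kolmogorov: the two-step transition matrix equals P times P.
P2 = matmul(P, P)
print(P2)
```

    If a process is not Markov, the directly observed two-step transition
    probabilities need not agree with the ones implied by the one-step
    transitions, and a test can look for that discrepancy.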

  9. Nonparametric estimation of genewise variance for microarray data.

    Authors: Jianqing Fan, Yang Feng, Yue S. Niu
    Subjects: Statistics
    Abstract

    Estimation of genewise variance arises from two important applications in
    microarray data analysis: selecting significantly differentially expressed
    genes and validation tests for normalization of microarray data. We approach
    the problem by introducing a two-way nonparametric model, which is an extension
    of the famous Neyman--Scott model and is applicable beyond microarray data.

  11. Regularization for Cox's Proportional Hazards Model With NP-Dimensionality.

    Authors: Jianqing Fan, Jelena Bradic, Jiancheng Jiang
    Subjects: Statistics
    Abstract

    High throughput genetic sequencing arrays with thousands of measurements per
    sample and a great amount of related censored clinical data have increased
    the need for better measurement-specific model selection. In this paper
    we establish strong oracle properties of non-concave penalized methods for
    non-polynomial (NP) dimensional data with censoring in the framework of Cox's
    proportional hazards model. A class of folded-concave penalties is employed
    and both LASSO and SCAD are discussed specifically.

  12. Estimation in additive models with highly or nonhighly correlated covariates.

    Authors: Jianqing Fan, Yingying Fan, Jiancheng Jiang
    Subjects: Statistics
    Abstract

    Motivated by normalizing DNA microarray data and by predicting the interest
    rates, we explore nonparametric estimation of additive models with highly
    correlated covariates. We introduce two novel approaches for estimating the
    additive components, integration estimation and pooled backfitting estimation.
    The former is designed for highly correlated covariates, and the latter is
    useful for nonhighly correlated covariates. Asymptotic normalities of the
    proposed estimators are established.

  13. Vast Volatility Matrix Estimation using High Frequency Data for Portfolio Selection.

    Authors: Jianqing Fan, Yingying Li, Ke Yu
    Subjects: Applications
    Abstract

    Portfolio allocation with gross-exposure constraint is an effective method to
    increase the efficiency and stability of selected portfolios among a vast pool
    of assets, as demonstrated in Fan et al. (2008). The required high-dimensional
    volatility matrix can be estimated by using high frequency financial data. This
    enables us to better adapt to the local volatilities and local correlations
    among a vast number of assets and to increase significantly the sample size for
    estimating the volatility matrix.

  14. Ultrahigh dimensional variable selection for Cox's proportional hazards model.

    Authors: Jianqing Fan, Yichao Wu, Yang Feng
    Subjects: Machine Learning
    Abstract

    Variable selection in high dimensional space has challenged many contemporary
    statistical problems from many frontiers of scientific disciplines. Recent
    technological advances have made it possible to collect a huge amount of covariate
    information such as microarray, proteomic and SNP data via bioimaging
    technology while observing survival information on patients in clinical
    studies. Thus, the same challenge applies to the survival analysis in order to
    understand the association between genomics information and clinical
    information about the survival time.

  15. Non-Gaussian Quasi Maximum Likelihood Estimation of GARCH Models.

    Authors: Jianqing Fan, Lei Qi, Dacheng Xiu
    Subjects: Methodology
    Abstract

    The non-Gaussian quasi maximum likelihood estimator is frequently used in
    GARCH models with the intention of improving the efficiency of the GARCH parameter estimates.
    However, the method is usually inconsistent unless the quasi-likelihood happens
    to be the true one. We identify an unknown scale parameter that is critical to
    the consistent estimation of non-Gaussian QMLE. As a part of estimating this
    unknown parameter, a two-step non-Gaussian QMLE (2SNG-QMLE) is proposed for
    estimating the GARCH parameters.
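
    As background, the conditional-variance recursion of a GARCH(1,1) model
    can be sketched in Python as follows. The abstract does not fix the model
    order, so the (1,1) specification, the parameter names (`omega`, `alpha`,
    `beta`) and the initial value `sigma2_0` are assumptions made for
    illustration. The sketch covers the variance recursion only, not the
    two-step estimator.

```python
def garch11_variances(returns, omega, alpha, beta, sigma2_0):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1],
    started from the user-supplied value sigma2_0.
    """
    sigma2 = [sigma2_0]
    # Each new variance uses the previous return and the previous variance.
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2


returns = [1.0, 2.0, 0.0]
print(garch11_variances(returns, omega=0.1, alpha=0.2, beta=0.5, sigma2_0=1.0))
```

    A quasi-likelihood is then built from the returns standardized by the
    square roots of these variances, with the chosen innovation density
    evaluated at the standardized values.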

  16. Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection.

    Authors: Jianqing Fan, Jelena Bradic, Weiwei Wang
    Subjects: Methodology
    Abstract

    In high-dimensional model selection problems, penalized simple least-square
    approaches have been extensively used. This paper addresses the question of
    both robustness and efficiency of penalized model selection methods, and
    proposes a data-driven weighted linear combination of convex loss functions,
    together with weighted $L_1$-penalty. It is completely data-adaptive and does
    not require prior knowledge of the error distribution. The weighted
    $L_1$-penalty is used both to ensure the convexity of the penalty term and to
    ameliorate the bias caused by the $L_1$-penalty.

  17. Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.

    Authors: Rui Song, Jianqing Fan, Yang Feng
    Subjects: Methodology
    Abstract

    A variable screening procedure via correlation learning was proposed by Fan and
    Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models.
    Even when the true model is linear, the marginal regression can be highly
    nonlinear. To address this issue, we further extend the correlation learning to
    marginal nonparametric learning. Our nonparametric independence screening is
    called NIS, a specific member of the sure independence screening. Several
    closely related variable screening procedures are proposed.

  18. Sparsistency and rates of convergence in large covariance matrix estimation.

    Authors: Jianqing Fan, Clifford Lam
    Subjects: Statistics
    Abstract

    This paper studies the sparsistency and rates of convergence for estimating
    sparse covariance and precision matrices based on penalized likelihood with
    nonconvex penalty functions. Here, sparsistency refers to the property that all
    parameters that are zero are actually estimated as zero with probability
    tending to one. Depending on the application, sparsity may occur a priori in
    the covariance matrix, its inverse or its Cholesky decomposition. We
    study these three sparsity exploration problems under a unified framework with
    a general penalty function.

  19. Local quasi-likelihood with a parametric guide.

    Authors: Jianqing Fan, Yichao Wu, Yang Feng
    Subjects: Statistics
    Abstract

    Generalized linear models and the quasi-likelihood method extend the ordinary
    regression models to accommodate more general conditional distributions of the
    response. Nonparametric methods need no explicit parametric specification, and
    the resulting model is completely determined by the data themselves. However,
    nonparametric estimation schemes generally have a slower convergence rate, such
    as the local polynomial smoothing estimation of nonparametric generalized
    linear models studied in Fan, Heckman and Wand [J. Amer. Statist. Assoc. 90
    (1995) 141--150].

  20. Sure Independence Screening in Generalized Linear Models with NP-Dimensionality.

    Authors: Rui Song, Jianqing Fan
    Subjects: Methodology
    Abstract

    Ultrahigh dimensional variable selection plays an increasingly important role
    in contemporary scientific discoveries and statistical research. Among others,
    Fan and Lv (2008) propose an independent screening framework by ranking the
    marginal correlations. They showed that the correlation ranking procedure
    possesses a sure independence screening property within the context of the
    linear model with Gaussian covariates and responses.
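
    The correlation-ranking step described above can be sketched in a few
    lines of Python: compute the marginal correlation of each predictor with
    the response, rank by absolute value, and keep the top `d`. The function
    names, the toy data and the choice of `d` are illustrative assumptions.
    The sketch covers the linear-model ranking only, not its extension to
    generalized linear models.

```python
import math


def marginal_corr(x, y):
    """Pearson sample correlation between a predictor column and the response."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_by_correlation(columns, y, d):
    """Indices of the d predictors with the largest |marginal correlation|."""
    scores = [abs(marginal_corr(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],    # strongly related to y
    [1.0, -1.0, 1.0, -1.0, 1.0],  # nearly unrelated to y
    [2.0, 1.0, 4.0, 3.0, 5.0],    # moderately related to y
]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(screen_by_correlation(columns, y, d=2))
```

    Each predictor is scored on its own, so the cost grows linearly with the
    number of predictors, which is what makes the approach usable in
    ultrahigh dimensions.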

  22. A Selective Overview of Variable Selection in High Dimensional Feature Space (Invited Review Article).

    Authors: Jianqing Fan, Jinchi Lv
    Subjects: Statistics
    Abstract

    High dimensional statistical problems arise from diverse fields of scientific
    research and technological development. Variable selection plays a pivotal role
    in contemporary statistical learning and scientific discoveries. The
    traditional idea of best subset selection methods, which can be regarded as a
    specific form of penalized likelihood, is computationally too expensive for
    many modern statistical applications. Other forms of penalized likelihood
    methods have been successfully developed over the last decade to cope with high
    dimensionality.

  24. Non-Concave Penalized Likelihood with NP-Dimensionality.

    Authors: Jianqing Fan, Jinchi Lv
    Subjects: Statistics
    Abstract

    Penalized likelihood methods are fundamental to ultra-high dimensional
    variable selection. How high a dimensionality such methods can handle remains
    largely unknown. In this paper, we show that in the context of generalized
    linear models, such methods possess model selection consistency with oracle
    properties even for dimensionality of Non-Polynomial (NP) order of sample size,
    for a class of penalized likelihood approaches using folded-concave penalty
    functions, which were introduced to ameliorate the bias problems of convex
    penalty functions.
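
    One well-known folded-concave penalty is SCAD. The abstract does not
    single it out, so it is used here only as an example. The Python sketch
    below evaluates the SCAD penalty with the conventional default `a = 3.7`;
    the function and argument names are illustrative.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at t, for lam > 0 and a > 2."""
    t = abs(t)
    if t <= lam:
        # Near zero the penalty coincides with the L1 penalty lam * |t|.
        return lam * t
    if t <= a * lam:
        # Quadratic piece that bends the penalty toward a constant.
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    # Large coefficients receive a constant penalty.
    return lam ** 2 * (a + 1) / 2


for t in (0.5, 2.0, 10.0):
    # Compare SCAD with the L1 penalty lam * |t| at lam = 1.
    print(t, scad_penalty(t, lam=1.0), 1.0 * abs(t))
```

    The penalty is flat beyond `a * lam`, so large coefficients are not
    shrunk further. The $L_1$ penalty keeps growing with the coefficient,
    which is the source of its bias.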

  25. A semiparametric model for cluster data.

    Authors: Wenyang Zhang, Jianqing Fan, Yan Sun
    Subjects: Statistics
    Abstract

    In the analysis of cluster data, the regression coefficients are frequently
    assumed to be the same across all clusters. This hampers the ability to study
    the varying impacts of factors on each cluster. In this paper, a semiparametric
    model is introduced to account for varying impacts of factors over clusters by
    using cluster-level covariates. It achieves the parsimony of parametrization
    and allows the exploration of nonlinear interactions. The random effect in the
    semiparametric model also accounts for within-cluster correlation.
