Regularization techniques are widely used for tackling
high-dimension-low-sample-size problems. Yet, finding the right amount of
regularization can be challenging, especially in unsupervised settings such as
structure learning, where traditional methods such as BIC or
cross-validation often do not work well. In this paper, we propose a new method
--- Bootstrap Inference for Network COnstruction (BINCO) --- to infer networks
by directly controlling the false discovery rates (FDRs) of the selected edges.
This method utilizes the idea of model aggregation.
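
The aggregation idea can be illustrated with a minimal sketch: refit an edge selector on bootstrap resamples and record how often each edge is selected. The selector below (thresholding absolute sample correlations), the function name `selection_frequencies`, and all parameter values are assumptions made for illustration; they are not the BINCO procedure, which additionally uses the selection frequencies to control the FDR.

```python
import numpy as np

def selection_frequencies(X, threshold=0.3, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each edge (i, j) is selected.

    The selector (thresholded absolute correlation) is a hypothetical
    stand-in for a regularized network estimator.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        corr = np.corrcoef(X[idx], rowvar=False)  # p x p correlation matrix
        counts += np.abs(corr) > threshold        # edges selected on this resample
    np.fill_diagonal(counts, 0)                   # ignore self-edges
    return counts / n_boot

# Toy data: variables 0 and 1 share a common factor, variable 2 is independent.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
X = np.column_stack([z + 0.3 * rng.normal(size=200),
                     z + 0.3 * rng.normal(size=200),
                     rng.normal(size=200)])
freq = selection_frequencies(X)
```

In this toy example the edge between variables 0 and 1 should be selected in nearly every resample, while edges involving variable 2 should be selected rarely.
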
We consider the problem of estimating multiple related but distinct graphical
models on the basis of a high-dimensional data set with observations that
belong to distinct classes. A motivating example occurs in the analysis of gene
expression data for tissue samples with and without cancer. In this case, we
might wish to estimate a gene expression network for the normal tissue and a
gene expression network for the tumor tissue.
This is a note on logistic regression models and logistic kernel machine
models. It contains derivations of some of the expressions in a paper -- SNP
Set Analysis for Detecting Disease Association Using Exon Sequence Data --
submitted to BMC Proceedings by these authors.
Recent advances in tissue microarray technology have allowed
immunohistochemistry to become a powerful medium-to-high throughput analysis
tool, particularly for the validation of diagnostic and prognostic biomarkers.
However, as study size grows, the manual evaluation of these assays becomes a
prohibitive limitation; it vastly reduces throughput and greatly increases
variability and expense. We propose an algorithm - Tissue Array Co-Occurrence
Matrix Analysis (TACOMA) - for quantifying cellular phenotypes based on
textural regularity summarized by local inter-pixel relationships.
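
For readers unfamiliar with co-occurrence matrices, the following is a minimal sketch of a generic gray-level co-occurrence matrix, which counts how often pairs of gray levels appear at a fixed pixel offset. The function name `cooccurrence_matrix`, the offset convention, and the toy image are assumptions for illustration; the features and pipeline actually used by TACOMA are not specified here.

```python
import numpy as np

def cooccurrence_matrix(image, levels, offset=(0, 1)):
    """Entry [a, b] counts pixels with gray level a whose neighbour at
    the given (row, column) offset has gray level b."""
    image = np.asarray(image)
    dr, dc = offset
    rows, cols = image.shape
    glcm = np.zeros((levels, levels), dtype=int)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:   # skip pairs that leave the image
                glcm[image[r, c], image[r2, c2]] += 1
    return glcm

# Toy 3x3 image with gray levels 0, 1, 2; pairs are taken with the right-hand neighbour.
image = [[0, 0, 1],
         [1, 2, 2],
         [0, 1, 2]]
glcm = cooccurrence_matrix(image, levels=3)
```

Different offsets (for example `(1, 0)` for the pixel below) give different matrices, so a texture summary would typically combine several of them.
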
In a recent paper, Efron (2004) pointed out that an important issue
in large-scale multiple hypothesis testing is that the null distribution may be
unknown and need to be estimated. Consider a Gaussian mixture model, where the
null distribution is known to be normal but both null parameters--the mean and
the variance--are unknown. We address the problem with a method based on
Fourier transformation.
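
To indicate why a Fourier view is natural here, recall that for X ~ N(u, s^2) the characteristic function is E[exp(itX)] = exp(iut - s^2 t^2 / 2), so its modulus carries the variance and its phase carries the mean. The sketch below applies this identity to the empirical characteristic function of a purely normal sample. It is an illustration under simplifying assumptions (no non-null component, a single hand-picked frequency `t`), not the estimator proposed in the paper; the function name `null_params_from_ecf` is hypothetical.

```python
import numpy as np

def null_params_from_ecf(x, t=1.0):
    """Estimate (mean, variance) of a normal sample from its empirical
    characteristic function evaluated at a single frequency t."""
    phi = np.mean(np.exp(1j * t * np.asarray(x)))   # empirical E[exp(i t X)]
    variance = -2.0 * np.log(np.abs(phi)) / t**2    # from |phi(t)| = exp(-s^2 t^2 / 2)
    mean = np.angle(phi) / t                        # valid while |t * mean| < pi
    return mean, variance

# Simulated pure-null sample with mean 0.5 and standard deviation 1.2 (assumed values).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.2, size=20000)
mean_hat, var_hat = null_params_from_ecf(x)
```

With a sample of this size, both estimates should fall close to the simulated values of 0.5 and 1.44.
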
Genomic instability, the propensity for aberrations in chromosomes, plays a
critical role in the development of many diseases. High throughput genotyping
experiments have been performed to study genomic instability in diseases. The
output of such experiments can be summarized as high dimensional binary
vectors, where each binary variable records aberration status at one marker
locus. It is of keen interest to understand how these aberrations interact with
each other. In this paper, we propose a novel method, LogitNet, to
infer the interactions among aberration events.
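
One simple way to look for interactions among binary aberration indicators is to regress each indicator on the others with a sparse logistic model and read nonzero coefficients as candidate interactions. The sketch below does this for a single response variable. It assumes, as the name suggests, a logistic-regression formulation; the L1 penalty, the proximal-gradient solver, the function name `l1_logistic`, and all tuning values are illustrative choices and should not be read as the LogitNet procedure itself.

```python
import numpy as np

def l1_logistic(X, y, lam=0.05, step=0.5, n_iter=500):
    """Proximal-gradient fit of an L1-penalized logistic regression.

    The intercept b0 is not penalized; coefficients are soft-thresholded
    after each gradient step.
    """
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))   # fitted probabilities
        resid = prob - y
        b0 -= step * np.mean(resid)                     # plain gradient step for intercept
        beta = beta - step * (X.T @ resid) / n          # gradient step for coefficients
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)  # soft-threshold
    return b0, beta

# Toy binary data: x1 copies x0 except for 10% random flips; x2 is independent noise.
rng = np.random.default_rng(0)
n = 500
x0 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)
flip = rng.random(n) < 0.1
x1 = np.where(flip, 1 - x0, x0)
b0, beta = l1_logistic(np.column_stack([x0, x2]).astype(float), x1.astype(float))
```

Here `beta[0]` (the effect of `x0` on `x1`) should be clearly positive, while `beta[1]` (the effect of the unrelated `x2`) should be at or near zero.
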