We design a data-dependent metric in $\mathbb R^d$ and use it to define the
$k$-nearest neighbors of a given point. Our metric is invariant under all
affine transformations. We show that, with this metric, the standard
$k$-nearest neighbor regression estimate is asymptotically consistent under the
usual conditions on $k$, and minimal requirements on the input data.
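To illustrate what affine invariance buys a $k$-nearest neighbor regression estimate, here is a minimal sketch. It swaps in a simpler affine-invariant metric, the Mahalanobis distance built from the sample covariance. This is an assumption made for illustration and is NOT the data-dependent metric proposed above. The function name `mahalanobis_knn_regress` is hypothetical. For a nonsingular map $x \mapsto Ax + b$, the sample covariance becomes $A S A^\top$, so squared distances $(x - X_i)^\top S^{-1} (x - X_i)$ are unchanged. The neighbor set, and therefore the estimate, stays the same.

```python
import numpy as np

def mahalanobis_knn_regress(X, y, x0, k):
    """Hypothetical helper: k-NN regression estimate at x0 using the
    Mahalanobis distance of the sample covariance of X (a stand-in
    affine-invariant metric, not the paper's data-dependent metric)."""
    X = np.asarray(X, dtype=float)
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - np.asarray(x0, dtype=float)
    # squared distances (X_i - x0)^T S^{-1} (X_i - x0), one per sample
    d2 = np.einsum("ij,jk,ik->i", diff, prec, diff)
    nn = np.argsort(d2, kind="stable")[:k]
    # standard k-NN estimate: average response over the k nearest points
    return float(np.mean(np.asarray(y, dtype=float)[nn]))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
x0 = np.array([0.3, -0.2])

# a nonsingular affine transformation applied to both data and query point
A = np.array([[3.0, 1.0], [0.5, 2.0]])
b = np.array([10.0, -4.0])
est = mahalanobis_knn_regress(X, y, x0, k=15)
est_t = mahalanobis_knn_regress(X @ A.T + b, y, A @ x0 + b, k=15)
```

The two estimates agree up to floating-point error. The Euclidean $k$-NN estimate would generally change under the same map.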
The single-index model is known to offer a flexible way to model a variety of
high-dimensional real-world phenomena. However, despite its relative simplicity,
this dimension reduction scheme faces severe complications as soon as
the underlying dimension becomes larger than the number of observations ("p
larger than n" paradigm). To circumvent this difficulty, we consider the
single-index model estimation problem from a sparsity perspective using a
PAC-Bayesian approach.
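The single-index model writes $Y = g(\theta^\top X) + \varepsilon$ for an unknown link $g$ and direction $\theta$. Sparsity means that only a few coordinates of $\theta$ are nonzero. The following toy sketch is NOT the estimator of the paper. It shows the kind of exponentially weighted (Gibbs-type) aggregate that PAC-Bayesian bounds are typically stated for. It makes several assumptions: a finite, hypothetical candidate set of unit directions with support size 1 or 2; a sparsity prior $\pi(\theta) \propto p^{-|\mathrm{supp}\,\theta|}$; and a link estimated by piecewise-constant binning. Each candidate receives the weight $\pi(\theta)\exp(-\lambda n R_n(\theta))$, where $R_n$ is the empirical squared risk.

```python
import numpy as np
from itertools import combinations

def binned_link_fit(z, y, n_bins=20):
    """Piecewise-constant estimate of the link g on quantile bins of z."""
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bins - 1)
    fitted = np.empty_like(y)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            fitted[mask] = y[mask].mean()
    return fitted

def sparse_directions(p):
    """Hypothetical candidate set: unit vectors with support size 1 or 2."""
    for j in range(p):
        theta = np.zeros(p)
        theta[j] = 1.0
        yield theta, 1
    for i, j in combinations(range(p), 2):
        for sign in (1.0, -1.0):
            theta = np.zeros(p)
            theta[i], theta[j] = 1.0, sign
            yield theta / np.sqrt(2.0), 2

def exp_weighted_single_index(X, y, lam=1.0):
    """Gibbs-type weights over candidates: prior(theta) * exp(-lam * n * risk)."""
    n, p = X.shape
    thetas, log_w = [], []
    for theta, s in sparse_directions(p):
        risk = np.mean((y - binned_link_fit(X @ theta, y)) ** 2)
        # log prior -s*log(p) favors sparse directions
        log_w.append(-lam * n * risk - s * np.log(p))
        thetas.append(theta)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # stabilized before normalizing
    return np.array(thetas), w / w.sum()

rng = np.random.default_rng(1)
n, p = 400, 10
X = rng.normal(size=(n, p))
theta_star = np.zeros(p)
theta_star[2], theta_star[5] = 1.0, -1.0
theta_star /= np.sqrt(2.0)
y = np.sin(2 * X @ theta_star) + X @ theta_star + 0.1 * rng.normal(size=n)
thetas, w = exp_weighted_single_index(X, y, lam=1.0)
best = thetas[np.argmax(w)]
```

In this easy simulated setting, the candidate with the largest weight is the true sparse direction. Enumerating candidates explicitly is only feasible for tiny supports. Practical PAC-Bayesian procedures typically sample from the Gibbs distribution instead, for instance with MCMC.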
Collaborative recommendation is an information-filtering technique that
attempts to present information items that are likely to be of interest to an
Internet user. Traditionally, collaborative systems deal with two types of
variables: users and items. In its most common form, the problem is framed as
estimating ratings for items that a user has not yet consumed. Despite a
wide-ranging literature, little is known about the statistical properties of
recommendation systems.
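The rating-estimation framing can be made concrete with a small sketch. It uses a classical user-based neighborhood predictor, chosen here as an illustrative baseline and not as a method from the text. The inputs are assumptions: a users-by-items matrix `R` in which `np.nan` marks items not yet consumed, and cosine similarity computed on co-rated items. The missing rating is predicted as a similarity-weighted average over the `k` most similar users who rated the item. The function name `predict_rating` is hypothetical.

```python
import numpy as np

def predict_rating(R, user, item, k=2):
    """Hypothetical user-based k-NN estimate of the missing entry R[user, item]."""
    target = R[user]
    scored = []  # (similarity, neighbor's rating of the item)
    for v in range(R.shape[0]):
        if v == user or np.isnan(R[v, item]):
            continue
        common = ~np.isnan(target) & ~np.isnan(R[v])  # items rated by both
        if not common.any():
            continue
        a, b = target[common], R[v, common]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sim = float(a @ b / denom) if denom > 0 else 0.0
        scored.append((sim, R[v, item]))
    if not scored:
        return float(np.nanmean(target))  # fall back to the user's mean rating
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:k]
    wsum = sum(abs(s) for s, _ in top)
    if wsum == 0:
        return float(np.mean([r for _, r in top]))
    return float(sum(s * r for s, r in top) / wsum)

nan = np.nan
R = np.array([
    [5, 4, nan, 1],
    [5, 5, 4, 1],
    [1, 1, 2, 5],
    [4, 4, 5, nan],
])
pred = predict_rating(R, user=0, item=2, k=2)
```

User 0 resembles users 1 and 3, who rated item 2 with 4 and 5. The prediction is therefore close to 4.5.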
Random forests are a scheme proposed by Leo Breiman in the early 2000s for
building a predictor ensemble from a set of decision trees grown in randomly
selected subspaces of the data. Despite growing interest and practical use, there
has been little exploration of the statistical properties of random forests,
and little is known about the mathematical forces driving the algorithm. In
this paper, we offer an in-depth analysis of a random forests model suggested
by Breiman in 2004, which is very close to the original algorithm.
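The following is a simplified, purely random sketch in the spirit of that model. Several assumptions apply, and the details may differ from the analyzed model. Inputs lie in $[0,1)^d$. At each node, a coordinate is drawn uniformly at random and the cell is split at its midpoint. Every tree has a fixed `depth` (hypothetical parameter), and the estimate at $x$ is the average of the responses in the cell that contains $x$. For a single query point, only the path of cells containing $x$ matters, so the sketch simulates that path directly rather than building whole trees. Trees whose cell is empty are skipped, a convention chosen here only for simplicity.

```python
import numpy as np

def centered_tree_cell(x, depth, rng):
    """Cell [lo, hi) containing x after `depth` midpoint splits,
    each on a coordinate drawn uniformly at random."""
    d = x.shape[0]
    lo, hi = np.zeros(d), np.ones(d)
    for _ in range(depth):
        j = rng.integers(d)
        mid = 0.5 * (lo[j] + hi[j])
        if x[j] < mid:
            hi[j] = mid
        else:
            lo[j] = mid
    return lo, hi

def centered_forest_predict(X, y, x, n_trees=200, depth=4, seed=0):
    """Average over randomized trees of the mean response in x's cell."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_trees):
        lo, hi = centered_tree_cell(x, depth, rng)
        in_cell = np.all((X >= lo) & (X < hi), axis=1)
        if in_cell.any():  # skip empty cells (simplifying convention)
            preds.append(y[in_cell].mean())
    return float(np.mean(preds)) if preds else float(y.mean())

rng = np.random.default_rng(4)
X = rng.uniform(size=(2000, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=2000)  # only coordinate 0 matters
pred = centered_forest_predict(X, y, np.array([0.7, 0.5, 0.5]))
```

The true regression value at this point is 1.4. With uniform coordinate selection, most splits are spent on the two irrelevant coordinates, so the estimate is biased toward the global mean. This illustrates why the way split coordinates are selected matters in such an analysis.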
Many statistical estimation techniques for high-dimensional or functional
data rely on a preliminary dimension reduction step, which consists of
projecting the sample $\mathbf X_1, \ldots, \mathbf X_n$ onto the first $D$
eigenvectors of the Principal Component Analysis (PCA), that is, applying the
associated empirical projector $\hat \Pi_D$. Classical nonparametric inference
methods such as kernel density
estimation or kernel regression analysis are then performed in the (usually
small) $D$-dimensional space.
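A minimal sketch of this two-stage procedure is given below. It rests on three assumptions. First, the data are centered before PCA, a standard choice that the text does not specify. Second, the first $D$ principal directions are obtained from an SVD of the centered sample. Third, the nonparametric step is a Gaussian-kernel Nadaraya-Watson regression on the projected coordinates, with a bandwidth `h` chosen ad hoc. The empirical projector is $\hat\Pi_D = V V^\top$, where the columns of $V$ are the first $D$ eigenvectors. The helper names are hypothetical.

```python
import numpy as np

def pca_projector(X, D):
    """Mean and d x D matrix V of the first D principal directions."""
    mu = X.mean(axis=0)
    # rows of Vt are eigenvectors of the empirical covariance, by decreasing variance
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:D].T

def nw_regress(Z, y, z0, h):
    """Nadaraya-Watson estimate at z0 with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * np.sum((Z - z0) ** 2, axis=1) / h**2)
    return float(w @ y / w.sum())

rng = np.random.default_rng(2)
n, d, D = 300, 50, 2
U = rng.normal(size=(n, 2))                      # latent low-dimensional signal
B = rng.normal(size=(2, d)) / np.sqrt(d)
X = U @ B + 0.01 * rng.normal(size=(n, d))       # high-dimensional observations
y = np.sin(U[:, 0]) + 0.05 * rng.normal(size=n)

mu, V = pca_projector(X, D)
Pi_D = V @ V.T                                   # empirical projector
Z = (X - mu) @ V                                 # coordinates in the D-dim space
pred = nw_regress(Z, y, Z[0], h=0.3 * Z.std())
```

Kernel regression then operates in 2 dimensions instead of 50. The quality of the result depends on how well $\hat\Pi_D$ captures the directions that drive the response.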
Collaborative recommendation is an information-filtering technique that
attempts to present information items (movies, music, books, news, images, Web
pages, etc.) that are likely to be of interest to the Internet user.
Traditionally, collaborative systems deal with two types of variables: users
and items. In its most common form, the problem is framed as estimating
ratings for items that a user has not yet consumed. Despite a wide-ranging
literature, little is known about the statistical properties of
recommendation systems.
Motivated by a broad range of potential applications, we address the quantile
prediction problem for real-valued time series. We present a sequential quantile
forecasting model based on the combination of a set of elementary nearest
neighbor-type predictors called "experts", and show its consistency under
minimal conditions. Our approach builds on the methodology developed in recent
years for the prediction of individual sequences, and exploits the
characterization of the quantile as a minimizer of the so-called pinball loss
function.
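The pinball (check) loss is $\rho_\tau(u) = u\,(\tau - \mathbf 1\{u < 0\})$, and a $\tau$-quantile minimizes $q \mapsto \mathbb E\,\rho_\tau(Y - q)$. The sketch below first checks this numerically on a sample. It then combines quantile forecasts sequentially with exponential weights on cumulative pinball losses. The experts are an assumption made for illustration: simple window experts, each predicting the empirical $\tau$-quantile of its last `w` observations. They are NOT the nearest neighbor-type experts described above. The window sizes and learning rate `eta` are also hypothetical choices.

```python
import numpy as np

def pinball(u, tau):
    """Pinball (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def aggregate_quantile_forecasts(series, tau, windows=(5, 20, 50), eta=0.5):
    """Exponentially weighted combination of hypothetical window-quantile experts."""
    cum_loss = np.zeros(len(windows))
    forecasts = []
    for t in range(max(windows), len(series)):
        past = series[:t]
        expert_preds = np.array([np.quantile(past[-w:], tau) for w in windows])
        weights = np.exp(-eta * (cum_loss - cum_loss.min()))
        weights /= weights.sum()
        forecasts.append(float(weights @ expert_preds))
        # the outcome series[t] is revealed: update each expert's cumulative loss
        cum_loss += pinball(series[t] - expert_preds, tau)
    return np.array(forecasts)

rng = np.random.default_rng(3)
tau = 0.9

# the empirical pinball risk is minimized near the empirical tau-quantile
y = rng.normal(size=1000)
grid = np.linspace(-3.0, 3.0, 601)
avg_loss = np.array([pinball(y - q, tau).mean() for q in grid])
q_hat = grid[np.argmin(avg_loss)]

# sequential aggregation on an i.i.d. series; coverage should be near tau
series = rng.normal(size=2000)
fc = aggregate_quantile_forecasts(series, tau)
coverage = np.mean(series[50:] <= fc)
```

Because the pinball loss is convex in the forecast, a convex combination of expert forecasts incurs a loss no larger than the weighted average of the experts' losses. This property is commonly exploited in the analysis of such aggregation schemes.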