We suggest using the max-norm as a convex surrogate constraint for
clustering. We show how this yields a better exact cluster recovery guarantee
than previously suggested nuclear-norm relaxation, and study the effectiveness
of our method, and other related convex relaxations, compared to other
clustering approaches.
We present a simple, yet effective, approach to Semi-Supervised Learning. Our
approach is based on estimating density-based distances (DBD) using a shortest
path calculation on a graph. These Graph-DBD estimates can then be used in any
distance-based supervised learning method, such as Nearest Neighbor methods and
SVMs with RBF kernels. In order to apply the method to very large data sets, we
also present a novel algorithm which integrates nearest neighbor computations
into the shortest path search and can find exact shortest paths even in
extremely large dense graphs.
Mini-batch algorithms have been proposed as a way to speed-up stochastic
convex optimization problems. We study how such algorithms can be improved
using accelerated gradient methods. We provide a novel analysis, which shows
how standard gradient methods may sometimes be insufficient to obtain a
significant speed-up and propose a novel accelerated gradient algorithm, which
deals with this deficiency, enjoys a uniformly superior guarantee and works
well in practice.
We consider the problem of approximately reconstructing a partially-observed,
approximately low-rank matrix. This problem has received much attention lately,
mostly using the trace-norm as a surrogate to the rank. Here we study low-rank
matrix reconstruction using both the trace-norm, as well as the less-studied
max-norm, and present reconstruction guarantees based on existing analysis on
the Rademacher complexity of the unit balls of these norms. We show how these
are superior in several ways to recently published guarantees based on
specialized analysis.
We obtain a tight distribution-specific characterization of the sample
complexity of large-margin classification with L_2 regularization: We introduce
the \gamma-adapted-dimension, which is a simple function of the spectrum of a
distribution's covariance matrix, and show distribution-specific upper and
lower bounds on the sample complexity, both governed by the
\gamma-adapted-dimension of the source distribution. We conclude that this new
quantity tightly characterizes the true sample complexity of large-margin
classification.
We show that matrix completion with trace-norm regularization can be
significantly hurt when entries of the matrix are sampled non-uniformly. We
introduce a weighted version of the trace-norm regularizer that works well also
with non-uniform sampling. Our experimental results demonstrate that the
weighted trace-norm regularization indeed yields significant gains on the
(highly non-uniformly sampled) Netflix dataset.