Nicolò Cesa-Bianchi

  1. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems.

    Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    Multi-armed bandit problems are the most basic examples of sequential
    decision problems with an exploration-exploitation trade-off. This is the
    balance between staying with the option that gave highest payoffs in the past
    and exploring new options that might give higher payoffs in the future.
    Although the study of bandit problems dates back to the Thirties,
    exploration-exploitation trade-offs arise in several modern applications, such
    as ad placement, website optimization, and packet routing. Mathematically, a
    multi-armed bandit is defined by the payoff process associated with each
    option.

  2. An Optimal Algorithm for Linear Bandits.

    Authors: Sham Kakade, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    We provide the first algorithm for online bandit linear optimization whose
    regret after T rounds is of order sqrt{Td ln N} on any finite class X of N
    actions in d dimensions, and of order d*sqrt{T} (up to log factors) when X is
    infinite. These bounds are not improvable in general. The basic idea utilizes
    tools from convex geometry to construct what is essentially an optimal
    exploration basis. We also present an application to a model of linear bandits
    with expert advice.

  3. Efficient Online Learning via Randomized Rounding.

    Authors: Ohad Shamir, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    Most online algorithms used in machine learning today are based on variants
    of mirror descent or follow-the-leader. In this paper, we present an online
    algorithm based on a completely different approach, which combines "random
    playout" and randomized rounding of loss subgradients. As an application of our
    approach, we provide the first computationally efficient online algorithm for
    collaborative filtering with norm-constrained matrices. As a second
    application, we solve an open question linking batch learning and transductive
    online learning.

  4. PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off.

    Authors: John Shawe-Taylor, Nicolò Cesa-Bianchi, Yevgeny Seldin, François Laviolette, Jan Peters, Peter Auer
    Subjects: Learning
    Abstract

    We develop a coherent framework for integrative simultaneous analysis of the
    exploration-exploitation and model order selection trade-offs. We improve over
    our preceding results on the same subject (Seldin et al., 2011) by combining
    PAC-Bayesian analysis with Bernstein-type inequality for martingales. Such a
    combination is also of independent interest for studies of multiple
    simultaneously evolving martingales.

  5. Online Learning of Noisy Data with Kernels.

    Authors: Shai Shalev-Shwartz, Ohad Shamir, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    We study online learning when individual instances are corrupted by random
    noise. We assume the noise distribution is unknown, and may change over time
    with no restriction other than having zero mean and bounded variance. Our
    technique relies on a family of unbiased estimators for non-linear functions,
    which may be of independent interest. We show that a variant of online gradient
    descent can learn functions in any dot-product (e.g., polynomial) or Gaussian
    kernel space with any analytic convex loss function.

  6. Efficient Learning with Partially Observed Attributes.

    Authors: Shai Shalev-Shwartz, Ohad Shamir, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    We describe and analyze efficient algorithms for learning a linear predictor
    from examples when the learner can only view a few attributes of each training
    example. This is the case, for instance, in medical research, where each
    patient participating in the experiment is only willing to go through a small
    number of tests. Our analysis bounds the number of additional examples
    sufficient to compensate for the lack of full information on each training
    example.

Syndicate content