Over half a century old and showing no signs of aging, k-means remains one of
the most popular data processing algorithms. As is well-known, a proper
initialization of k-means is crucial for obtaining a good final solution. The
recently proposed k-means++ initialization algorithm achieves this, obtaining
an initial set of centers that is provably close to the optimum solution. A
major downside of the k-means++ is its inherent sequential nature, which limits
its applicability to massive data: one must make k passes over the data to find
a good initial set of centers.
Motivated by the problem of optimizing allocation in guaranteed display
advertising, we develop an efficient, lightweight method of generating a
compact {\em allocation plan} that can be used to guide ad server decisions.
The plan itself uses just O(1) state per guaranteed contract, is robust to
noise, and allows us to serve (provably) nearly optimally. The optimization
method we develop is scalable, with a small in-memory footprint, and working in
linear time per iteration.
A large fraction of online display advertising is sold via guaranteed
contracts: a publisher guarantees to the advertiser a certain number of user
visits satisfying the targeting predicates of the contract. The publisher is
then tasked with solving the ad serving problem - given a user visit, which of
the thousands of matching contracts should be displayed, so that by the
expiration time every contract has obtained the requisite number of user
visits.
Display advertising has traditionally been sold via guaranteed contracts -- a
guaranteed contract is a deal between a publisher and an advertiser to allocate
a certain number of impressions over a certain period, for a pre-specified
price per impression. However, as spot markets for display ads, such as the
RightMedia Exchange, have grown in prominence, the selection of advertisements
to show on a given page is increasingly being chosen based on price, using an
auction. As the number of participants in the exchange grows, the price of an
impressions becomes a signal of its value.
For most people, social contacts play an integral part in finding a new job.
As observed by Granovetter's seminal study, the proportion of jobs obtained
through social contacts is usually large compared to those obtained through
postings or agencies. At the same time, job markets are a natural example of
two-sided matching markets. An important solution concept in such markets is
that of stable matchings, and the use of the celebrated Gale-Shapley algorithm
to compute them.