While high-level data parallel frameworks, like MapReduce, simplify the
design and implementation of large-scale data processing systems, they do not
naturally or efficiently support many important data mining and machine
learning algorithms and can lead to inefficient learning systems. To help fill
this critical void, we introduced the GraphLab abstraction which naturally
expresses asynchronous, dynamic, graph-parallel computation while ensuring data
consistency and achieving a high degree of parallel performance in the
shared-memory setting.
Designing and implementing efficient, provably correct parallel machine
learning (ML) algorithms is challenging. Existing high-level parallel
abstractions like MapReduce are insufficiently expressive while low-level tools
like MPI and Pthreads leave ML experts repeatedly solving the same design
challenges. By targeting common patterns in ML, we developed GraphLab, which
improves upon abstractions like MapReduce by compactly expressing asynchronous
iterative algorithms with sparse computational dependencies while ensuring data
consistency and achieving a high degree of parallel performance.
One of the main challenges in building a large scale publish-subscribe
infrastructure in an enterprise network, is to provide the subscribers with the
required information, while minimizing the consumed host and network resources.
Typically, previous approaches utilize either IP multicast or point-to-point
unicast for efficient dissemination of the information.
We propose a new distributed algorithm for computing a truncated Newton
method, where the main diagonal of the Hessian is computed using belief
propagation. As a case study for this approach, we examine the sensor selection
problem, a Boolean convex optimization problem. We form two distributed
algorithms. The first algorithm is a distributed version of the interior point
method by Joshi and Boyd, and the second algorithm is an order of magnitude
faster approximation. As an example application we discuss distributed anomaly
detection in networks.
We consider the problem of estimating a pattern of faults, represented as a
binary vector, from a set of measurements. Maximum a posteriori probability
(MAP) estimation of the fault pattern leads to a difficult combinatorial
optimization problem. Following recent success of the belief propagation
framework in the related compressive sensing and low density lattice decoding
domains, we propose a novel relaxation of the problem using non-parametric
belief propagation (NBP) for creating a distributed solution.