The problem of active diagnosis arises in several applications such as
disease diagnosis, and fault diagnosis in computer networks, where the goal is
to rapidly identify the binary states of a set of objects (e.g., faulty or
working) by sequentially selecting, and observing, (noisy) responses to binary
valued queries. Current algorithms in this area rely on loopy belief
propagation for active query selection. These algorithms have an exponential
time complexity, making them slow and even intractable in large networks.
We present surrogate regret bounds for arbitrary surrogate losses in the
context of binary classification with label-dependent costs. Such bounds relate
a classifier's risk, assessed with respect to a surrogate loss, to its
cost-sensitive classification risk. Two approaches to surrogate regret bounds
are developed. The first is a direct generalization of Bartlett et al.
Flow cytometry is a technology that rapidly measures antigen-based markers
associated to cells in a cell population. Although analysis of flow cytometry
data has traditionally considered one or two markers at a time, there has been
increasing interest in multidimensional analysis. However, flow cytometers are
limited in the number of markers they can jointly observe, which is typically a
fraction of the number of markers of interest. For this reason, practitioners
often perform multiple assays based on different, overlapping combinations of
markers.
In query learning, the goal is to identify an unknown object while minimizing
the number of "yes" or "no" questions (queries) posed about that object. A
well-studied algorithm for query learning is known as generalized binary search
(GBS). We show that GBS is a greedy algorithm to optimize the expected number
of queries needed to identify the unknown object. We also generalize GBS in two
ways. First, we consider the case where the cost of querying grows
exponentially in the number of queries and the goal is to minimize the expected
exponential cost.
In query learning, the goal is to identify an unknown object while minimizing
the number of "yes or no" questions (queries) posed about that object. We
consider three extensions of this fundamental problem that are motivated by
practical considerations in real-world, time-critical identification tasks such
as emergency response. First, we consider the problem where the objects are
partitioned into groups, and the goal is to identify only the group to which
the object belongs.
Consider the problem of estimating the $\gamma$-level set
$G^*_{\gamma}=\{x:f(x)\geq\gamma\}$ of an unknown $d$-dimensional density
function $f$ based on $n$ independent observations $X_1,...,X_n$ from the
density. This problem has been addressed under global error criteria related to
the symmetric set difference. However, in certain applications a spatially
uniform mode of convergence is desirable to ensure that the estimated set is
close to the target set everywhere. The Hausdorff error criterion provides this
degree of uniformity and, hence, is more appropriate in such situations.