Dan Suciu

  1. PerfXplain: Debugging MapReduce Job Performance.

    Authors: Nodira Khoussainova, Dan Suciu, Magdalena Balazinska
    Subjects: Databases
    Abstract

    While users today have access to many tools that assist in performing
    large-scale data analysis tasks, understanding the performance characteristics of
    their parallel computations, such as MapReduce jobs, remains difficult. We
    present PerfXplain, a system that enables users to ask questions about the
    relative performance (i.e., runtimes) of pairs of MapReduce jobs. PerfXplain
    provides a new query language for articulating performance queries and an
    algorithm for generating explanations from a log of past MapReduce job
    executions.

  2. The Complexity of Causality and Responsibility for Query Answers and non-Answers.

    Authors: Wolfgang Gatterbauer, Dan Suciu, Alexandra Meliou, Katherine M. Moore
    Subjects: Databases
    Abstract

    An answer to a query has a well-defined lineage expression (alternatively
    called how-provenance) that explains how the answer was derived. Recent work
    has also shown how to compute the lineage of a non-answer to a query. However,
    the cause of an answer or non-answer is a more subtle notion and consists, in
    general, of only a fragment of the lineage. In this paper, we adapt Halpern,
    Pearl, and Chockler's recent definitions of causality and responsibility to
    define the causes of answers and non-answers to queries, and their degree of
    responsibility.

  3. Boosting the Accuracy of Differentially-Private Histograms Through Consistency.

    Authors: Dan Suciu, Michael Hay, Vibhor Rastogi, Gerome Miklau
    Subjects: Databases
    Abstract

    We show that it is possible to significantly improve the accuracy of a
    general class of histogram queries while satisfying differential privacy. Our
    approach carefully chooses a set of queries to evaluate, and then exploits
    consistency constraints that should hold over the noisy output. In a
    post-processing phase, we compute the consistent input most likely to have
    produced the noisy output. The final output is differentially-private and
    consistent, but in addition, it is often much more accurate.

  4. A Case for A Collaborative Query Management System.

    Authors: Nodira Khoussainova, Magda Balazinska, Wolfgang Gatterbauer, YongChul Kwon, Dan Suciu
    Subjects: Databases
    Abstract

    Over the past 40 years, database management systems (DBMSs) have evolved to
    provide a sophisticated variety of data management capabilities. At the same
    time, tools for managing queries over the data have remained relatively
    primitive. One reason for this is that queries are typically issued through
    applications. They are thus debugged once and re-used repeatedly. This mode of
    interaction, however, is changing. As scientists (and others) store and share
    increasingly large volumes of data in data centers, they need the ability to
    analyze the data by issuing exploratory queries.
