Modern Web services, such as those at Google, Yahoo!, and Amazon, handle
billions of requests per day on clusters of thousands of computers. Because
these services operate under strict performance requirements, a statistical
understanding of their performance is of great practical interest. Such
services are modeled by networks of queues, where one queue models each of the
individual computers in the system. A key challenge is that the data is
incomplete, because recording detailed information about every request to a
heavily used system can require unacceptable overhead.
We present the design and development of a data stream system that captures
data uncertainty from data collection to query processing to final result
generation. Our system focuses on data that is naturally modeled as continuous
random variables. For such data, our system employs an approach grounded in
probability and statistical theory to capture data uncertainty and integrates
this approach into high-volume stream processing. The first component of our
system captures uncertainty of raw data streams from sensing devices.