Researchers working on the automatic parallelization of programs have long
known that too much parallelism can be even worse for performance than too
little, because spawning a task to be run on another CPU incurs overheads.
Autoparallelizing compilers have therefore tried to use granularity
analysis to ensure that they spawn off only computations whose cost will
probably exceed the spawn-off cost by a comfortable margin. However, this is
not enough to yield good results, because data dependencies may \emph{also}
limit the usefulness of running computations in parallel.

The behavior of parallel programs is even harder to understand than the
behavior of sequential programs. Parallel programs may suffer from any of the
performance problems affecting sequential programs, as well as from several
problems unique to parallel systems. Many of these problems are quite hard (or
even practically impossible) to diagnose without help from specialized tools.
We propose a tool for profiling the parallel execution of Mercury programs;
we have already started implementing it.