Matrix computations form the kernel of many applications, such as circuit
simulation, computational fluid dynamics, and structural analysis. Two classes
are especially important: iterative PDE solvers based on the conjugate gradient
algorithm, whose core subproblem is sparse matrix-vector multiplication, and
LU/Cholesky decomposition for solving systems of linear equations. Designing
approaches to parallelize these computations, so that the achieved speedup
comes as close as possible to the limit set by Amdahl's law, has been a subject
of continuous research.
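
As a reminder of what the Amdahl's law limit implies, the following is a
minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where
p is the fraction of run time that can be parallelized and n is the number of
workers. The function name `amdahl_speedup` and the value p = 0.9 are
illustrative assumptions, not figures taken from this work.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of serial run time that can be parallelized (0..1).
    workers: number of processing elements applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative assumption: 90% of the run time is parallelizable.
    p = 0.9
    for n in (1, 2, 4, 8, 16, 1024):
        print(f"workers={n:5d}  speedup bound={amdahl_speedup(p, n):.3f}")
```

With p fixed, the bound approaches 1 / (1 - p) as n grows, which is why
reducing the serial portion of a solver matters as much as adding processors.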