Multigrid algorithms are among the fastest iterative methods known today for
solving large linear and some nonlinear systems of equations. Although highly
optimized for serial execution, they still have considerable potential for
parallelism that has not been fully realized. In this work, we present a novel multigrid
algorithm designed to run entirely on many-core architectures such as
graphics processing units (GPUs). It requires no memory transfers between the GPU
and the central processing unit (CPU), which avoids low-bandwidth communication.