We study a class of codes for compressing memoryless Gaussian sources,
designed using the statistical framework of high-dimensional linear regression.
Codewords are linear combinations of subsets of columns of a design matrix.
With maximum-likelihood encoding we show that such a codebook can attain the
rate-distortion function with the optimal error-exponent, for all distortions
below a specified value. The structure of the codebook is motivated by an
analogous construction proposed recently by Barron and Joseph for communication
over an AWGN channel.
The performance of Orthogonal Matching Pursuit (OMP) for variable selection
is analyzed for random designs. When contrasted with the deterministic case,
since the performance is here measured after averaging over the distribution of
the design matrix, one can have far less stringent sparsity constraints on the
coefficient vector. We demonstrate that for exact sparse vectors, the
performance of the OMP is similar to known results on the Lasso algorithm
[\textit{IEEE Trans. Inform. Theory} \textbf{55} (2009) 2183--2202].
For the additive Gaussian noise channel with average codeword power
constraint, sparse superposition codes and adaptive successive decoding is
developed. Codewords are linear combinations of subsets of vectors, with the
message indexed by the choice of subset. A feasible decoding algorithm is
presented. Communication is reliable with error probability exponentially small
for all rates below the Shannon capacity.
For the additive white Gaussian noise channel with average codeword power
constraint, new coding methods are devised in which the codewords are sparse
superpositions, that is, linear combinations of subsets of vectors from a given
design, with the possible messages indexed by the choice of subset. Decoding is
by least squares, tailored to the assumed form of linear combination.
Communication is shown to be reliable with error probability exponentially
small for all rates up to the Shannon capacity.