Column-oriented indexes-such as projection or bitmap indexes-are compressed
by run-length encoding to reduce storage and increase speed. Sorting the tables
improves compression. On realistic data sets, permuting the columns in the
right order before sorting can reduce the number of runs by a factor of two or
more. For many cases, we prove that the number of runs in table columns is
minimized if we sort columns by increasing cardinality. Yet-maybe
surprisingly-we must sometimes maximize the number of runs to minimize the
index size.