Users of MapReduce often run into performance problems when they scale up
their workloads. Many of the problems they encounter can be overcome by
applying techniques learned from over three decades of research on parallel
DBMSs. However, translating these techniques to a MapReduce implementation such
as Hadoop presents unique challenges that can lead to new design choices. This
paper describes how column-oriented storage techniques can be incorporated in
Hadoop in a way that preserves its popular programming APIs.
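As a rough illustration of why column-oriented storage helps, here is a minimal sketch that assumes a toy in-memory table. A row-oriented layout keeps each record's fields together, so a job that needs one field still reads every whole record. A column-oriented layout keeps each field in its own array, so only the requested column is read. The record fields, the byte-cost model, and the helper names (`to_columns`, `project_row_store`, `project_column_store`) are hypothetical. The sketch does not model the paper's actual on-disk format or how such a layout is exposed through Hadoop's existing input APIs.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical record schema (url, rank, content), for illustration only.
Value = Union[str, int]
Row = Tuple[str, int, str]
FIELDS = ("url", "rank", "content")

ROWS: List[Row] = [
    ("a.com", 3, "x" * 100),
    ("b.org", 7, "y" * 250),
    ("c.net", 1, "z" * 80),
]


def field_size(value: Value) -> int:
    """Toy storage cost: string length in bytes, or 8 bytes for an int."""
    return len(value) if isinstance(value, str) else 8


def to_columns(rows: List[Row]) -> Dict[str, List[Value]]:
    """Pivot row tuples into one list per field (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def project_row_store(rows: List[Row], field: str) -> Tuple[List[Value], int]:
    """Extract one field; a row store must read each entire record."""
    idx = FIELDS.index(field)
    values: List[Value] = []
    scanned = 0
    for row in rows:
        scanned += sum(field_size(v) for v in row)  # whole record is read
        values.append(row[idx])
    return values, scanned


def project_column_store(columns: Dict[str, List[Value]],
                         field: str) -> Tuple[List[Value], int]:
    """Extract one field; a column store reads only that column."""
    values = columns[field]
    scanned = sum(field_size(v) for v in values)
    return list(values), scanned


if __name__ == "__main__":
    print(project_row_store(ROWS, "rank"))                # ([3, 7, 1], 469)
    print(project_column_store(to_columns(ROWS), "rank"))  # ([3, 7, 1], 24)
```

Both layouts return the same values, but under this toy cost model the columnar projection reads 24 bytes instead of 469, because the large `content` field is never touched.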

Database management systems (DBMSs) have largely ignored the task of managing
the energy consumed during query processing. Both economic and environmental
factors now require DBMSs to pay close attention to energy consumption. In
this paper, we address this issue by treating energy consumption as a
first-class performance goal for query processing in a DBMS. We present two
concrete techniques that a DBMS can use to directly manage its energy
consumption. Both techniques trade off energy consumption against performance.