Advances in modern hardware, such as increases in the size of main memory available on computers, have made it possible to analyze data at a much higher rate than before. In this paper, we demonstrate that there is tremendous room for improvement in the processing of analytical queries on modern commodity hardware. We introduce BIPie, a query processing engine that implements highly efficient decoding, selection, and aggregation for analytical queries executing on the columnar storage engine in MemSQL. We show that these operations are interdependent and must be fused and considered together to achieve very high performance. We propose and compare multiple strategies for decoding, selection, and aggregation (with GROUP BY), all of which are designed to take advantage of features of modern CPU architectures, including SIMD. We implemented these approaches in MemSQL, a high-performance hybrid transactional and analytical processing database designed for commodity hardware. We thoroughly evaluate the performance of the approach across a range of parameters and demonstrate a two- to fourfold speedup over previously published TPC-H Query 1 performance.