Optimizing Cache Performance for Graph Analytics

3 August 2016

Yunming Zhang

Abstract

Despite all the optimization work on in-memory graph analytics, today's graph frameworks still do not utilize modern hardware well. In this paper, we show that it is possible to achieve speedups of up to $4\times$ over the fastest in-memory frameworks by greatly improving graph applications' use of the cache. Previous systems have applied out-of-core processing techniques from the memory/disk boundary to the cache/DRAM boundary. However, we find that blindly applying such techniques is ineffective because the performance gap between DRAM and cache is much smaller than the gap between disk and memory. We present two techniques that take advantage of the cache with minimal or no instruction overhead. The first, vertex reordering, sorts the vertices by degree to improve the utilization of each cache line with no runtime overhead. The second, CSR segmenting, partitions the graph to restrict all random accesses to the cache, makes all DRAM accesses sequential, and merges partition results using a very low-overhead, cache-aware merge. Both techniques can be easily implemented on top of optimized graph frameworks. Combined, our techniques give speedups of up to $4\times$ for PageRank and Collaborative Filtering, and $2\times$ for Betweenness Centrality, over the best published results.
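
The two techniques named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`reorder_by_degree`, `segment_by_source`, `segmented_pull_sum`), the use of out-degree as the sort key, the plain edge-list representation instead of real CSR arrays, and the dictionary-based merge are all assumptions chosen for brevity. The sketch only shows the idea: relabel vertices so that high-degree vertices share neighboring ids, group edges by source-id range so that random reads within one segment stay inside a small window, and then merge the per-segment partial results.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source, destination)


def reorder_by_degree(num_vertices: int, edges: List[Edge]) -> Tuple[List[int], List[Edge]]:
    """Relabel vertices so that higher out-degree vertices get smaller ids.

    Assumption: out-degree is the sort key; the abstract only says "degree".
    Returns (new_id, relabeled_edges), where new_id[old] is the new label.
    """
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(num_vertices), key=lambda v: -degree[v])
    new_id = [0] * num_vertices
    for rank, old in enumerate(order):
        new_id[old] = rank
    return new_id, [(new_id[s], new_id[d]) for s, d in edges]


def segment_by_source(num_vertices: int, edges: List[Edge], segment_size: int) -> List[List[Edge]]:
    """Group edges by source-id range so each segment reads from one small window.

    segment_size stands in for "number of vertices whose data fits in cache"
    (a hypothetical tuning parameter).
    """
    num_segments = (num_vertices + segment_size - 1) // segment_size
    segments: List[List[Edge]] = [[] for _ in range(num_segments)]
    for src, dst in edges:
        segments[src // segment_size].append((src, dst))
    for seg in segments:
        seg.sort(key=lambda e: e[1])  # visit destinations in increasing order
    return segments


def segmented_pull_sum(num_vertices: int, segments: List[List[Edge]], values: List[float]) -> List[float]:
    """For every vertex, sum values[src] over its incoming edges, one segment at a time."""
    total = [0.0] * num_vertices
    for seg in segments:
        partial: Dict[int, float] = {}
        for src, dst in seg:
            # Reads of values[src] are confined to this segment's source range.
            partial[dst] = partial.get(dst, 0.0) + values[src]
        # Merge step: fold this segment's partial sums into the final result.
        for dst, contribution in partial.items():
            total[dst] += contribution
    return total


edges = [(0, 1), (2, 0), (2, 1), (2, 3), (3, 1)]
new_id, relabeled = reorder_by_degree(4, edges)
segments = segment_by_source(4, relabeled, segment_size=2)
print(new_id)                                                  # [1, 3, 0, 2]
print(segmented_pull_sum(4, segments, [1.0, 2.0, 3.0, 4.0]))   # [0.0, 1.0, 1.0, 6.0]
```

In this toy graph, vertex 2 has the highest out-degree and is relabeled to 0. The segmented sum matches what a single unsegmented pass over the relabeled edges would produce, which is the property the merge step has to preserve.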
