On Throughput-Delay Optimal Access to Storage Clouds via Load Adaptive Coding and Chunking

20 March 2014

Abstract

Recent literature, including our past work, provides analysis and solutions for using (i) erasure coding, (ii) parallelism, or (iii) variable slicing/chunking (i.e., dividing an object of a specific size into a variable number of smaller chunks) to speed up the I/O performance of storage clouds. However, a comprehensive approach that considers all three dimensions together to achieve the best throughput-delay trade-off curve has been lacking. This paper presents the first set of solutions that can pick the best combination of coding rate and object chunking/slicing options as the load changes dynamically. Our specific contributions are as follows: (1) We establish via measurement that combining variable coding rate and chunking is mostly feasible over a popular public cloud. (2) We relate the delay-optimal values for chunking level and code rate to the queue backlogs via an approximate queueing analysis. (3) Based on this analysis, we propose TOFEC, which adapts the chunking level and coding rate to the queue backlogs. Our trace-driven simulation results show that TOFEC's adaptation mechanism converges to an appropriate code that provides the optimal throughput-delay trade-off without reducing system capacity. Compared to a non-adaptive strategy optimized for throughput, TOFEC delivers $2.5\times$ lower latency under light workloads; compared to a non-adaptive strategy optimized for latency, TOFEC can scale to support over $3\times$ as many requests. (4) We propose a simpler greedy solution that performs on a par with TOFEC in average delay, but exhibits significantly more variation in performance.
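The idea of adapting chunking level and coding rate to the queue backlog can be illustrated with a minimal sketch. It is not TOFEC itself: the policy table, the thresholds, and the names `POLICY`, `THRESHOLDS`, `choose_code`, and `redundancy` are all hypothetical, chosen only to show the general shape of a backlog-driven selection in which light load favors more chunks and more redundancy (lower delay) and heavy load falls back to less of both (higher throughput).

```python
from bisect import bisect_right

# Hypothetical policy table, ordered from light load to heavy load.
# Each entry is (k, n): the object is split into k chunks and n >= k
# coded chunks are requested; any k of them suffice to rebuild the object.
POLICY = [(6, 12), (3, 6), (2, 3), (1, 1)]

# Hypothetical backlog thresholds (number of queued requests) that
# separate consecutive policy entries. These values are assumptions,
# not numbers taken from the paper.
THRESHOLDS = [2, 8, 32]


def choose_code(backlog):
    """Return the (k, n) pair to use for the current queue backlog."""
    if backlog < 0:
        raise ValueError("backlog must be non-negative")
    # bisect_right counts how many thresholds are <= backlog,
    # which is the index of the matching policy entry.
    return POLICY[bisect_right(THRESHOLDS, backlog)]


def redundancy(k, n):
    """Redundancy ratio n / k; 1.0 means no coding overhead."""
    return n / k


if __name__ == "__main__":
    for backlog in (0, 2, 10, 100):
        k, n = choose_code(backlog)
        print(backlog, (k, n), redundancy(k, n))
```

With this table, an empty queue selects the most aggressive entry, while a long queue selects `(1, 1)`, which uses no chunking and no redundancy so that no capacity is spent on extra work.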
