Efficient Redundancy Techniques for Latency Reduction in Cloud Systems

In cloud computing systems, assigning a task to multiple servers and waiting for the earliest copy to finish is an effective way to combat the variability in server response times and thus reduce average latency. However, adding redundancy may result in a higher cost of computing resources, as well as increased queueing delay due to the higher traffic load. This work provides a fundamental understanding of when and how redundancy gives a cost-efficient reduction in latency. For a general task service time distribution, we compare different redundancy strategies, which vary in the number of redundant tasks and the times at which they are issued and canceled. A key insight is that the log-concavity of the task service time distribution determines whether adding redundancy helps: if the service distribution is log-convex, then adding maximum redundancy reduces both latency and cost, whereas if it is log-concave, less redundancy and early cancellation of redundant tasks are more effective. We also present a heuristic strategy that achieves a good latency-cost trade-off for an arbitrary service distribution. This work also generalizes and extends some results in the notoriously hard analysis of fork-join queues.
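The latency-cost trade-off described above can be illustrated with a small Monte Carlo sketch. The snippet below is not from the paper; it simply replicates a task to k servers, takes the minimum of k i.i.d. service times as the latency, and charges k times that latency as the computing cost (all copies run until the earliest one finishes and is canceled). The two example distributions are common stand-ins: a hyperexponential mixture for the log-convex case and a shifted exponential for the log-concave case.

```python
import random

def simulate(sample_service, k, n_trials=200_000, seed=0):
    """Estimate mean latency and computing cost when a task is sent to
    k servers and outstanding copies are canceled as soon as the first
    copy finishes.

    Latency = min of k i.i.d. service times.
    Cost    = k * latency (each copy runs until the earliest finish).
    """
    rng = random.Random(seed)
    total_latency = 0.0
    for _ in range(n_trials):
        latency = min(sample_service(rng) for _ in range(k))
        total_latency += latency
    mean_latency = total_latency / n_trials
    return mean_latency, k * mean_latency

# Log-convex example: hyperexponential (mixture of two exponentials),
# i.e. mostly fast responses with an occasional very slow one.
def hyperexp(rng):
    return rng.expovariate(2.0) if rng.random() < 0.9 else rng.expovariate(0.1)

# Log-concave example: shifted exponential (fixed setup time plus a
# memoryless service component).
def shifted_exp(rng):
    return 1.0 + rng.expovariate(1.0)

for name, dist in [("hyperexponential (log-convex)", hyperexp),
                   ("shifted exponential (log-concave)", shifted_exp)]:
    l1, c1 = simulate(dist, k=1)
    l4, c4 = simulate(dist, k=4)
    print(f"{name}: k=1 latency {l1:.3f}, cost {c1:.3f} | "
          f"k=4 latency {l4:.3f}, cost {c4:.3f}")
```

Under these assumptions, replication cuts both latency and cost for the hyperexponential distribution, while for the shifted exponential it cuts latency but inflates cost, matching the log-convex/log-concave dichotomy stated above.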