Nearly all modern software suffers from bloat that negatively impacts its performance and security. To combat this problem, several automated techniques have been proposed to debloat software. A key metric used in these works to demonstrate improved security is code reuse gadget count reduction. The use of this metric is based on the prevailing idea that reducing the number of gadgets available in a software package reduces its attack surface and makes mounting a gadget-based code reuse exploit such as return-oriented programming (ROP) more difficult for an attacker. In this paper, we challenge this idea and show through a variety of realistic debloating scenarios the flaws inherent to the gadget count metric. Specifically, we demonstrate that software debloating can achieve high gadget count reduction rates, yet fail to limit an attacker's ability to construct an exploit. Worse yet, in some scenarios high gadget count reduction rates conceal instances in which software debloating makes security worse by introducing new quality gadgets. To address these issues, we propose new metrics based on quality rather than quantity for assessing the security impact of software debloaitng. We show that these metrics can be efficiently calculated with our Gadget Set Analyzer tool. Finally, we demonstrate the the utility of these metrics through a realistic debloating case study.
View on arXiv