Obfuscation-Resilient Binary Code Similarity Analysis using Dominance Enhanced Semantic Graph

Main: 9 pages, 5 figures, 6 tables; Bibliography: 2 pages
Abstract

Binary code similarity analysis (BCSA) serves as a core technique for binary analysis tasks such as vulnerability detection. Current graph-based BCSA approaches capture rich semantics and perform well, but they degrade under code obfuscation because the control flow becomes unstable. To address this issue, we develop ORCAS, an Obfuscation-Resilient BCSA model based on the Dominance Enhanced Semantic Graph (DESG). DESG is a novel binary code representation that captures more of a binary's implicit semantics without relying on control flow structure, including inter-instruction relations, inter-basic-block relations, and instruction-basic-block relations. ORCAS robustly scores the semantic similarity of binary functions compiled with different obfuscation options, optimization levels, and instruction set architectures. Extensive evaluation on the BinKit dataset shows that ORCAS significantly outperforms eight baselines, achieving an average PR-AUC gain of 12.1% over state-of-the-art approaches when all three obfuscation options are combined. Furthermore, ORCAS improves recall by up to 43% on a new obfuscated real-world vulnerability dataset, which we have released to facilitate future research.
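
For readers unfamiliar with the term, the sketch below illustrates the textbook notion of dominance between basic blocks: block d dominates block n if every path from the function entry to n passes through d. The abstract does not say how DESG derives or uses dominance, so this is not the ORCAS construction. The function name `dominators`, the dictionary-based graph encoding, and the four-block example are assumptions made for illustration, and every block is assumed to be reachable from the entry.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks that dominate it} for a control-flow graph.

    cfg maps each block name to a list of successor block names
    (a hypothetical encoding chosen for this sketch).
    """
    # Collect every block that appears as a source or a successor.
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)

    # Build the predecessor map.
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            common = set.intersection(*incoming) if incoming else set()
            new = {n} | common
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# A diamond-shaped function: A branches to B or C, and both rejoin at D.
example_cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
example_dom = dominators(example_cfg, "A")
```

In the diamond example, neither B nor C dominates D, because D can be reached through either branch; only A and D itself do.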

@article{wang2025_2506.06161,
  title={Obfuscation-Resilient Binary Code Similarity Analysis using Dominance Enhanced Semantic Graph},
  author={Yufeng Wang and Yuhong Feng and Yixuan Cao and Haoran Li and Haiyue Feng and Yifeng Wang},
  journal={arXiv preprint arXiv:2506.06161},
  year={2025}
}