Control Flow Change in Assembly as a Classifier in Malware Analysis

9 September 2016

Abstract

Classical signature-based malware detection methods fail to detect new malware and are not always effective against new obfuscation techniques. Moreover, new malware is easily created, and old malware can be recoded to produce new variants. Classical antivirus software is therefore becoming steadily less effective against these new threats. Malware is also hand-tailored to bypass network security and antivirus software. Because analysts do not have enough time to dissect suspected malware by hand, automated approaches have been developed. To cope with the mass of new malware, statistical and machine learning methods have proved to be a good approach for classifying programs, especially when several approaches are combined to estimate the likelihood that a piece of software is malicious. Existing approaches mostly analyze the opcodes or mnemonics of the disassembly and their distribution. In this paper, we focus on the control flow change (CFC) itself and investigate whether it is significant for detecting malware. In the scope of this work, only relative control flow changes are considered, as these are easier to extract with the disassembler library initially chosen and lie within a range of 256 addresses. These features are analyzed as raw features, as n-grams of length 2, 4 and 6, and, at an even higher level of abstraction, as the number of occurrences of each n-gram. We use statistical methods as well as the Naive Bayes algorithm to determine whether CFC carries significant information. We also test our approach on real-world datasets.
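The feature pipeline described above (relative CFC extraction, n-gram occurrence counts, Naive Bayes classification) can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' implementation, and it rests on several assumptions: that "relative control flow changes within a range of 256 addresses" means x86 instructions with a signed 8-bit (rel8) displacement, such as short `Jcc`, `JMP`, `LOOP` and `JCXZ`; that a naive byte scan (`extract_rel8_cfc`) can stand in for the unnamed disassembler library; and that a multinomial Naive Bayes with add-one smoothing is an acceptable variant, since the abstract does not say which one was used. All function and class names are illustrative.

```python
import math
from collections import Counter

# Hypothetical sketch, not the paper's code.
# Assumption: "relative CFC within 256 addresses" = x86 instructions with a signed
# 8-bit (rel8) displacement: Jcc short (0x70-0x7F), LOOPNE/LOOPE/LOOP (0xE0-0xE2),
# JCXZ/JECXZ (0xE3) and JMP short (0xEB).
REL8_OPCODES = set(range(0x70, 0x80)) | {0xE0, 0xE1, 0xE2, 0xE3, 0xEB}


def extract_rel8_cfc(code: bytes) -> list[int]:
    """Naive stand-in for a disassembler: return the sign-extended rel8 displacements.

    It does not decode instruction boundaries, so it may also match opcode
    values that are really operands or data.
    """
    displacements = []
    i = 0
    while i + 1 < len(code):
        if code[i] in REL8_OPCODES:
            disp = code[i + 1]
            displacements.append(disp - 256 if disp > 127 else disp)  # sign-extend
            i += 2  # skip opcode and displacement byte
        else:
            i += 1
    return displacements


def ngram_counts(seq: list[int], n: int) -> Counter:
    """Occurrences of each length-n window over the CFC sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples: list[Counter], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = set()
        for s in samples:
            self.vocab.update(s)  # n-grams seen in any training sample
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            docs = [s for s, y in zip(samples, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(samples))
            totals = Counter()
            for d in docs:
                totals.update(d)  # summed n-gram counts for class c
            denom = sum(totals.values()) + len(self.vocab)
            self.log_lik[c] = {f: math.log((totals[f] + 1) / denom) for f in self.vocab}
        return self

    def predict(self, sample: Counter) -> str:
        def score(c: str) -> float:
            # Log-posterior up to a constant; n-grams unseen in training are ignored.
            return self.log_prior[c] + sum(
                k * self.log_lik[c][f] for f, k in sample.items() if f in self.vocab)
        return max(self.classes, key=score)
```

For one sample, `ngram_counts(extract_rel8_cfc(code), n)` with `n` set to 2, 4 or 6 yields the occurrence features, which are then passed to `MultinomialNB().fit(...)` together with benign/malicious labels. A faithful pipeline would decode instructions with a real disassembler, because the linear byte scan also matches bytes that belong to operands or data.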

View on arXiv