R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections

12 May 2017

Hung-Yu Kao

Abstract

Machine Learning (ML) has found it particularly useful in malware detection. However, as the malware evolves very fast, the stability of the feature extracted from malware serves as a critical issue in malware detection. The recent success of deep learning in image recognition, natural language processing, and machine translation indicates a potential solution for stabilizing the malware detection effectiveness. In this research, we haven't extract selected any features (e.g., the control-flow of op-code, classes, methods of functions and the timing they are invoked etc.) from Android apps. We develop our own method for translating Android apps into rgb color code and transform them to a fixed-sized encoded image. After that, the encoded image is fed to convolutional neural network (CNN) for automatic feature extraction and learning, reducing the expert's intervention. Deep learning usually involves a large number of parameters that cannot be learned from only a small dataset. In this way, we currently have collected 1500k Android apps samples, have run our system over these 800k malware samples (benign and malicious samples are roughly equal-sized), and also through our back-end (60 million monthly active users and 10k new malware samples per day), we can effectively detect the malware. We believe that our methodology and the corresponding use of deep learning malware classification can overcome the weakness, and computational cost of the common static/dynamic analysis process or machine learning-based of Android malware detection approach.

View on arXiv

Comments on this paper