Malware Detection using Machine Learning Nawaf Abudawaood University of Colorado at Colorado Springs (UCCS) Project proposal 4/17/2019
Outline • Introduction • Related Work • Malware Binary to an Image • Malware Images Dataset • CNN ILSVRC Models • Model Description • Proposal • Conclusion • References
Introduction • Malware keeps improving at evading malware detectors, and thousands of new malware samples are created every day. • Malware can be hidden within code using different techniques. • Giving a file a new signature can make the malware harder to detect. • Abnormal behavior can occur after malicious code is executed. • Reverse engineers can find malware code by analyzing malicious files. • Malware authors intend to destroy and/or harm computer systems.
Related Work • An Artificial Neural Network (ANN) that combines Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) can be used for machine learning. • Saxe and Berlin (2015) extract features based on byte entropy. • Wang et al. (2018) detected malware using a dynamic analysis approach. • Kolosnjaji et al. (2018) used n-gram and behavior-based approaches that focus on raw bytes.
Related Work • Saxe and Berlin (2015) guard against overfitting with cross-validation, which trains and evaluates the classifier on more than one split of the data. • Alazab et al. (2011) highlighted the danger of zero-day attacks. • Sheneamer, Roy, and Kalita (2018), "A detection framework for semantic code clones and obfuscated code."
Malware Binary to an Image
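The slide itself only shows a figure, so the following is a minimal sketch of one common way to turn a binary into an image, assumed here rather than taken from the proposal: each byte is read as one 8-bit grayscale pixel (0 to 255) and the byte stream is wrapped into rows of a fixed width. The function name, the default width, and the sample bytes are all illustrative.

```python
def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay raw bytes out as rows of 8-bit grayscale pixels (0-255)."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        # Zero-pad the final row so every row has the same width.
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


# Hypothetical 10-byte "binary", for illustration only.
sample = bytes([77, 90, 144, 0, 3, 0, 0, 0, 4, 255])
image = bytes_to_grayscale_rows(sample, width=4)
for row in image:
    print(row)
```

In practice the rows would be written out with an imaging library and resized to the input size the chosen CNN expects; that step is omitted here.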
Related Work • Al-Dujaili et al. (2018) discussed malware binary image files, in which pixel locations encode portable executable (PE) files as binaries. • Szegedy et al., GoogLeNet Inception v1: "Going deeper with convolutions." • Szegedy et al. (2015), Inception v3: "Rethinking the Inception architecture for computer vision."
Malware Images Dataset • From the Vision Research Lab at the University of California, Santa Barbara. • Contains 9,342 malware images from 25 malware families. • Families: Adialer.C | Agent.FYI | Allaple.A | Allaple.L | Alueron.gen!J | Autorun.K | C2LOP.gen!g | C2LOP.P | Dialplatform.B | Dontovo.A | Fakerean | Instantaccess | Lolyda.AA1 | Lolyda.AA2 | Lolyda.AA3 | Lolyda.AT | Malex.gen!J | Obfuscator.AD | Rbot!gen | Skintrim.N | Swizzor.gen!E | Swizzor.gen!I | VB.AT | Wintrim.BX | Yuner.A • Sample images shown: Adialer.C, Autorun.K, Fakerean, Lolyda.AT, Swizzor.gen!E
CNNs' Importance for Image Classification • CNNs are well suited to classifying images because the model learns to extract the features itself. • Convolution and pooling compress the data as features are extracted, which helps visual recognition. • Fully connected layers then connect every extracted feature to every output node, which is one reason CNNs are preferred over non-CNN models for images. • CNNs loosely resemble how the brain recognizes things from the features of objects, such as the eyes and how far apart they are on the face of a human or an animal. • Such features can be recognized using CNNs for image classification.
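To make the feature-extraction and compression steps concrete, here is a minimal pure-Python sketch of a single convolution followed by 2 x 2 max pooling. It is a toy under stated assumptions, not the proposal's model: the 6 x 6 image and the vertical-edge kernel are made up, and the "convolution" is the unflipped sliding dot product that deep learning libraries conventionally use.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2 x 2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Made-up 6 x 6 image: dark left half, bright right half.
toy_image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
# Hand-written kernel that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 0, 1]] * 3

feature_map = conv2d_valid(toy_image, edge_kernel)  # 4 x 4
pooled = max_pool_2x2(feature_map)                  # 2 x 2
print(feature_map)
print(pooled)
```

The 36 input pixels shrink to a 4 x 4 feature map and then to a 2 x 2 summary that still records where the edge is. In a real CNN the kernel values are learned during training rather than written by hand.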
Simple CNN Architecture
CNN ILSVRC Models • These models won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) or performed strongly on its ImageNet benchmark, which makes them strong candidates for classifying images. • AlexNet (2012) • ZFNet (2013) • VGGNet-16 (2014) • GoogLeNet Inception v1 (2014) • Inception v2 (2014) • Inception v3 (2015) • ResNet (2015) • Inception-ResNet v4 (2016) • Xception (2016) • DenseNet (2016) • NASNet (2017) • MobileNet (2017)
GoogLeNet Inception v1
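Inception v3 Importance • Inception v3 replaces the 5 x 5 convolution used in GoogLeNet Inception v1 with two stacked 3 x 3 convolutions. • A 5 x 5 convolution is 25/9 = 2.78 times more expensive than a single 3 x 3 convolution with the same number of filters, so two 3 x 3 convolutions cost 18/25 of one 5 x 5, a 28% reduction. • The model also reports better results than the previous models.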
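The 2.78 figure can be checked by counting weights. The sketch below assumes a hypothetical layer with 64 input and 64 output channels and ignores biases; the ratios do not depend on the channel count. It shows that 2.78 is the cost of one 5 x 5 convolution relative to one 3 x 3 convolution, while replacing a 5 x 5 with two stacked 3 x 3 convolutions saves 28%.

```python
def conv_weights(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a square convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


channels = 64  # hypothetical; any value gives the same ratios

cost_5x5 = conv_weights(5, channels, channels)
cost_3x3 = conv_weights(3, channels, channels)
cost_two_3x3 = 2 * cost_3x3

ratio_5x5_to_one_3x3 = cost_5x5 / cost_3x3       # 25 / 9
saving_from_two_3x3 = 1 - cost_two_3x3 / cost_5x5  # 1 - 18 / 25

print(f"5x5 vs one 3x3: {ratio_5x5_to_one_3x3:.2f}x")
print(f"saving from two 3x3: {saving_from_two_3x3:.0%}")
```

Two stacked 3 x 3 convolutions cover the same 5 x 5 receptive field, which is why the swap preserves what the layer can see while cutting the weight count.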
Inception v3 Importance
Grid reduction and Auxiliary classifier
Model Description • Inception v3 has 42 layers and was a top performer in the ILSVRC (first runner-up in the 2015 image classification task). • It reduced the error rate while using fewer parameters.
Proposal • We propose a method that uses ILSVRC models to classify malware images. • The CNNs will extract features automatically. • The models will use the Vision Research Lab (Malimg) dataset, aiming for high classification accuracy. • The results of the ILSVRC models will be compared on the same Malimg dataset.
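The comparison step could be organized along the following lines. This is only a sketch under assumptions: the placeholder names `model_a` and `model_b`, the labels, and the predictions are made up for illustration and are not results from any model, and plain accuracy is one possible metric among several.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted family matches the true family."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions_by_model):
    """Score every model on the same labels and sort best first."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up test labels and predictions, for illustration only.
y_true = ["Adialer.C", "Yuner.A", "VB.AT", "Fakerean"]
predictions = {
    "model_a": ["Adialer.C", "Yuner.A", "VB.AT", "Fakerean"],
    "model_b": ["Adialer.C", "Yuner.A", "Fakerean", "Fakerean"],
}

ranking = rank_models(y_true, predictions)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Scoring every model against the same held-out labels keeps the comparison fair, which matches the proposal's point about using the same dataset for all models.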
Conclusion • Malware has been a major concern. • Malware bypasses malware detectors. • Machine learning can be very helpful in improving malware detection. • A number of ILSVRC models are used. • The models classify malware images using the Vision Research Lab dataset.
References • [1] Sethi, Kamalakanta, et al. "A novel malware analysis for malware detection and classification using machine learning algorithms." Proceedings of the 10th International Conference on Security of Information and Networks. ACM, 2017. • [2] Sewak, Mohit, Sanjay K. Sahay, and Hemant Rathore. "Comparison of deep learning and the classical machine learning algorithm for the malware detection." 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). IEEE, 2018. • [3] Kolosnjaji, Bojan, et al. "Deep learning for classification of malware system call sequences." Australasian Joint Conference on Artificial Intelligence. Springer, Cham, 2016. • [4] Buczak, Anna L., and Erhan Guven. "A survey of data mining and machine learning methods for cyber security intrusion detection." IEEE Communications Surveys & Tutorials 18.2 (2016): 1153-1176. • [5] Sethi, Kamalakanta, et al. "A Novel Malware Analysis Framework for Malware Detection and Classification using Machine Learning Approach." Proceedings of the 19th International Conference on Distributed Computing and Networking. ACM, 2018.
References • [6] Cui, Zhihua, et al. "Detection of malicious code variants based on deep learning." IEEE Transactions on Industrial Informatics 14.7 (2018): 3187-3196. • [7] Sheneamer, Abdullah, Swarup Roy, and Jugal Kalita. "A detection framework for semantic code clones and obfuscated code." Expert Systems with Applications 97 (2018): 405-420. • [8] Kolosnjaji, Bojan, et al. "Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables." arXiv preprint arXiv:1803.04173 (2018). • [9] Wang, Yao, Wan-dong Cai, and Peng-cheng Wei. "A deep learning approach for detecting malicious JavaScript code." Security and Communication Networks 9.11 (2016): 1520-1534. • [10] Yan, Jinpei, Yong Qi, and Qifan Rao. "Detecting malware with an ensemble method based on deep neural network." Security and Communication Networks 2018 (2018).
References • [11] Saxe, Joshua, and Konstantin Berlin. "Deep neural network based malware detection using two dimensional binary program features." Malicious and Unwanted Software (MALWARE), 2015 10th International Conference on. IEEE, 2015. • [12] Al-Dujaili, Abdullah, et al. "Adversarial deep learning for robust detection of binary encoded malware." 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 2018. • [13] Alazab, Mamoun, et al. "Zero-day malware detection based on supervised learning algorithms of API call signatures." Proceedings of the Ninth Australasian Data Mining Conference-Volume 121. Australian Computer Society, Inc., 2011. • [14] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. • [15] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.