Malware Recognition with Binary Fingerprint: Final Meeting
Students: Tal Greenshpan & Offer Akrabi
Supervisors: Ben Herzog & Amir Mizrahi (Check Point)
Goals
• Build an automated classifier for new malware
  • Using static analysis methods
• Help reverse engineers classify new malware
  • Comparing new functions to known functions
Methodology
• Static analysis
  • PE files
• Research important features in function comparison
  • Reverse engineering
• Extract key features in order to identify resemblance between functions
  • Keep only key features
• Develop an algorithm to determine feature similarity
  • Compare functions
  • Feature contribution
Methodology
• Build a database of known functions
  • MSSQL
• Develop extractor and classifier
  • Python
  • IDAPython
• Testing
• Extra: GUI
Achievements
• Decided on a set of features to be used to differentiate functions
  • Function size
  • Number of API calls
  • Register count
  • Memory count
  • Arguments count
  • Local variables size
• Features from the Function Call Graph (generated by IDA)
  • Number of nodes
  • Min/Max out-degree
  • Min/Max in-degree
  • Min/Max Well Connected Components size
  • Ratio of out-degrees that are larger than 1
  • Ratio of in-degrees that are larger than 1
  • Ratio of Well Connected Components that are larger than 1
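The call graph features listed above can be computed from a plain edge list. The sketch below is only an illustration, not the project's extractor: the names `graph_features` and `strongly_connected_components`, the edge-list input format, and the reading of "Well Connected Components" as strongly connected components are all assumptions. In practice the edges would come from IDA rather than being written by hand.

```python
def strongly_connected_components(nodes, succ):
    """Iterative Tarjan's algorithm. Returns a list of components (lists of nodes)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0
    for root in nodes:
        if root in index:
            continue
        index[root] = low[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(succ[root]))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    # First visit: descend into the successor.
                    index[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(succ[nxt])))
                    descended = True
                    break
                if nxt in on_stack:
                    low[node] = min(low[node], index[nxt])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                low[parent] = min(low[parent], low[node])
            if low[node] == index[node]:
                # `node` is the root of a component: pop it off the stack.
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                components.append(component)
    return components


def ratio_above_one(values):
    """Fraction of values that are larger than 1."""
    return sum(1 for v in values if v > 1) / len(values)


def graph_features(edges):
    """Compute graph features from directed (source, target) edges.

    Assumption: nodes without any edge are not represented in the input.
    """
    if not edges:
        raise ValueError("at least one edge is required")
    nodes = sorted({n for edge in edges for n in edge})
    succ = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for src, dst in sorted(set(edges)):  # duplicate edges are counted once
        succ[src].append(dst)
        in_degree[dst] += 1
    out_degrees = [len(succ[n]) for n in nodes]
    in_degrees = [in_degree[n] for n in nodes]
    scc_sizes = [len(c) for c in strongly_connected_components(nodes, succ)]
    return {
        "num_nodes": len(nodes),
        "min_out_degree": min(out_degrees),
        "max_out_degree": max(out_degrees),
        "min_in_degree": min(in_degrees),
        "max_in_degree": max(in_degrees),
        "min_scc_size": min(scc_sizes),
        "max_scc_size": max(scc_sizes),
        "ratio_out_degree_gt_1": ratio_above_one(out_degrees),
        "ratio_in_degree_gt_1": ratio_above_one(in_degrees),
        "ratio_scc_gt_1": ratio_above_one(scc_sizes),
    }


# Hypothetical toy graph: a cycle a -> b -> c -> a plus two edges into d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("a", "d")]
toy_features = graph_features(toy_edges)
```

For the toy graph, the cycle `a`, `b`, `c` forms one component of size 3 and `d` forms a component of size 1, so half of the components are larger than 1.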
Achievements
• Automated mass feature extraction
  • Low runtime complexity
• Created an algorithm to differentiate functions
  • Feature contribution
  • Standard deviation
  • Using the NumPy Python library
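The slides do not spell out how standard deviation enters the comparison. One plausible reading, shown here purely as an assumption, is to divide each feature difference by that feature's standard deviation over the database, so that large-valued features such as function size do not drown out small-valued ones. The function names and the feature values below are hypothetical.

```python
import numpy as np


def standardized_distances(database, query):
    """Distance from `query` to every row of `database`.

    Each feature difference is divided by that feature's standard deviation
    across the database (an assumed weighting, not the project's algorithm).
    """
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    std = database.std(axis=0)
    std[std == 0] = 1.0  # constant features are left unscaled to avoid dividing by zero
    scaled = (database - query) / std
    return np.sqrt((scaled ** 2).sum(axis=1))


def best_match(database, query):
    """Return (row index, distance) of the closest database function."""
    distances = standardized_distances(database, query)
    idx = int(np.argmin(distances))
    return idx, float(distances[idx])


# Hypothetical rows: [function size, number of API calls, arguments count]
known_functions = [
    [120, 3, 2],
    [4500, 10, 4],
    [130, 9, 1],
]
match_index, match_distance = best_match(known_functions, [125, 3, 2])
```

Here a smaller distance means a closer match. How such a distance maps to the resemblance score reported in the slides is not stated in the source.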
Achievements
• Successfully matched functions from actual malware samples!
Example
• Two very similar, simple C++ malware-like programs
  • Different number of arguments
  • Different number of local variables
  • Different order of declaration
• Database containing about 2,500 functions
Perfect match: Resemblance = 34
• Function Call Graphs (generated by IDA) for the encryption function
[Figure: call graphs for twin1.exe and twin2.exe]
Live Demonstration
• Database containing about 1,000 functions
  • Suspected Zeus malware related files
  • Locky ransomware samples
• Analysis of a different Locky sample, not in the database
  • File analyzed: 0deb_U.exe
  • Function analyzed: sub_402743
Conclusions
• Efficient classification of functions with selected features
  • The first set of features we selected did not get sufficient results
  • Euclidean distance not good enough to differentiate functions
• Good classification accuracy
• Run time complexity for very large databases could be problematic
  • Can improve run time significantly, at a cost to accuracy
  • Removing only one feature
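The slides do not say why Euclidean distance fell short. One common cause, offered here only as an assumption, is that a feature with a large numeric range (such as function size) dominates the distance. The toy example below uses made-up feature values to show the effect.

```python
import math

# Hypothetical feature order: (function size, number of API calls, arguments count)
query = (400, 2, 1)
candidates = {
    # Close in size, but different API call and argument counts.
    "similar_size_only": (410, 12, 6),
    # Identical API call and argument counts, but 60 bytes larger.
    "same_except_size": (460, 2, 1),
}


def euclidean_ranking(query, candidates):
    """Candidate names ordered from closest to farthest by plain Euclidean distance."""
    return sorted(candidates, key=lambda name: math.dist(query, candidates[name]))


ranking = euclidean_ranking(query, candidates)
print(ranking)  # ['similar_size_only', 'same_except_size']
```

The candidate that agrees with the query on every feature except size is ranked last (distance 60 versus 15), because the size difference alone outweighs the other two features combined.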
Thank you!