Hunting for Metamorphic Engines
Wing Wong and Mark Stamp
- Slides: 39

In This Paper, We…
- Analyze metamorphic malware
  - Hacker-produced metamorphic code
- Measure similarity of software
  - Based on n-gram analysis
- Compute scores
  - Based on n-grams
  - Based on HMMs
- This paper is a baseline for future work

Motivation
- Many virus construction kits are available
  - Many can produce metamorphic code
- So anybody can create a “new” version of existing malware
  - Virtually no technical expertise required
- How “effective” is the resulting metamorphic code?
- Can we detect metamorphic malware?

Background
- Encrypted, polymorphic, metamorphic
  - Metamorphic == body polymorphic
- Metamorphic vs cloned software
  - Clone is the norm, but metamorphic could offer advantages to the good guy too…
- From theory, we know malware detection is NP-complete
  - And metamorphic is at least as hard
  - But what about the practical situation?

Metamorphism
- Metamorphic code changes its “shape”
- Well-known examples
  - W95/Regswap
  - W32/Ghost
  - W95/Zperm
  - MetaPHOR

Metamorphism
- General techniques available
  - Insertion
  - Substitution
  - Transposition
  - Deletion
- Some are easier to implement than others
- Some are more effective against certain detection strategies

Virus Construction Kits
- In this paper, we consider
  - PS-MPC (Phalcon/Skism Mass Produced Code generator)
  - G2 (Second Generation virus generator)
  - MPCGEN (Mass Produced Code GENerator)
  - NGVCK (Next Generation Virus Construction Kit)
  - VCL32 (Virus Creation Lab for Win32)

Virus Construction Kits
- Did not consider MetaPHOR
  - Difficult to work with, finicky
- All of these claim to be metamorphic
- Are they really?
  - How can we measure “metamorphism”?
- If they are highly metamorphic, can we still detect them?

Brief Review of Malware Detection
- First generation
  - Signature scanning, wildcards OK
- Second generation
  - Approximate signature scanning; e.g., ignore NOP instructions
- Code emulation
- Heuristic analysis
  - Static or dynamic, false positives…

Machine Learning
- Consider the following
  - Data Mining, Neural Networks, HMMs
- Data Mining
  - Malware-related previous work
  - Generic approach
- Neural Networks
  - Previous work based on byte trigrams
  - Developed and used at IBM

Hidden Markov Models
- Train HMM on metamorphic family
- Then we can score any file to see how “close” it is to the family
- What to use to train such an HMM?
  - Raw bytes in exe?
  - Disassembled code?
  - Opcode sequence?
- More on this later…

Software Similarity
- How to quantify metamorphism?
- In general, how to measure similarity of software?
- Given program 1 and program 2…
- We develop a score
  - Score of 0 means “no similarity”
  - Score of 1 means “virtually identical”

N-gram Similarity
- Given executable files X and Y
- Extract opcode sequences from each
  - Suppose X has n opcodes
  - Suppose Y has m opcodes
- How to compare the sequences?
- Many possible ways; here we use n-gram analysis
  - That is, we compare subsequences

N-gram Similarity
- Extracted opcode sequences
  - X = (x_0, x_1, …, x_{n-1}) and Y = (y_0, y_1, …, y_{m-1})
- Compare subsequences of length k
  - Then x_i, x_{i+1}, …, x_{i+k-1} matches y_j, y_{j+1}, …, y_{j+k-1} if they are the same in any order
  - For each such match, plot the point (i, j)
  - Remove any segments shorter than p points
- Then score = (fraction of x axis covered + fraction of y axis covered) / 2
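
The scoring steps on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `ngram_similarity`, the defaults `k=3` and `p=5`, and the choice to treat every opcode inside a retained window as "covered" (so that identical programs score exactly 1) are assumptions made for this sketch.

```python
from collections import defaultdict


def ngram_similarity(x, y, k=3, p=5):
    """Score two opcode sequences in [0, 1] (hypothetical helper).

    x, y: lists of opcode mnemonics; k: window length;
    p: minimum number of points a diagonal segment needs to be kept.
    """
    n, m = len(x), len(y)
    if n < k or m < k:
        return 0.0

    # Index every length-k window of y by its sorted contents, so that
    # windows holding the same opcodes in any order compare equal.
    index = defaultdict(list)
    for j in range(m - k + 1):
        index[tuple(sorted(y[j:j + k]))].append(j)

    # One point (i, j) per matching pair of windows.
    matches = set()
    for i in range(n - k + 1):
        for j in index.get(tuple(sorted(x[i:i + k])), ()):
            matches.add((i, j))

    covered_x, covered_y = set(), set()
    for i, j in matches:
        if (i - 1, j - 1) in matches:
            continue  # not the first point of its diagonal segment
        length = 0
        while (i + length, j + length) in matches:
            length += 1
        if length >= p:  # drop short segments as noise
            # Assumption: a kept window covers all k of its opcodes.
            covered_x.update(range(i, i + length + k - 1))
            covered_y.update(range(j, j + length + k - 1))

    return (len(covered_x) / n + len(covered_y) / m) / 2


if __name__ == "__main__":
    prog = ["mov", "push", "call", "add", "sub",
            "pop", "ret", "jmp", "cmp", "xor"]
    print(ngram_similarity(prog, prog))
    print(ngram_similarity(prog, ["nop"] * 10))
```

Indexing the windows of Y by their sorted contents avoids comparing every window of X against every window of Y directly; the number of plotted points can still grow quadratically for highly repetitive code.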

N-gram Similarity Example

N-gram Similarity
- Score is between 0 and 1
- If program X is identical to program Y
  - Main diagonal is a solid line
  - And score = 1
- Minimum score is 0
- The smaller the score, the less similar the programs are

Typical N-gram Similarity
- Normal (cygwin utility) files

Typical N-gram Similarity
- NGVCK

Typical N-gram Similarity
- G2

N-gram Similarity
- Compare members of a “family” with each other

N-gram Similarity
- In graphical form…

N-gram Similarity Conclusion?
- G2 viruses are more similar to each other than expected
  - So, they are not very metamorphic
  - Ditto for most of the other generators
- But NGVCK viruses are more different from each other than expected
  - So, they are highly metamorphic
- Implication wrt signature detection?

NGVCK Similarity
- Compare NGVCK to other families…

NGVCK Similarity Conclusion?
- NGVCK viruses are very different from each other
  - Implies highly metamorphic…
  - …so, signature detection will fail
- But NGVCK viruses are even more different from normal files
  - Then what about detection?

Aside: Similarity Measures to Consider?
- Given opcode sequences
  - Edit distance
  - Other sequence comparison techniques
  - Statistical measures
- Considering raw bytes
  - Statistical measures
  - Entropy and other “structural” measures
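
Two of the measures named on this slide are easy to sketch. The code below shows a standard Levenshtein edit distance applied to opcode lists and a Shannon entropy over raw bytes. Both are textbook formulations chosen for illustration; the slides do not say which variants the authors had in mind, and the function names are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two opcode sequences (lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ai in enumerate(a, 1):
        cur = [i]
        for j, bj in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ai
                           cur[j - 1] + 1,              # insert bj
                           prev[j - 1] + (ai != bj)))   # substitute
        prev = cur
    return prev[-1]


def byte_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


if __name__ == "__main__":
    print(edit_distance(["mov", "add", "ret"], ["mov", "sub", "ret"]))
    print(byte_entropy(bytes(range(256))))
```

Edit distance costs time proportional to the product of the two sequence lengths, which matters for long opcode sequences.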

Hidden Markov Models
- Generic view of HMM

HMM Notation

HMM for Metamorphic Detection
- Train HMM
  - Extract opcodes from family executables
  - Append opcode sequences
  - Train a model, i.e., determine matrices
- Use trained HMM to score files
  - Given a file, extract opcode sequence
  - Score sequence against the model
  - Compare to predetermined threshold
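
The scoring half of this procedure can be sketched with the scaled forward algorithm. The sketch assumes a model that has already been trained (training, e.g. by Baum-Welch, is not shown) and that opcodes have been mapped to integer symbol indices. The names `A` (state transitions), `B` (observation probabilities), `pi` (initial distribution), `log_likelihood_per_opcode`, and `is_family_member` are assumptions for illustration, and the threshold is a hypothetical value that would have to be chosen from test data.

```python
import math


def log_likelihood_per_opcode(obs, A, B, pi):
    """Scaled forward algorithm; returns log P(obs | model) / len(obs).

    obs: list of integer symbol indices (one per opcode)
    A:   N x N state transition matrix
    B:   N x M observation probability matrix
    pi:  initial state distribution of length N
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n_states = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob / len(obs)


def is_family_member(obs, A, B, pi, threshold):
    """Classify by comparing the score to a predetermined threshold."""
    return log_likelihood_per_opcode(obs, A, B, pi) >= threshold


if __name__ == "__main__":
    # Toy two-state model over three symbols, for illustration only.
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
    pi = [0.6, 0.4]
    print(log_likelihood_per_opcode([0, 1, 0, 2], A, B, pi))
```

Rescaling at each step keeps the forward probabilities from underflowing on long opcode sequences; the logarithms of the scale factors sum to the log likelihood.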

HMM Scoring: Fine Points
- Score computed as log likelihood of the scored sequence
  - Normalize as “log likelihood per opcode” (LLPO)
  - Why LLPO?
- How to quantify effectiveness?
  - ROC curves are very useful
  - Specifically, area under ROC curve (AUC)
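
As a small illustration of the AUC mentioned above: the area under the ROC curve equals the probability that a randomly chosen family file scores higher than a randomly chosen non-family file, with ties counted as one half. The sketch below computes it directly from two lists of scores. The function name `roc_auc` and the sample scores are made up for the example and are not results from the paper. Dividing the log likelihood by the number of opcodes is what makes scores from files of different lengths comparable in the first place, since a raw log likelihood shrinks as the sequence grows.

```python
def roc_auc(family_scores, other_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (family, other) pairs in which the family
    file scores higher; ties count as half a win.
    """
    if not family_scores or not other_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for f in family_scores:
        for o in other_scores:
            if f > o:
                wins += 1.0
            elif f == o:
                wins += 0.5
    return wins / (len(family_scores) * len(other_scores))


if __name__ == "__main__":
    # Made-up LLPO values for illustration only.
    family = [-2.1, -2.4, -2.2]
    normal = [-5.0, -4.3, -6.1]
    print(roc_auc(family, normal))
```

An AUC of 1.0 means some threshold separates the two sets perfectly; 0.5 means the score is no better than guessing.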

Results
- HMM scoring for NGVCK family

HMM Scoring: Bottom Line
- Signature detection works for metamorphic families, except NGVCK
- For NGVCK, we can use HMM
  - Classification is 100% when compared to normal (benign) files
  - Some misclassifications of other malware (is that good or bad?)
- Should include ROC curves, AUC, …

HMM States: 3 State Model

N-gram Score
- Can also score files using n-grams
- Randomly select an NGVCK file
  - Extract its opcode sequence
- Given a file we want to score
  - Extract its opcode sequence
  - Compute n-gram similarity to the NGVCK sequence
  - Higher similarity, classify as NGVCK
  - Lower similarity, classify as “not NGVCK”

N-gram Score Results?
- For NGVCK, obtain ideal separation
  - There exists a threshold for which…
  - …we can separate NGVCK from normal
- Surprisingly strong results
  - For such a simple similarity score
- Why does this work?
  - We come back to this at the end…
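
"Ideal separation" can be checked mechanically: a separating threshold exists exactly when the lowest family score is above the highest normal score. The sketch below tests that condition and returns the midpoint of the gap. The function name `separating_threshold` and the sample scores are hypothetical, not data from the paper.

```python
def separating_threshold(family_scores, normal_scores):
    """Return a threshold that perfectly separates the two score sets.

    Higher scores are assumed to mean "more like the family".
    Returns None when the score ranges overlap.
    """
    lowest_family = min(family_scores)
    highest_normal = max(normal_scores)
    if lowest_family > highest_normal:
        return (lowest_family + highest_normal) / 2
    return None


if __name__ == "__main__":
    # Made-up similarity scores for illustration only.
    print(separating_threshold([0.41, 0.55, 0.48], [0.02, 0.10, 0.07]))
    print(separating_threshold([0.41, 0.05], [0.02, 0.10]))
```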

Compare to Commercial AV
- Tested the following on our virus sets
  - eTrust, avast!, AVG
- These scanners detected most of the viruses from weak families
  - That is, G2, VCL32, etc.
- But none of the NGVCK viruses were detected by any of the 3 scanners

Conclusion
- HMM is effective at detecting the highly metamorphic NGVCK malware family
- N-gram similarity is also effective
- NGVCK is not detected by commercial AV
- So, this detection improves the state of the art
- Practical considerations?

Lessons Learned?
- Why can we detect the NGVCK family?
- In spite of high metamorphism, the code is statistically different from normal
- “Improved” metamorphic malware?
- Metamorphism must be sufficient to evade signature detection
- But the metamorphic family must be statistically similar to normal

Future Work
- Build a better metamorphic generator
  - Some progress here, but still detectable using other detection methods
  - Still need better generators…
- Develop and test other detection strategies
  - Lots of work done here too
  - But lots more to do

References
- W. Wong and M. Stamp, Hunting for metamorphic engines, Journal in Computer Virology 2(3):211-229, 2006
- M. Stamp, A revealing introduction to hidden Markov models