Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms


Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience
Zhichun Li, Manan Sanghi, Yan Chen, Ming-Yang Kao and Brian Chavez
Lab for Internet & Security Technology (LIST), Northwestern University

The Spread of Sapphire/Slammer Worms

Desired Requirements for Polymorphic Worm Signature Generation
• Network-based signature generation
  – Worms spread at exponential speed, so detecting them at an early stage is crucial. However:
    » At their early stage there are only limited worm samples.
  – A high-speed network router may see more worm samples. But:
    » It needs to keep up with the network speed!
    » It can only use network-level information.

Desired Requirements for Polymorphic Worm Signature Generation
• Noise tolerant
  – Most network flow classifiers suffer from false positives.
  – Even host-based approaches can be injected with noise.
• Attack resilience
  – Attackers always try to evade detection systems.
• Efficient signature matching for high-speed links
No existing work satisfies these requirements!

Outline
• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion

Choice of Signatures
• Two classes of signatures
  – Content based
    » Token: a substring with reasonable coverage of the suspicious traffic
    » Signature: a conjunction of tokens
  – Behavior based
• Our choice: content based
  – Fast signature matching: ASIC-based approaches can achieve 6 ~ 8 Gb/s
  – Generic, independent of any protocol or server
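To make "conjunction of tokens" concrete, here is a minimal Python sketch of matching such a signature against one flow payload. It assumes a signature is a mapping from token to a minimum occurrence count; the function name `matches_signature` and the sample payloads are hypothetical and not taken from the Hamsa implementation.

```python
def matches_signature(signature, payload):
    """Return True if every token occurs in the payload at least
    as many times as its required count (assumed semantics)."""
    return all(
        payload.count(token) >= min_count  # bytes.count: non-overlapping occurrences
        for token, min_count in signature.items()
    )


# Illustrative signature built from tokens named on the slides.
signature = {b".ida?": 1, b"GET /": 1, b"%u": 2}

worm_like = b"GET /default.ida?XXXX%u9090%u6858 HTTP/1.0\r\n"
benign = b"GET /index.html HTTP/1.0\r\n"

print(matches_signature(signature, worm_like))
print(matches_signature(signature, benign))
```

Each check is a plain substring search, so the order of tokens in the payload does not matter; that is what distinguishes a conjunction from an ordered token sequence.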

Unique Invariants of Worms
• Protocol Frame Invariants
  – The code path to the vulnerable part, usually infrequently used
  – Code-Red II: '.ida?' or '.idq?'
• Control Data: leading to control flow hijacking
  – Hard-coded value to overwrite a jump target or a function call
• Worm Executable Payload
  – CLET polymorphic engine: '0\x8b', '\xff\xff' and 't\x07\xeb'
• Possible to have worms with no such invariants, but very hard

Hamsa Architecture

Hamsa Design
• Key idea: model the uniqueness of worm invariants
  – Greedy algorithm for finding token conjunction signatures
• Highly accurate while much faster
  – Both analytically and experimentally
  – Compared with the latest work, Polygraph
  – Suffix array based token extraction
• Provable attack resilience guarantee
• Noise tolerant
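The token extraction step can be illustrated with a small Python sketch. Note that this is a naive substring enumeration standing in for the suffix array approach named above, which it does not implement; the function name `extract_tokens` and the parameters `min_len`, `max_len` and `min_coverage` are assumptions for illustration only.

```python
from collections import Counter


def extract_tokens(pool, min_len=2, max_len=8, min_coverage=0.5):
    """Return {token: coverage} for substrings that occur in at least
    min_coverage of the samples. Naive enumeration, not a suffix array."""
    samples_containing = Counter()
    for sample in pool:
        seen = set()  # count each substring once per sample
        for length in range(min_len, max_len + 1):
            for start in range(len(sample) - length + 1):
                seen.add(sample[start:start + length])
        samples_containing.update(seen)
    threshold = min_coverage * len(pool)
    return {
        token: count / len(pool)
        for token, count in samples_containing.items()
        if count >= threshold
    }


pool = [b"xx.ida?yy", b"aa.ida?bb", b"zzzz"]
print(extract_tokens(pool, min_len=5, max_len=5, min_coverage=0.6))
```

The enumeration is quadratic in the sample length for each sample, which is exactly the cost a suffix array is meant to avoid; the sketch only shows what a "token with reasonable coverage" means.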

Hamsa Signature Generator
• Core part: Model-based Greedy Signature Generation
• Iterative approach for multiple worms


Problem Formulation
• Input to the signature generator: suspicious pool, normal pool, false positive bound r
• Output: a signature that maximizes the coverage in the suspicious pool
• False positive in the normal pool is bounded by r
• With noise: NP-Hard!
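The formulation above can be written as a constrained optimization. The symbols are assumptions for illustration: M for the suspicious pool and N for the normal pool (borrowed from the runtime expression later in the deck), S for a candidate signature, and COV and FP for the matched fractions of each pool.

```latex
\max_{S} \; \mathrm{COV}_M(S)
\quad \text{subject to} \quad \mathrm{FP}_N(S) \le r,
\qquad
\mathrm{COV}_M(S) = \frac{|\{\, m \in M : S \text{ matches } m \,\}|}{|M|},
\quad
\mathrm{FP}_N(S) = \frac{|\{\, n \in N : S \text{ matches } n \,\}|}{|N|}
```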

Model Uniqueness of Invariants
• t1: u(1) = upper bound of FP(t1)
• t2: u(2) = upper bound of FP(t1, t2)
• The total number of tokens is bounded by k*
[Figure: example token with FP 21%; joint FP with t1 of 2%, 9%, 0.5%, 17%, 1%, 5%]

Signature Generation Algorithm
• Token extraction from the suspicious pool
• Tokens as (COV, FP), ordered by coverage:
  (82%, 50%), (70%, 11%), (67%, 30%), (62%, 15%), (50%, 25%), (41%, 55%), (36%, 41%), (12%, 9%)
• t1: u(1)=15%

Signature Generation Algorithm
• Signature: t1
• Tokens as (COV, FP), ordered by coverage:
  (82%, 50%), (70%, 11%), (67%, 30%), (62%, 15%), (50%, 25%), (41%, 55%), (36%, 41%), (12%, 9%)
• Joint (COV, FP) with t1, ordered by joint coverage with t1:
  (69%, 9.8%), (68%, 8.5%), (67%, 1%), (40%, 2.5%), (35%, 12%), (31%, 9%), (10%, 0.5%)
• t2: u(2)=7.5%
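The two walkthrough slides can be read as a greedy selection loop, sketched below in Python. This is one interpretation under stated assumptions, not the authors' code: at step i the candidate token with the highest joint coverage in the suspicious pool is chosen among those whose joint false positive in the normal pool stays within u(i), and the loop stops once the false positive drops to r or the k* bounds run out. The names `greedy_signature` and `fraction_matched`, and the toy pools, are hypothetical.

```python
def fraction_matched(tokens, pool):
    """Fraction of flows in the pool that contain every token."""
    return sum(all(t in flow for t in tokens) for flow in pool) / len(pool)


def greedy_signature(candidates, suspicious, normal, u, r):
    """Greedy token conjunction under per-step FP bounds u[0..k*-1].
    Returns a token list, or None if no signature reaches FP <= r."""
    signature, remaining = [], list(candidates)
    for bound in u:  # len(u) plays the role of k*
        best, best_cov = None, -1.0
        for token in remaining:
            trial = signature + [token]
            if fraction_matched(trial, normal) > bound:
                continue  # joint FP exceeds u(i): token not unique enough
            cov = fraction_matched(trial, suspicious)
            if cov > best_cov:
                best, best_cov = token, cov
        if best is None:
            return None
        signature.append(best)
        remaining.remove(best)
        if fraction_matched(signature, normal) <= r:
            return signature
    return None


suspicious = [b"GET /a.ida?x", b"GET /b.ida?y", b"GET /c.ida?z", b"POST /noise"]
normal = [b"GET /index.html", b"GET /a.html", b"POST /form", b"GET /img.png"]
print(greedy_signature([b"GET /", b".ida?", b"POST"], suspicious, normal,
                       u=[0.8, 0.1], r=0.01))
```

Under this reading, the slide values would select the (70%, 11%) token as t1, since it is the highest-coverage entry within u(1)=15%, and the (67%, 1%) entry as t2, since the two entries with higher joint coverage exceed u(2)=7.5%.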

Algorithm Analysis
• Runtime analysis: O(T*(|M|+|N|))
• Provable Attack Resilience Guarantee
  – Analytically bound the worst attackers can do!
  – Example: k*=5, u(1)=0.2, u(2)=0.08, u(3)=0.04, u(4)=0.02, u(5)=0.01 and r=0.01

    Noise ratio       5%       10%      20%
    FP upper bound    1%       1%       1%
    FN upper bound    1.84%    3.89%    8.75%

  – The better the flow classifier, the lower the false negatives

Attack Resilience Assumptions
• Two common assumptions for any signature generation system
• Two unique assumptions for token-based schemes
• Attacks on the flow classifier
  – Our approach does not depend on perfect flow classifiers
  – With 99% noise, no approach can work!
  – High noise injection makes the worm propagate less efficiently.
• Enhance flow classifiers

Improvements to the Basic Approach
• Generalizing signature generation
  – Use a scoring function to evaluate the goodness of a signature
• Iteratively use the single-worm detector to detect multiple worms
  – In the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool.
  – All other worms and normal traffic are treated as noise.
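The iterative use of a single-worm generator might look like the following Python sketch. It assumes that flows matched by a newly found signature are removed from the suspicious pool before the next round; `iterate_worms`, `toy_generate` and `toy_matches` are hypothetical names, and the toy generator is a deliberately trivial stand-in for a real single-worm signature generator.

```python
def iterate_worms(suspicious, generate, matches, max_rounds=10):
    """Repeatedly run a single-worm generator, removing covered flows.
    Returns (signatures, leftover flows treated as noise)."""
    pool, signatures = list(suspicious), []
    for _ in range(max_rounds):
        if not pool:
            break
        signature = generate(pool)
        if signature is None:
            break  # nothing popular enough is left
        remaining = [flow for flow in pool if not matches(signature, flow)]
        if len(remaining) == len(pool):
            break  # signature covers nothing: stop to avoid looping
        signatures.append(signature)
        pool = remaining
    return signatures, pool


def toy_matches(signature, flow):
    return all(token in flow for token in signature)


def toy_generate(pool, candidates=(b"WORM-A", b"WORM-B"), min_cov=0.3):
    """Stand-in generator: pick the most common candidate token."""
    best = max(candidates, key=lambda t: sum(t in f for f in pool))
    coverage = sum(best in f for f in pool) / len(pool)
    return [best] if coverage >= min_cov else None


flows = [b"..WORM-A..1", b"..WORM-A..2", b"..WORM-A..3",
         b"..WORM-B..1", b"..WORM-B..2", b"benign"]
print(iterate_worms(flows, toy_generate, toy_matches))
```

In each round the less popular worms and the normal traffic simply count as noise for the generator, which is why noise tolerance of the single-worm step matters for this scheme.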


Experiment Methodology
• Experimental setup:
  – Suspicious pool:
    » Three pseudo polymorphic worms based on real exploits (Code-Red II, Apache-Knacker and ATPhttpd)
    » Two polymorphic engines from the Internet (CLET and TAPiON)
  – Normal pool: 2-hour departmental HTTP trace (326 MB)
• Signature evaluation:
  – False negative: 5000 generated worm samples per worm
  – False positive:
    » 4-day departmental HTTP trace (12.6 GB)
    » 3.7 GB of web crawling, including .mp3, .rm, .ppt, .pdf, .swf, etc.
    » /usr/bin of Linux Fedora Core 4

Results on Signature Quality
Columns: Worms, Training FN, Training FP, Evaluation FN, Evaluation FP, Binary evaluation FP, Signature
• Code-Red II: 0, 0, 0; signature {'.ida?': 1, '%u780': 1, ' HTTP/1.0\r\n': 1, 'GET /': 1, '%u': 2}
• CLET: 0, 0.109%, 0, 0.06236%, 0.268%; signature {'0\x8b': 1, '\xff\xff': 1, 't\x07\xeb': 1}
• Single worm with noise
  – Suspicious pool size: 100 and 200 samples
  – Noise ratio: 0%, 10%, 30%, 50%, 70%
  – Noise samples randomly picked from the normal pool
  – Always get the above signatures and accuracy.
• Multiple worms with noise give similar results

Speed Results
• Implementation with C++/Python
  – 500 samples with 20% noise, 100 MB normal traffic pool: 15 seconds on a Xeon 2.8 GHz, 112 MB memory consumption
• Speed comparison with Polygraph
  – Asymptotic runtime: O(T) vs. O(|M|^2); when |M| increases, T won't increase as fast as |M|!
  – Experimental: 64 to 361 times faster (Polygraph vs. ours, both in Python)


Related Work
Systems compared: Hamsa, Polygraph, CFG, PADS, Nemean, COVERS, Malware Detection
• Network or host based: Host, Network, Host
• Content or behavior based: Content based, Behavior, Content based, Behavior based
• Noise tolerance: Yes (slow), Yes, No, No, Yes
• Multi worms: Yes in one protocol, Yes (slow), Yes, No, Yes, Yes
• On-line sig matching: Fast, Slow, Fast, Slow
• Generality: General purpose, Protocol specific, Server specific, General purpose
• Provable attack resilience: Yes, No, No, No
• Information exploited: egp, egp, p, egp, e, eg, p

Conclusion
• Network-based signature generation and matching are important and challenging
• Hamsa: automated signature generation
  – Fast
  – Noise tolerant
  – Provable attack resilience
  – Capable of detecting multiple worms in a single application protocol
• Proposed a model to describe the worm invariants

Questions?