
PhishScore: Hacking Phishers’ Minds
CNSM 2014 – Fault Tolerance and Security Track, November 18, 2014
Samuel Marchal, Jérôme François, Radu State and Thomas Engel
{samuel.marchal, radu.state, thomas.engel}@uni.lu, jerome.francois@inria.fr

PhishScore at a glance

What is phishing?
• Use of technical subterfuge and social engineering to steal any kind of valuable consumer data:
  • Identity information
  • Website credentials: login, password, etc.
  • Credit card information
  • Etc.
• Causes billions of dollars in losses every year

Phishing techniques and statistics
• Web-based delivery
• Trojan hosts
• Content injection (website)
• Phishing emails
• Instant messaging
• Fake websites
• Etc.

Phishing website example

Phishing URL characteristics
www.paypal.creasconsultores.com/www.paypal.com/Resolutioncenter.php
shevkun.org/css/paypal.com/cgi-bin/cmd%3D_login-submit/css/websc.php
us-mg6.mail.yahoo.com.dwarkamaigroup.com/Yahoo.html
emailoans.hostingventure.com.au/bankofamerica.com
nitkowski.pl/components/wellsfargo/questions.php
URL characteristics:
• Long URLs (many domain levels, long path, etc.)
• Composed of many labels
• Embed the targeted brand at different URL levels, e.g. Yahoo, Wells Fargo
• Embed specific keywords

Prior work
URL lexical analysis
• Garera et al. [WORM '07]: logistic regression with word-based features
• Ma et al. [SIGKDD '09]: batch classification method with lexical and host-based features
• Blum et al. [AISec '10]: refined technique with a binary feature for each word/level
• Le et al. [Infocom '11]: batch and online learning with lexical features and URL features

Phishing URL characteristics
www.paypal.creasconsultores.com/www.paypal.com/Resolutioncenter.php
shevkun.org/css/paypal.com/cgi-bin/cmd%3D_login-submit/css/websc.php
us-mg6.mail.yahoo.com.dwarkamaigroup.com/Yahoo.html
emailoans.hostingventure.com.au/bankofamerica.com
nitkowski.pl/components/wellsfargo/questions.php
The registered domain has no relationship with the rest of the URL
http://4ld.3ld.mld.ps/path1/path2?key1=value1&key2=value2
• Most parts of a URL can be freely defined
• Except the registered domain: main level domain (mld) + public suffix (ps)
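
The decomposition above can be sketched in Python. This is only an illustration under stated assumptions: PUBLIC_SUFFIXES is a tiny hand-picked stand-in for the full Public Suffix List, and the function name decompose is hypothetical, not taken from the slides.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical, tiny stand-in for the Public Suffix List; a real system
# would load the full list instead of this hand-picked set.
PUBLIC_SUFFIXES = {"com", "org", "pl", "com.au"}


def decompose(url):
    """Split a URL into sub-domain labels, mld, public suffix, path and query."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to locate the host
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Scan from the left so the longest matching public suffix wins.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            break
    else:
        raise ValueError("no known public suffix in " + parts.hostname)
    if i == 0:
        raise ValueError("host is a bare public suffix")
    return {
        "subdomains": labels[: i - 1],  # 4ld, 3ld, ...
        "mld": labels[i - 1],           # main level domain
        "ps": ".".join(labels[i:]),     # public suffix
        "path": [p for p in parts.path.split("/") if p],
        "query": parse_qsl(parts.query),
    }
```

For emailoans.hostingventure.com.au/bankofamerica.com, only hostingventure (mld) and com.au (ps) are constrained by registration; the sub-domain and the path, including the bank's name, are chosen freely by whoever controls the domain.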

Proposition for phishing URL detection
Hypothesis:
• Components of legitimate URLs are all related
• Registered domains (mld.ps) of phishing URLs are not related to the remainder of the URL
Analyse the relatedness between mld.ps and the remaining part of a URL: intra-URL relatedness

Intra-URL relatedness
URL label extraction:
http://4ld.3ld.mld.ps/path1/path2?key1=value1&key2=value2
“mld” & “mld.ps”
Basic splitting
Example: login.paypal.com/securepayment
• RDurl = {paypal; paypal.com}
• REMurl = {login; secure; payment}
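
A minimal Python sketch of this label extraction follows. The slides do not say how concatenated words such as "securepayment" are separated, so the greedy longest-match segmentation and the VOCABULARY word list below are assumptions made for illustration; segment and extract_labels are hypothetical names.

```python
import re

# Hypothetical mini-vocabulary used to segment concatenated words; the
# slides do not say which word list or segmentation method is used.
VOCABULARY = {"login", "secure", "payment", "paypal"}


def segment(token, vocabulary=VOCABULARY):
    """Greedy longest-match segmentation; unknown tokens are kept whole."""
    words, start = [], 0
    while start < len(token):
        for end in range(len(token), start, -1):
            if token[start:end] in vocabulary:
                words.append(token[start:end])
                start = end
                break
        else:
            return [token]  # no full segmentation found
    return words


def extract_labels(mld, ps, remaining):
    """Build RDurl from the registered domain and REMurl from the rest."""
    rd_url = {mld, mld + "." + ps}
    rem_url = set()
    # Basic splitting on every character that is not a letter or digit.
    for token in re.split(r"[^a-z0-9]+", remaining.lower()):
        if token:
            rem_url.update(segment(token))
    return rd_url, rem_url


# The remaining part of login.paypal.com/securepayment is the sub-domain
# "login" and the path "securepayment".
rd_url, rem_url = extract_labels("paypal", "com", "login/securepayment")
```

With these inputs, rd_url holds paypal and paypal.com, and rem_url holds login, secure and payment, matching the example on the slide.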

Intra-URL relatedness evaluation
How to evaluate intra-URL relatedness?
• Compare the two sets RDurl and REMurl
• Existing word relatedness techniques: WordNet [Miller 90], NGD [Cilibrasi 07], DISCO [Kolb 08], etc.
  Problem: all are dictionary-based, and “Internet” vocabulary is not necessarily contained in a dictionary
• Idea: use search engine query data
  • Web searches reflect the cognitive behaviour of users looking for services on the Internet (what phishers try to identify and mimic)
  • Query well-known services: Google Trends & Yahoo Clues
  • See which words are searched together in search engines to infer word relatedness
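
The idea of inferring relatedness from words that are searched together can be illustrated with a toy example. The sketch below does not call Google Trends or Yahoo Clues; QUERIES is a made-up query log and related_words is a hypothetical helper, used only to show the co-occurrence principle.

```python
from collections import defaultdict

# Hypothetical query log; the slides rely on Google Trends and Yahoo Clues,
# which are not called here.
QUERIES = [
    "paypal login",
    "paypal secure payment",
    "paypal account login",
    "yahoo mail login",
]


def related_words(queries):
    """Map each word to the set of words that appear with it in a query."""
    related = defaultdict(set)
    for query in queries:
        words = set(query.lower().split())
        for word in words:
            related[word] |= words - {word}
    return related


related = related_words(QUERIES)
# related["paypal"] now holds login, secure, payment and account.
```

A brand name such as "paypal" ends up associated with the service-related words users type alongside it, whether or not those words appear in a dictionary.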

Intra-URL relatedness evaluation

Feature set
12 features representing intra-URL relatedness:
• Word set relatedness (Jaccard index): JRR, JAR, JRA, JARrd, JAA, JARrem
• Words embedded in URL: cardrem
• Popularity of registered domain: mldres, mld.psres, ranking
• Popularity of words in URL: ratioArem, ratioRrem
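
The Jaccard index used for the word set relatedness features is the size of the intersection of two sets divided by the size of their union. A short Python sketch is given below; the two example word sets are invented, and returning 0.0 for two empty sets is an assumption, since the slides do not state how that case is handled.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Invented word sets: words tied to the registered domain versus words
# tied to the rest of the URL.
words_rd = {"login", "payment", "account"}
words_rem = {"login", "payment", "secure"}
score = jaccard(words_rd, words_rem)  # 2 shared words out of 4 distinct ones
```

A value near 1 means the two parts of the URL share most of their vocabulary, while a value near 0 means they have little in common, which is what the hypothesis expects for phishing URLs.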

Feature analysis
• Datasets:
  • 48,009 phishing URLs (source: PhishTank)
  • 48,009 legitimate URLs (source: DMOZ)
• Feature extraction for the whole dataset

URL classification
• Machine learning approach:
  • Determine the best classifier to identify phishing URLs
  • 7 classifiers tested: Random Forest, C4.5, JRip, SVM, etc.
  • 10-fold cross-validation on the presented feature set (96,016 URLs)
• Random Forest: 94.91% accuracy, 1.44% FP rate
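
For reference, the two reported metrics can be computed from confusion-matrix counts as sketched below, with phishing taken as the positive class. The counts in the example are hypothetical and are not the experiment's results; the function name is also an assumption.

```python
def accuracy_and_fp_rate(tp, fp, tn, fn):
    """Return (accuracy, false positive rate), with phishing as the positive class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    fp_rate = fp / (fp + tn)  # share of legitimate URLs flagged as phishing
    return accuracy, fp_rate


# Hypothetical counts for a balanced test set of 1,000 URLs per class.
acc, fpr = accuracy_and_fp_rate(tp=920, fp=15, tn=985, fn=80)
print(f"accuracy={acc:.2%}  FP rate={fpr:.2%}")
```

The FP rate is computed over legitimate URLs only, so a low value means few legitimate sites would be wrongly blocked even when overall accuracy is below 100%.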

URL rating
• Random Forest based rating system:
  • Use the soft prediction score in [0; 1] as URL score:
    • 1: phishing URL
    • 0: legitimate URL
• Score 0: 22,863 legitimate // 40 phishing
• Score 1: 26 legitimate // 34,790 phishing
  99.89% correctness on 60.11% of the dataset
• Scores in [0; 0.1] and [0.9; 1]:
  99.22% correctness on 83.97% of the dataset
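
The first pair of percentages can be re-derived from the counts given for scores 0 and 1. The short Python check below assumes the dataset size of 96,016 URLs quoted on the URL classification slide; the variable names are illustrative.

```python
# Counts quoted on the slide for the two extreme scores.
score_0 = {"legitimate": 22_863, "phishing": 40}
score_1 = {"legitimate": 26, "phishing": 34_790}
dataset_size = 96_016  # size quoted on the URL classification slide

# URLs that received exactly 0 or exactly 1.
covered = sum(score_0.values()) + sum(score_1.values())
# Correct ratings: legitimate URLs scored 0 and phishing URLs scored 1.
correct = score_0["legitimate"] + score_1["phishing"]

correctness = correct / covered
coverage = covered / dataset_size
print(f"{correctness:.2%} correct on {coverage:.2%} of the dataset")
# prints "99.89% correct on 60.11% of the dataset"
```

Out of the 57,719 URLs rated exactly 0 or 1, only 66 (40 phishing and 26 legitimate) fall on the wrong side.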

Conclusion
PhishScore: lexical analysis to detect phishing URLs:
• Intra-URL relatedness
• Word relatedness inferred from search engine query data
• Phishing URL detection: 95% accuracy (FP rate = 1.44%)
• URL rating system: >99% correctness for >80% of URLs
Future work:
• Use distributed online processing (Big Data) to reduce delay
• Implementation as a phishing email filter and browser add-on


Phishing summary
• Phishing:
  • seeks to steal different kinds of data
  • targets several industry sectors
  • uses various techniques
Is there a global characteristic of phishing?
No, but most phishing attacks rely on fake websites reached through redirecting links
Phishing detection technique with a wide scope: phishing URL identification