Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild
Ke Tian, Steve T. K. Jan, Hang Hu, Danfeng Yao, Gang Wang
Computer Science, Virginia Tech

Phishing is a Big Threat
• Phishing: a fraudulent attempt to obtain credentials (passwords)
• Big threat: an estimated $30M loss in 2017 [1]
  • The Yahoo data breach in 2014 affected 500 million Yahoo! user accounts
  • Ubiquiti Networks lost $46.7M to scammers in 2015
• Exploiting the human factor is easier than exploiting system vulnerabilities.
[1] Internet Crime Report, FBI, 2017.

Some Phishing Websites are Easy to Tell
• Phishing is a long-standing problem
• Good news: some phishing websites are easy to detect
  http://178.128.85.7/banks/National
  http://account-updates-center-service.beedoces.com.br
  URL is not related to PayPal: phishing

Some Phishing Websites are Easy to Tell
• Phishing is a long-standing problem
• Good news: some phishing websites are easy to detect
  http://account-updates-center-service.beedoces.com.br
  http://178.128.85.7/banks/National
  URL does not include a domain name: phishing

More Sophisticated Phishing Example
  http://get.adoḅe.com/es/flashplayer
  http://www.apple.com
  Callout on each URL: a different character
• This is an IDN (Internationalized Domain Name) homograph attack
• Homograph domain squatting: exploits the fact that many different characters look alike

How can we systematically capture these sophisticated phishing websites in practice?

This Study
• We focus on squatting phishing domains
  • Web contents: phishing content, mimicking real websites
  • Domain name: “squatting” domain that impersonates popular brands
• Research questions
  • How to systematically detect squatting phishing domains in practice?
  • What types of impersonation/evasion techniques do they use?
  • How effective are existing blacklists at detecting them?
• Large-scale empirical measurements
  • Search over 224 million DNS records
  • 702 popular brands

Outline
• Introduction
• Detection methodology
  • Detect squatting domains
  • Detect phishing pages under squatting domains
• Measuring squatting-based phishing
• Conclusion

Detection Methodology
• Our detection methodology is based on a series of filtering steps
  DNS records: 224,810,532; popular brands: 702
    → squatting domain detection → squatting domains: 657,663
    → phishing classifier → phishing: 1,741
    → manual check → confirmed: web 857, mobile 908

Detect Squatting Domain
• Goal: detect squatting domains that impersonate brands
• Given a brand (facebook.com), search for squatting domains in DNS
• Capture five types of squatting domains
  1. Homograph: looks similar to the target domain (faceb00k.com)
  2. Bits: flips a bit of the target domain (facebnok.com)
  3. Typo: mimics a mistyped version of the target domain (fcaebook.com)
  4. Combo: connects the target domain with other strings (facebook-stroty.com)
  5. WrongTLD: uses a different TLD than the target domain (facebook.audi)
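
The five categories can be illustrated with a small classifier that labels a candidate domain relative to a brand. This is only a sketch under simplifying assumptions, not the authors' tool: the function names, the two-entry lookalike table, and the `label.tld` domain shape are all hypothetical.

```python
# Lookalike characters mapped back to the letter they imitate.
# A tiny illustrative table; a real one would be far larger (assumption).
LOOKALIKES = str.maketrans({"0": "o", "1": "l"})

def one_bit_apart(a, b):
    """True if a and b differ in exactly one character, by exactly one bit."""
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) != 1:
        return False
    x = ord(diffs[0][0]) ^ ord(diffs[0][1])
    return x & (x - 1) == 0  # a power of two means a single flipped bit

def adjacent_swap(a, b):
    """True if b equals a with two neighbouring characters swapped."""
    return any(a[:i] + a[i + 1] + a[i] + a[i + 2:] == b
               for i in range(len(a) - 1))

def squat_type(domain, brand):
    """Return the squatting type of `domain` relative to `brand`, or None."""
    label, _, tld = domain.lower().partition(".")
    b_label, _, b_tld = brand.lower().partition(".")
    if label == b_label:
        return "wrongTLD" if tld != b_tld else None  # None: the brand itself
    if label.translate(LOOKALIKES) == b_label:
        return "homograph"
    if one_bit_apart(b_label, label):
        return "bits"
    if adjacent_swap(b_label, label):
        return "typo"
    if b_label in label:
        return "combo"
    return None

for d in ["faceb00k.com", "facebnok.com", "fcaebook.com",
          "facebook-stroty.com", "facebook.audi"]:
    print(d, squat_type(d, "facebook.com"))
```

The checks run from the most specific to the most general, so a domain that merely contains the brand string is labeled combo only after the other categories have been ruled out.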

Detect Squatting Domain
• 224,810,532 DNS records → 657,663 squatting domains
• Crawl the web and mobile versions of pages that are still alive
  • Dynamic crawler: it can load JavaScript and process redirections
• 6,115 squatting domains (1.7%) redirect to the original brand
  • Some businesses purchase squatting domains to protect their own customers
  Squatting domain pricelin.com → redirects to → original brand priceline.com
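
One way to separate defensive registrations from suspicious ones is to compare the final URL reached after redirections with the brand's domain. The sketch below assumes the crawler has already recorded that final URL; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def redirects_to_brand(final_url, brand_domain):
    """True if the host of final_url is the brand domain or one of its subdomains."""
    host = (urlsplit(final_url).hostname or "").lower()
    brand_domain = brand_domain.lower()
    return host == brand_domain or host.endswith("." + brand_domain)

# A squatting domain that lands on the real brand is likely a defensive registration.
print(redirects_to_brand("https://www.priceline.com/home", "priceline.com"))
# A host that only starts with the brand name is not the brand.
print(redirects_to_brand("http://priceline.com.evil.example/", "priceline.com"))
```

Matching on a leading dot avoids treating a host such as `notpriceline.com` as the brand.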

Phishing Classifier
• Goal: classify phishing pages under squatting domains
• Ground-truth data:
  • 1,731 phishing pages from PhishTank (manually confirmed)
  • 1,565 benign pages from squatting domains (manually confirmed)
• Our classifier is motivated by observations on evasion techniques:
  1. Layout obfuscation
  2. String obfuscation
  3. Code obfuscation

Layout Obfuscation
• Change the style/color/layout of the target brand website
• Evade screenshot-similarity based detection methods
[Figure: target brand page next to phishing websites; one phishing page is labeled as detected by existing methods, the other as not detected by existing methods]

String/Code Obfuscation
• Hide important text and keywords in the HTML source code
• Evade keyword-similarity based or source-code-similarity based detection
  Target brand HTML:
    <title> Log in to your PayPal </title>
  Phishing HTML (detected by keyword-similarity based methods):
    <title> Log in to your PayPal </title>
  Phishing HTML, string obfuscation:
    <title> Log in to your PayPa1 </title>
  Phishing HTML, code obfuscation:
    <script> String.fromCharCode(50) + “a” + …
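
To see why both tricks defeat keyword matching on the page source, consider the toy check below. It is a sketch, not a real detector: the `code_obf` snippet is a hypothetical example of text assembled from character codes, and the last lines mimic in Python what such a script would render in the browser.

```python
def contains_keyword(html_source, keyword):
    """Naive source-level check: case-insensitive substring match."""
    return keyword.lower() in html_source.lower()

target = "<title> Log in to your PayPal </title>"
string_obf = "<title> Log in to your PayPa1 </title>"
# Hypothetical code obfuscation: the brand name never appears as one string.
code_obf = "<script>document.title = String.fromCharCode(80, 97, 121) + 'Pa' + 'l';</script>"

print(contains_keyword(target, "paypal"))      # source contains the keyword
print(contains_keyword(string_obf, "paypal"))  # "PayPa1" does not match
print(contains_keyword(code_obf, "paypal"))    # keyword is split across pieces

# What the user would see once the script runs (mimicked in Python).
rendered = "".join(map(chr, (80, 97, 121))) + "Pa" + "l"
print(rendered)
```

The rendered text still shows the brand name to the user, which is the gap that screenshot-based features aim to close.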

Our Design
• Intuition 1: phishing pages will be visually displayed to users
  • Extract keywords from their screenshots with OCR
  • Tesseract OCR: extract keywords from images
  Google OCR → keyword list: Paypol, Email, passward, ……
  NLTK spell check → keyword list: Paypal, Email, password, ……
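
The spell-check step can be approximated with fuzzy matching against a keyword dictionary. The sketch below uses Python's standard `difflib` as a stand-in for the NLTK spell check named on the slide; the vocabulary, the cutoff, and the function name are assumptions made for illustration.

```python
import difflib

# Hypothetical dictionary of brand names and login-related words.
VOCAB = ["paypal", "email", "password", "login", "account"]

def correct(words, vocab=VOCAB, cutoff=0.75):
    """Map each OCR token to its closest dictionary word, if one is close enough."""
    fixed = []
    for w in words:
        match = difflib.get_close_matches(w.lower(), vocab, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else w.lower())
    return fixed

print(correct(["Paypol", "Email", "passward"]))
```

Tokens with no sufficiently close dictionary entry are kept unchanged (lowercased), so unknown words are not forced onto a brand name.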

Our Design Cont.
• Intuition 2: phishing pages contain forms to collect user credentials
  • Extract keywords from HTML forms
• Use text-based features from the source code as a complement
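
Form keywords can be pulled out of the page source with a standard HTML parser. The following is a minimal sketch using Python's `html.parser`; which attributes count as keywords is an assumption, and the class name and sample page are hypothetical.

```python
from html.parser import HTMLParser

class FormKeywordParser(HTMLParser):
    """Collect visible text and selected attribute values that appear inside <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "button"):
            for name, value in attrs:
                # Attribute choice is an assumption for this sketch.
                if name in ("name", "placeholder", "type", "value") and value:
                    self.keywords.append(value.lower())

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

    def handle_data(self, data):
        if self.in_form and data.strip():
            self.keywords.append(data.strip().lower())

page = ('<p>Welcome</p><form><label>Email</label>'
        '<input type="password" name="pass" placeholder="Password">'
        '<button>Log In</button></form>')
parser = FormKeywordParser()
parser.feed(page)
print(parser.keywords)
```

Text outside the form, such as the "Welcome" paragraph, is ignored, which keeps the feature focused on the credential-collection part of the page.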

Ground Truth Evaluation
• Feed features to machine learning classifiers
  • Image (OCR) features, form features, text-based features
  • Naive Bayes, KNN, and Random Forest
• Results of 10-fold cross-validation:

  Classifier      False Positive   False Negative   AUC
  Naive Bayes     0.5              0.05             0.64
  KNN             0.04             0.1              0.92
  Random Forest   0.03             0.06             0.97

• Random Forest is highly accurate
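
For reference, the two error rates in the table and the fold split can be computed as in the sketch below. It is a plain-Python illustration with made-up labels, assuming 1 marks a phishing page and 0 a benign page; it does not reproduce the authors' evaluation code.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate); 1 = phishing, 0 = benign."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; sample i goes to the test set of fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

# Made-up labels, only to exercise the functions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
print(error_rates(y_true, y_pred))
```

In k-fold cross-validation each sample is used for testing exactly once, and the reported rates are typically averaged over the k folds.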

Outline
• Introduction
• Detection methodology
  • Detect squatting domains
  • Detect phishing pages under squatting domains
• Measuring squatting-based phishing
• Conclusion

Detection in Practice
DNS records: 224,810,532; popular brands: 702
Squatting domains: 657,663
Detected phishing pages: 1,741
Confirmed phishing pages on mobile and web: 1,175
  • On web: 857 (web only: 267)
  • On mobile: 908 (mobile only: 318)
  • On both: 590
• Squatting phishing websites do exist
• More phishing websites on mobile
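
The confirmed counts on this slide are consistent with inclusion-exclusion, taking 857 web pages, 908 mobile pages, and 1,175 pages overall. The short check below is only arithmetic on those slide numbers; the variable names are illustrative.

```python
web, mobile, total = 857, 908, 1175

# Pages counted in both crawls are counted twice in web + mobile.
both = web + mobile - total
web_only = web - both
mobile_only = mobile - both

print(both, web_only, mobile_only)
```

The derived values match the remaining numbers on the slide: 590 on both, 267 on web only, and 318 on mobile only.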

Can Current Blacklists Detect Them?
• Query 70+ phishing blacklists, including PhishTank, eCrimeX, and VirusTotal
[Figure: bar chart, y-axis "# of Pages" (0 to 1,200); bars: PhishTank, VirusTotal, eCrimeX, Evaded Blacklists; annotations: "Reported them", "Over 90% live over a month"]
• Existing blacklists/tools are not yet capable of capturing squatting phishing

Squatting Domain Types
• Combo squatting domains contain the largest number of phishing pages
• Bits and homograph squatting domains: hard to register
[Figure: bar chart, y-axis "# of pages" (0 to 600); squatting types: Homograph, Bits, Typo, Combo, WrongTLD; series: Web, Mobile]

Example Study: Uber
• Attackers steal Uber truck drivers' accounts.
  Squatting domain: go-uberfreight.com
  Target domain: freight.uber.com

Example Study: Office 365
• Attackers compromise users' Office 365 accounts
  Squatting domain: outlook-office365.net
  Target domain: office365.com

Conclusion
• An extensive measurement of squatting phishing domains
  • From 224,810,532 DNS records and 700+ brands
  • Detected and identified 1,175 squatting phishing pages
  • Open-sourced our tool at: https://github.com/SquatPhish
• Future work
  • Adversarial attacks against OCR-based phishing detection
  • Deploy the system for long-term measurement

Thank You

APPENDIX

Evasions in Squatting Phishing
• Layout obfuscation: average Hamming distance of 28.5
• String obfuscation: adopted by 68%
• Code obfuscation: adopted by 35%
• Obfuscation is common in squatting phishing.
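
The slide does not say what the Hamming distance is computed over. A common setup, assumed here purely for illustration, is to compare fixed-length image hashes of the phishing and brand screenshots and count the differing bits. The hash values below are made up.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")

def average_distance(pairs):
    """Mean Hamming distance over (phishing_hash, brand_hash) pairs."""
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

# Made-up hashes, only to exercise the functions.
print(hamming(0b1011, 0b0010))
print(average_distance([(0b000, 0b111), (0b1, 0b0)]))
```

A larger distance means the phishing screenshot looks less like the brand page, which is what makes screenshot-similarity detection harder.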

IP Location
• Checked the geolocation of 1,021 IP addresses, hosted in 53 different countries.
• The U.S. hosts the most sites, followed by Germany

False Positive Prediction
  http://paypal.me