Off-the-Hook: An Efficient and Usable Client-Side Phishing Prevention Application
January 31st, 2017
Samuel Marchal*, Giovanni Armano*, Kalle Saari*, Tommi Gröndahl*, Nidhi Singh†, N. Asokan*
*Aalto University, †Intel Security
samuel.marchal@aalto.fi
Requirements for phishing detection
• Accuracy: high detection rate with few legitimate webpages misidentified as phish.
• Context-independent detection: not dependent on any particular language or brand.
• Temporal resilience: accuracy does not degrade over time.
• Resilience to dynamic phish: different content can be delivered to different users.
• User privacy: no disclosure of browsing history.
• Effective protection: fast decision and effective warning.
Client-side implementation
Decision relies only on information available to a web browser:
• Starting URL
• Landing URL
• Redirection chain
• Logged links
• HTML source code:
  – Text
  – Title
  – HREF links
  – Copyright
Requirements addressed:
• Privacy preservation
• Resilient to dynamic phish
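As an illustration of the data sources listed on this slide, the sketch below groups them into a single record. The class name `PageSnapshot`, its field names, and the example URLs are hypothetical and chosen only for this sketch; they are not taken from the Off-the-Hook implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageSnapshot:
    """Hypothetical container for what a browser can observe about one page visit."""
    starting_url: str                                            # URL the user clicked or typed
    landing_url: str                                             # URL reached after redirections
    redirection_chain: List[str] = field(default_factory=list)  # every URL crossed on the way
    logged_links: List[str] = field(default_factory=list)       # URLs of content loaded by the page
    text: str = ""                                               # visible text from the HTML source
    title: str = ""                                              # content of the <title> element
    href_links: List[str] = field(default_factory=list)         # targets of HREF links
    copyright_notice: str = ""                                   # copyright line, if any

    def was_redirected(self) -> bool:
        """True when the starting and landing URLs differ."""
        return self.starting_url != self.landing_url


# Made-up example values, for illustration only.
snapshot = PageSnapshot(
    starting_url="http://short.example/abc",
    landing_url="http://login.example.net/signin",
    redirection_chain=["http://short.example/abc", "http://login.example.net/signin"],
    title="Sign in",
)
print(snapshot.was_redirected())
```

Everything in the record is available locally in the browser, which is what lets the decision be made without sending browsing history to a server.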
Modeling phisher limitations
Phishers have different levels of control and operate under constraints while building a webpage:
• Control: externally loaded content (logged links) and external HREF links are not controlled by the page owner.
• Constraints: the registered domain name (RDN) part of a URL cannot be freely defined; it is constrained by registration (DNS) policies.
Requirements addressed:
• Accurate decision
• Temporal resilience
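The distinction between the constrained part of a URL and the freely defined part can be sketched as follows. This is a minimal illustration, not the authors' code: the suffix set is a tiny hard-coded stand-in for the full Public Suffix List, the function names are hypothetical, and "mld" is taken here to mean the main-level domain label of the RDN.

```python
from urllib.parse import urlsplit

# Assumption: a tiny illustrative suffix set. A real implementation would
# consult the complete Public Suffix List instead.
PUBLIC_SUFFIXES = {"com", "net", "org", "fi", "co.uk"}


def registered_domain(url: str) -> str:
    """Return the registered domain name (public suffix plus one label)."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Scan from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host  # unknown suffix: fall back to the whole host name


def main_level_domain(url: str) -> str:
    """Return the first label of the registered domain name."""
    return registered_domain(url).split(".")[0]


def is_external(link: str, page_url: str) -> bool:
    """A link is external when its RDN differs from the page's RDN."""
    return registered_domain(link) != registered_domain(page_url)


# Made-up phishing-style URL: the brand name sits in the freely defined part.
page = "http://paypal.com.secure-login.example.net/paypal/signin"
links = ["https://www.paypal.com/help", "http://cdn.example.net/logo.png"]

print(registered_domain(page))
print(main_level_domain(page))
print([link for link in links if is_external(link, page)])
```

In this made-up URL the brand name appears only in the subdomains and the path, which a phisher can set freely, while the RDN is the part that had to be registered.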
Use few but dynamic features
• 212 dynamic features computed from data sources:
  – URL features (106)
  – Term usage consistency (66)
  – Usage of starting and landing mld (22)
  – RDN usage (13)
  – Webpage content (5)
• Gradient Boosting classification (supervised)
Requirements addressed:
• Context-independent decision
• Fast decision
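The per-group counts listed on this slide add up to 212. The sketch below lays the five groups out as contiguous ranges of one fixed-length feature vector, the kind of input a supervised classifier such as Gradient Boosting would consume. The group keys and the helper `group_slices` are hypothetical names used only for illustration.

```python
# Group sizes as listed on the slide; the dictionary keys are illustrative names.
FEATURE_GROUPS = {
    "url": 106,
    "term_usage_consistency": 66,
    "starting_landing_mld_usage": 22,
    "rdn_usage": 13,
    "webpage_content": 5,
}


def group_slices(groups):
    """Map each group name to its slice in the vector and return the total length."""
    slices = {}
    start = 0
    for name, size in groups.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


SLICES, TOTAL = group_slices(FEATURE_GROUPS)

# Build an all-zero vector, then fill one group with placeholder values.
vector = [0.0] * TOTAL
vector[SLICES["webpage_content"]] = [1.0] * FEATURE_GROUPS["webpage_content"]

print(TOTAL)
print(SLICES["rdn_usage"])
```

Keeping each group in a fixed range makes it easy to inspect or ablate one family of features at a time.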
Relevant warnings
• Redirection to the target of the phish
• No technical jargon
System Accuracy (language independence)
• Classifier training:
  – 4,531 English legitimate webpages
  – 1,036 phishing webpages
• Assessment:
  – Legitimate webpages:
    • 100,000 English
    • 10,000 each in French, German, Italian, Portuguese and Spanish
  – 1,216 phishing webpages
System Accuracy (language independence)
Evaluation set: 100,000 English legitimate webpages / 1,216 phish (≈ real-world distribution)
[Figures: ROC curve; Precision vs. Recall]
Precision   Recall   FP Rate   AUC / Accuracy
0.956       0.958    0.0005    0.999
[Only one value survives for the AUC and Accuracy columns.]
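For readers who want to relate the reported figures to raw counts, the sketch below computes precision, recall, false positive rate, and accuracy from a confusion matrix. The counts are hypothetical: they were picked to match the evaluation set sizes (100,000 legitimate pages, 1,216 phish) and to land close to the rounded figures reported here, and they are not the authors' actual counts.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics, with phish as the positive class."""
    return {
        "precision": tp / (tp + fp),                # flagged pages that really are phish
        "recall": tp / (tp + fn),                   # phish that were flagged
        "fp_rate": fp / (fp + tn),                  # legitimate pages wrongly flagged
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts: tp + fn = 1,216 phish, fp + tn = 100,000 legitimate pages.
metrics = confusion_metrics(tp=1165, fp=54, fn=51, tn=99946)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because legitimate pages outnumber phish by roughly 80 to 1 in this set, a small change in the false positive rate moves precision noticeably, which is why both are reported.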
Accuracy comparison
Method                      FPR      Precision   Recall   Accuracy
Cantina (CMU)               0.03     0.212       0.89     0.969
Cantina+ (CMU)              0.013    0.964       0.955    0.97
Ma et al. (UCB)             0.001    0.998       0.924    0.955
Whittaker et al. (Google)   0.0001   0.989       0.915    0.999
Monarch (UCB)               0.003    0.961       0.734    0.866
Our method                  0.0005   0.956       0.958    0.999
Performance
• Memory footprint:
  – 295 MB
• Impact on Web surfing:
  – Phishing webpages:
    • Interaction blocked in < 0.2 seconds
    • Warning displayed (and target identified) in < 2 seconds
  – Legitimate webpages:
    • None (except for false positives)
Thank You
https://ssg.aalto.fi/projects/phishing/