
Real-Time RAT-based APT Detection

Our Focus
[Figure: APT life-cycle in four stages: Initial Compromise, Gaining Foothold, Lateral Movement, High Value Asset Acquisition. Diagram labels: attacker, phishing, malicious web, network scan, exploit vulnerability, exploit browser, malware (e.g., RAT), malware propagation, victim, code repo, database, behavior-based malware detection, provenance-based analytics.]
- Design a detection mechanism that targets the key step (gaining foothold) in the APT life-cycle

APT Malware
- Remote Access Trojan (RAT)
  - Based on a study of 300+ APT whitepapers, a RAT is a core component of an APT attack, and >90% are Windows-based.
  - Allows an adversary to remotely control a system
  - A complex set of potentially harmful functions (PHFs)
  - E.g., keylogger, screengrab, remote desktop, remote shell, audiograb
  - A Windows RAT typically embodies 10~40 PHFs.

Analysis of Engagement 1 Data

Issues with FAROS Kafka Topics
- Kafka A and Kafka B not usable
  - Because the FAROS tool was unstable, TA 5.1 suggested not consuming either Kafka A or Kafka B produced by FAROS.
- Stretch Goal topic became available very late
  - Data errors were found, and FAROS re-produced the topic on 10/7.
  - Despite those issues, we finished ingesting the topic, submitted the initial report to TA 5.1, and received positive feedback.

How to Figure Out the Attack Graph
- Data reduction
  - 71 M records in the Stretch topic; 30 minutes of processing time
  - 529 processes in total; 22 processes (4%) identified as involved in malware activities
- Three processes were reported by our RAT detector
  - profile.exe (2 processes) matched the remoteshell signature
  - prodat.exe matched the screengrab signature
- Perform backtracking
  - Based on the artifacts (network IP/port connected, files created) and the pid-ppid relationship, we identify all relevant processes.
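
The backtracking step can be illustrated with a minimal sketch. It assumes a simplified process record (the `Proc` class, its fields, the pids, and the parent process are all hypothetical; only the file names and the endpoint come from the slides) and is not the tool used in the engagement. Starting from the processes flagged by the detector, it follows child links and shared artifacts, then records the ancestor chain without expanding unrelated siblings.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Proc:
    # Hypothetical, simplified process record; real trace records carry more fields.
    pid: int
    ppid: int
    name: str
    files_read: frozenset = frozenset()
    files_written: frozenset = frozenset()
    endpoints: frozenset = frozenset()


def shares_artifact(a, b):
    """True if one process read a file the other wrote, or both used the same endpoint."""
    return bool(
        a.files_written & b.files_read
        or a.files_read & b.files_written
        or a.endpoints & b.endpoints
    )


def backtrack(procs, seeds):
    """Return (related pids, ancestor pids) reachable from the seed pids."""
    by_pid = {p.pid: p for p in procs}
    children = {}
    for p in procs:
        children.setdefault(p.ppid, []).append(p.pid)

    related, queue = set(seeds), deque(seeds)
    while queue:
        cur = by_pid[queue.popleft()]
        # Neighbours are child processes plus processes sharing a file or endpoint.
        neighbours = set(children.get(cur.pid, ()))
        neighbours |= {p.pid for p in procs
                       if p.pid != cur.pid and shares_artifact(cur, p)}
        for pid in neighbours - related:
            related.add(pid)
            queue.append(pid)

    # Walk the ppid chain upward, but do not expand the ancestors' other children.
    ancestors = set()
    for pid in related:
        parent = by_pid[pid].ppid
        while parent in by_pid and parent not in related and parent not in ancestors:
            ancestors.add(parent)
            parent = by_pid[parent].ppid
    return related, ancestors


procs = [
    Proc(1, 0, "hypothetical-parent"),
    Proc(100, 1, "profile.exe",
         files_read=frozenset({"proout.png"}),
         endpoints=frozenset({"129.55.12.167:19985"})),
    Proc(101, 100, "cmd.exe"),
    Proc(102, 101, "prodat.exe", files_written=frozenset({"proout.png"})),
    Proc(200, 1, "notepad.exe"),  # unrelated sibling, should stay out of the result
]
related, ancestors = backtrack(procs, seeds=[102])
print(sorted(related), sorted(ancestors))
```

Keeping ancestors separate is a deliberate choice in this sketch: expanding a common parent's children would otherwise pull in nearly every process on the host.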

Attack Graph for FAROS Stretch Goal Dataset

Breakdown of the Attack (1)
- The attack begins with the execution of "C:\Users\User\Downloads\profile.exe" at Sep. 27, 18:12:06 GMT.

Breakdown of the Attack (2)
- At 18:13:33, the malware "profile.exe" invoked "cmd.exe", which in turn invoked another malware, "C:\Users\User\Downloads\prodat.exe", at 18:13:58. However, the current data traces do not allow us to determine how the malware gained its foothold.
- This malware mainly performed screengrab and saved the results in "C:\Users\User\Downloads\proout.png".
- This file was then read by profile.exe and sent to 129.55.12.167:19985.

Breakdown of the Attack (3)
- At 18:16:58, the malware "profile.exe" invoked cmd.exe again to run hostname.exe, whoami.exe, and netstat.exe to collect sensitive information. The results were written to a log file, "C:\Windows\Temp\1283.log".

Breakdown of the Attack (3) – Cont’d
- At 18:19:44, the malware "profile.exe" invoked "cmd.exe" again, which in turn executed the malware "proup.exe".
- "proup.exe" then initiated a TCP connection to the attacker machine at 129.55.12.167:1050 and sent "1283.log" for data exfiltration.
- At 18:21:42, "burnout.bat" was executed for cleanup.

Breakdown of the Attack (4)
- At 20:03:55, a Firefox process ("firefox.exe") was launched, which subsequently invoked another Firefox process. The latter Firefox process was probably compromised: at 20:06:41 it downloaded another malicious executable with the same name, "profile.exe", from IP address 200. 10: 20480 (site la riat. world. net), and also saved it as "C:\Users\User\Downloads\profile.exe".

Breakdown of the Attack (4) – Cont’d
- The malware started running at 20:09:02 and soon invoked cmd.exe.
- cmd.exe executed both "systeminfo.exe" and "tasklist.exe" to collect system information and the list of currently running tasks.
- The results were saved in a file named "rfeed.dat".
- "profile.exe" then sent the data file to 129.55.12.167:19985.
- Finally, at 20:17:33, "profile.exe" executed "burnout.bat" to perform the cleanup.

Our Approach: Fine-Grained, Evasion-Resilient and Real-time RAT Detection

Our Work
- What is going on
  - Implement a fine-grained, evasion-resilient, and real-time RAT detection system
  - Specifically, we detect whether malicious functionalities are present in the system call traces of a process.
- Why not provenance-based causality analysis
  - FAROS does not provide usable provenance information for now.
  - Missing data: provenance node, netflow object node, file object node.
- What is next
  - Design a system for both real-time APT malware detection and automatic causality analysis.

Overview
- Observation
  - The number of PHFs possibly embodied in a RAT is limited (10~40).
  - The core system calls, and their order, required to exactly define a PHF are limited, so it is possible to identify all of them.
- Core idea
  - Fine-grained, evasion-resilient, and real-time RAT detection
  - Determine whether a program is a RAT by detecting its functionalities and examining its characteristics. Specifically:
  - Create signatures for each PHF possibly embodied in a RAT
  - Train a classifier based on the unique characteristics of RATs to discern between RATs and benign programs

Overview (Cont’d)
- Advantages
  - Generated signatures are finer-grained and semantics-aware.
  - Identify what activity is going on while detecting a RAT
  - Hard to evade: attackers would have to find new ways of implementing PHFs, and do so for at least several major PHFs

Our Approach Design
[Figure: system design in two modules.
Module 1: Trace-based signature generation system (offline). Components shown: training data with ground truth (RAT traces PHF 1 Trace 1 … PHF 1 Trace n, …, PHF m Trace 1 … PHF m Trace n, and benign traces); self-repeated gadget identification and correlation analysis; gadget identification and correlation analysis; characteristic analysis; feature generation and selection; supervised learning. Outputs: signatures for each PHF (PHF 1 … PHF m), for determining the functionality, and classifier signatures for differentiating benign from malicious.
Module 2: Real-time RAT detection system. Components shown: system call traces (NtGdiCreateCompatibleDC, NtGdiBitBlt, NtCreateSection, NtQueryInformationProcess, NtCreateThread, NtResumeThread); signature matching; PHF 1 Sig Score 1, PHF 2 Sig Score 2, …, PHF n-1 Sig Score n-1; Classifier Sig Score n; Malicious Score.]
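
The signature-matching step of Module 2 can be pictured with a small streaming matcher. This is a sketch under assumptions, not the system's implementation: the `PhfMatcher` class is hypothetical, the two-step "screengrab" signature is illustrative only (built from syscall names that appear in these slides), and the choice to ignore unrelated calls between steps is an assumption that a real matcher would likely tighten.

```python
class PhfMatcher:
    """Tracks, per process, how far a stream has progressed through one PHF signature."""

    def __init__(self, name, steps):
        self.name = name
        # Each step is a set of alternative syscalls accepted at that position.
        self.steps = [frozenset(s) for s in steps]
        self.progress = {}  # pid -> index of the next step to match

    def feed(self, pid, syscall):
        """Consume one event; return True when the whole signature has been seen."""
        pos = self.progress.get(pid, 0)
        if syscall in self.steps[pos]:
            pos += 1
        if pos == len(self.steps):
            self.progress[pid] = 0  # reset so repeated activity is reported again
            return True
        self.progress[pid] = pos
        return False


# Illustrative signature only; the real signatures are generated offline from traces.
screengrab = PhfMatcher("screengrab", [
    {"NtGdiCreateCompatibleDC"},
    {"NtGdiBitBlt", "NtGdiStretchBlt"},
])

# Interleaved events from two hypothetical processes (pids 7 and 9).
stream = [
    (7, "NtGdiCreateCompatibleDC"),
    (9, "NtCreateSection"),
    (7, "NtQueryInformationProcess"),
    (7, "NtGdiBitBlt"),
    (9, "NtCreateThread"),
]
alerts = [(pid, screengrab.name) for pid, call in stream if screengrab.feed(pid, call)]
print(alerts)
```

Per-process state keeps interleaved events from different processes from completing each other's signatures, which matters when a single consumer reads a shared event stream.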

PHF Signature Generation
- Observation 1
  - Most malicious activities, such as keylogger and screengrab, require frequent probes of input devices to collect coherent and meaningful user inputs.
  - This characteristic shows up in the trace as small gadgets that repeat multiple times, e.g.:
    ⁞
    NtUserGetKeyboardState
    NtUserMapVirtualKeyEx
    NtUserGetForegroundWindow
    ⁞
- Insight
  - Those gadgets can be automatically extracted from the traces and then potentially used for defining the malicious activities.
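
One simple way to extract such self-repeated gadgets is to scan a trace for short units that repeat back to back. The sketch below is only an illustration of that idea, not the extraction algorithm used in this work; the function name, the `max_len` and `min_repeats` parameters, and the surrounding `NtOpenFile`/`NtClose` calls in the example trace are assumptions.

```python
def repeated_gadgets(trace, max_len=4, min_repeats=3):
    """Find short syscall units that repeat consecutively at least min_repeats times.

    Returns a dict mapping each unit (a tuple of syscall names) to the longest
    run of consecutive repetitions observed for it.
    """
    found = {}
    i, n = 0, len(trace)
    while i < n:
        best = None  # (unit, repetitions) covering the most events from position i
        for k in range(1, max_len + 1):
            unit = tuple(trace[i:i + k])
            if len(unit) < k:
                break  # not enough events left for a unit of this length
            reps = 1
            while tuple(trace[i + reps * k:i + (reps + 1) * k]) == unit:
                reps += 1
            if reps >= min_repeats and (best is None or reps * k > best[1] * len(best[0])):
                best = (unit, reps)
        if best:
            unit, reps = best
            found[unit] = max(found.get(unit, 0), reps)
            i += len(unit) * reps  # skip past the whole repeated run
        else:
            i += 1
    return found


# Keylogger-style polling loop using the syscalls shown on the slide.
gadget = ["NtUserGetKeyboardState", "NtUserMapVirtualKeyEx", "NtUserGetForegroundWindow"]
trace = ["NtOpenFile"] + gadget * 5 + ["NtClose"]
gadgets = repeated_gadgets(trace)
print(gadgets)
```

Because shorter units are tried first and only replaced by strictly better coverage, a run such as A A A A is reported as the unit (A) rather than (A, A).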

PHF Signature Generation – cont’d
- Observation 2
  - Multiple RATs tend to implement a PHF in the same way at the system call level, and the ways to implement a PHF are quite limited.
- Insight
  - Leverage sequence alignment algorithms borrowed from bioinformatics to identify regions of similarity in system call sequences.
  - Such similarity regions typically correspond to the execution of similar code.
  - Build finite automata to model the similarity regions as our signatures
Example system call sequences:
  ⁞ NtProtectVirtualMemory, NtGdiCreateCompatibleDC, NtGdiCreateCompatibleBitmap, NtGdiBitBlt, NtGdiDeleteObjectApp, NtGdiExtGetObjectW, NtProtectVirtualMemory ⁞
  ⁞ NtDelayExecution, NtGdiCreateCompatibleDC, NtGdiCreateDIBSection, NtGdiStretchBlt, NtGdiDeleteObjectApp, NtGdiExtGetObjectW, NtDelayExecution ⁞
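
To make the alignment idea concrete, here is a minimal sketch that applies Smith-Waterman local alignment (a standard bioinformatics algorithm, chosen here for illustration; the slides do not say which alignment algorithm is used) to the two example sequences from this slide. The scoring values are assumptions, and turning each aligned column into a set of accepted syscalls is only a simplified stand-in for the finite automata mentioned above.

```python
def local_align(a, b, match=3, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment over two syscall sequences.

    Returns (best score, list of aligned pairs); None in a pair marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


def signature_steps(pairs):
    """One set of accepted syscalls per aligned column (None would mark an optional step)."""
    return [set(p) for p in pairs]


trace_a = ["NtProtectVirtualMemory", "NtGdiCreateCompatibleDC", "NtGdiCreateCompatibleBitmap",
           "NtGdiBitBlt", "NtGdiDeleteObjectApp", "NtGdiExtGetObjectW", "NtProtectVirtualMemory"]
trace_b = ["NtDelayExecution", "NtGdiCreateCompatibleDC", "NtGdiCreateDIBSection",
           "NtGdiStretchBlt", "NtGdiDeleteObjectApp", "NtGdiExtGetObjectW", "NtDelayExecution"]

score, pairs = local_align(trace_a, trace_b)
steps = signature_steps(pairs)
for step in steps:
    print(sorted(step))
```

With these scores the aligned region spans the five GDI calls in the middle of both sequences, and the two mismatching columns become alternatives (for example, NtGdiBitBlt or NtGdiStretchBlt), while the surrounding NtProtectVirtualMemory and NtDelayExecution calls fall outside the region.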

Classifier Signature Generation
- Selected features (also unique characteristics of RATs)
  - Persistence
    - Modifies auto-execute functionality by setting/creating a value in the registry
  - Environment Awareness for Reconnaissance and Evasion
    - Reads the active computer name or the machine identifier "MachineGuid"
    - Tries to evade analysis by sleeping many times and for a long time (>2 min)
  - Spyware/Information Retrieval
    - Accesses potentially sensitive information from local browsers
    - Queries sensitive IE security settings
  - Anti-Detection and Being Stealthy
    - Sets the process error mode to suppress the error box
    - Checks for the presence of an antivirus engine

Classifier Signature Generation – cont’d
- Selected features – cont’d
  - System Destruction
    - Opens files with deletion access rights, probably for cleanup after the attack
  - Unusual Characteristics
    - Spawns a lot of processes
    - Creates/touches files in the Windows system directory and registry
  - Running in Background
    - No window, menu, or any visible components
    - No human interactions
  - Actions initiated remotely, rather than initiated locally
All those features can be observed in system call traces (either system call name or argument).
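
As a rough illustration of how a few of these features might be read off a trace, the sketch below derives boolean features from syscall names and arguments. Everything about the event format (dicts with `name` and `args`, the `key` and `ms` argument names) is hypothetical, as are the mapping from each feature to specific syscalls and all thresholds other than the 2-minute sleep taken from the slide.

```python
from collections import Counter


def extract_features(events, long_sleep_ms=120_000, many_sleeps=3, many_procs=10):
    """Derive a few boolean RAT features from a list of syscall events.

    events: list of {"name": str, "args": dict} (hypothetical trace format).
    """
    names = Counter(e["name"] for e in events)
    long_sleeps = sum(
        1 for e in events
        if e["name"] == "NtDelayExecution" and e["args"].get("ms", 0) > long_sleep_ms
    )
    return {
        # Persistence: a registry value set under an auto-run key (assumed heuristic).
        "persistence": any(
            e["name"] == "NtSetValueKey"
            and "\\currentversion\\run" in e["args"].get("key", "").lower()
            for e in events
        ),
        # Evasion: sleeping many times, each for longer than 2 minutes.
        "evasive_sleep": long_sleeps >= many_sleeps,
        # Unusual characteristics: spawning a lot of processes.
        "spawns_many_processes": names["NtCreateUserProcess"] >= many_procs,
        # Running in background: no window creation observed.
        "no_window": not any(n.startswith("NtUserCreateWindow") for n in names),
    }


events = [
    {"name": "NtSetValueKey",
     "args": {"key": r"\Registry\Machine\Software\Microsoft\Windows\CurrentVersion\Run"}},
    {"name": "NtDelayExecution", "args": {"ms": 180_000}},
    {"name": "NtDelayExecution", "args": {"ms": 180_000}},
    {"name": "NtDelayExecution", "args": {"ms": 180_000}},
    {"name": "NtCreateUserProcess", "args": {}},
]
features = extract_features(events)
print(features)
```

A feature vector like this is the kind of input a supervised classifier could be trained on; the actual feature encoding used in this work is not described on the slides.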

Classifier Signature Generation – cont’d
- Training set and selected features
  - System call traces of RATs
  - System call traces of popular benign applications (WinSCP, Skype, notepad, …)
[Figure: RAT traces (Poison Ivy, Pandora, DarkComet, …) and benign traces (WinSCP, Skype, Notepad++, QuickTime Player, …) are described by the features Persistence, Reconnaissance & Evasive, Spyware/Information Retrieval, Anti-Detection/Stealthiness, System Destruction, Unusual characteristics, Background running, and Remotely initiated, which are used to build classifiers for discerning between RATs and benign programs.]

Previous Malware Detection Methods Fail at RAT Detection

Previous Malware Detection Methods
- Main idea of the state-of-the-art work
  - Identify security-sensitive syscalls (e.g., those related to network connections)
  - Use data dependency to connect more syscalls, and hence construct a path ending at a security-sensitive syscall
  - Use such a path as the detection signature
  - E.g., in the generated signature graph, the red nodes denote security-relevant system calls. Whenever a path such as the blue one or the yellow one is matched, the system reports the unknown program as malware.

Previous Malware Detection Methods
- Main problem 1: false positives
  - In the real world, RATs and benign programs share a lot of similar behavior.
  - It is not reasonable to judge a program based only on a similar behavior (i.e., a matched path) without awareness of the semantics corresponding to that path.
  - Either the blue path or the yellow one could represent benign behavior!

Previous Malware Detection Methods
- Main problem 2: evadable by RATs
  - Extract data dependency between system calls.
  - Build a signature graph for each malware sample based on dependency.
  - The system calls marked in red will be ignored, since they are neither security-sensitive syscalls nor data-dependent on security-sensitive syscalls.
Trace 1: NtUserGetDC, NtGdiGetDeviceCaps, …, NtConnectPort, NtRequestWaitReplyPort ⁞
Trace 2: NtGdiCreateCompatibleDC, NtGdiBitBlt, …, NtCreateSection, NtQueryInformationProcess, NtCreateThread, NtResumeThread ⁞
[Figure: signature graph with nodes NtConnectPort, NtRequestWaitReplyPort, NtCreateSection, NtResumeThread, NtQueryInformationProcess, NtCreateThread, …]

Previous Malware Detection Methods
- Main problem 2: evadable by RATs (cont’d)
  - RATs often stay inactive for a long time before sending out the data already collected. That is, the data collection actions are not necessarily followed by security-relevant system calls corresponding to abnormal network connections.
  - In this case, the data collection behavior will not be identified by signatures generated from the security-related syscalls. Thus, the previous approaches can be evaded.
  - In fact, the ignored syscalls may be exactly those generated by the data collection behavior. E.g., the ignored syscalls actually represent part of the screengrab behavior (the right graph).
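
The pruning effect described here can be reproduced with a toy backward slice. This sketch assumes a simplified event format of (name, inputs, outputs), and the data-flow values (`hdc`, `bitmap`, `section`, `port`) are invented for illustration; it is not the algorithm of any specific prior system. Only syscalls that are security-sensitive, or that produce a value consumed by a kept syscall, survive.

```python
def kept_by_dependency_slicing(events, sensitive):
    """Backward slice: keep sensitive syscalls and the calls they depend on.

    events: ordered list of (name, inputs, outputs); values are opaque labels.
    """
    needed = set()  # values consumed by syscalls already kept
    kept = []
    for name, inputs, outputs in reversed(events):
        if name in sensitive or needed & set(outputs):
            kept.append(name)
            needed |= set(inputs)
    kept.reverse()
    return kept


# Hypothetical data flow: the screengrab calls feed nothing into the network calls.
events = [
    ("NtGdiCreateCompatibleDC", [], ["hdc"]),
    ("NtGdiBitBlt", ["hdc"], ["bitmap"]),
    ("NtCreateSection", [], ["section"]),
    ("NtConnectPort", ["section"], ["port"]),
    ("NtRequestWaitReplyPort", ["port"], []),
]
sensitive = {"NtConnectPort", "NtRequestWaitReplyPort"}

kept = kept_by_dependency_slicing(events, sensitive)
ignored = [name for name, _, _ in events if name not in kept]
print("kept:", kept)
print("ignored:", ignored)
```

In this toy trace the two GDI calls are dropped because the captured image is not sent in the same dependency chain, which mirrors the situation where a RAT collects data first and exfiltrates it much later.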

Conclusion
- We proposed a fine-grained, evasion-resilient, and real-time RAT detection approach.
- Our approach was evaluated in Engagement 1 and worked well.