A Lightweight Markup Language for Graph-Structured Threat Sharing
Mayo YAMASAKI
NTT-CERT, NTT Secure Platform Labs
Disclosure to 2019 FIRST CTI Symposium London
Copyright© 2019 NTT Corp. All Rights Reserved.

$whoami
  • Mayo YAMASAKI @ Tokyo, Japan
  • Researcher at NTT R&D
  • Member of NTT-CERT's OSINT Team
  • R&D Topics
      • Search System
      • Knowledge Extraction from Text with NLP/ML
      • Knowledge Representation for Threat Sharing
www.ntt-cert.org

Agenda
  • Background
  • Overview of Proposed Lightweight Markup Language
  • Demo
  • Capability & Limitation
  • Current Implementation
  • Future Work

Security Reporting in NTT-CERT
  • Publishing about 20 security (threat) reports every day
  • Creating these reports with associated STIX data
  • Using a web-based internal collaboration system for security reporting
[Figure: a Security Report linked to STIX Data (IoCs, CVEs, TTPs, Attributions)]

Problem in Operation
  • Difficulty of creating structured STIX data in our operations
      • Time constraints and/or inadequate analyst training
  • Shortage of structured data in operations [1][2]
      • Unstructured expressions used by 60% of practitioners
      • Lack of context such as TTPs in structured data
How can structured data be created more easily?
[1] "The Value of Threat Intelligence: The Second Annual Study of North American & United Kingdom Companies", Anomali, 2017.
[2] "Exploring the opportunities and limitations of current Threat Intelligence Platforms", ENISA, 2018.

Our Approach
  • Text-based representation, not GUI
  • New language for STIX 2 compatible graph data
[Figure: positioning chart with one axis for graph expressiveness (high to low) and one from system language to human language; plotted items: STIX, RDF & DOT, Proposal, Markdown]

RDF: Resource Description Framework
  • Graph as a set of tuples such as (subject, predicate, object)
  • Multiple formats including JSON, XML, N-Triples, Turtle, etc.
RDF N-Triples
    <http://example.com/ipv4/192.168.0.0> <http://example.com/indicates> <http://example.com/malware/EvilRat> .
    <http://example.com/ipv4/192.168.0.0> <http://example.com/indicates> <http://example.com/malware/EvilTrojan> .
Graph Representation
    192.168.0.0 (ioc-ipv4) --indicates--> EvilRat (malware)
    192.168.0.0 (ioc-ipv4) --indicates--> EvilTrojan (malware)
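The triple view of the graph can be illustrated with a minimal Python sketch that serializes (subject, predicate, object) tuples as N-Triples lines. The `http://example.com` base IRI is the placeholder used on the slide; the helper name `to_ntriples` is hypothetical and not part of any RDF library.

```python
# Placeholder base IRI taken from the slide's example; not a real vocabulary.
BASE = "http://example.com"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) path fragments as N-Triples lines."""
    lines = []
    for subj, pred, obj in triples:
        # Each statement is three IRIs in angle brackets, terminated by " ."
        lines.append(f"<{BASE}/{subj}> <{BASE}/{pred}> <{BASE}/{obj}> .")
    return "\n".join(lines)


triples = [
    ("ipv4/192.168.0.0", "indicates", "malware/EvilRat"),
    ("ipv4/192.168.0.0", "indicates", "malware/EvilTrojan"),
]
ntriples = to_ntriples(triples)
print(ntriples)
```

Even for two edges, every statement repeats the full IRIs, which is part of the editing cost compared later in the talk.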

DOT Language
  • A DSL for a graph visualization tool such as Graphviz
DOT Representation
    digraph G {
      a [label="malware: EvilRat"];
      b [label="malware: EvilTrojan"];
      c [label="ioc-ipv4: 192.168.0.0"];
      c -> a [label="indicates"];
      c -> b [label="indicates"];
    }
Graph Representation
    192.168.0.0 (ioc-ipv4) --indicates--> EvilRat (malware)
    192.168.0.0 (ioc-ipv4) --indicates--> EvilTrojan (malware)
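As a rough illustration of how such a DOT description relates to typed nodes and labelled edges, the following Python sketch generates an equivalent digraph. The function `to_dot` and the generated node identifiers (`n0`, `n1`, ...) are hypothetical; the sketch does not escape quotes inside names.

```python
def to_dot(nodes, edges):
    """Render typed nodes and labelled edges as a Graphviz DOT digraph."""
    ids = {}
    lines = ["digraph G {"]
    for i, (name, node_type) in enumerate(nodes):
        # Assign a short identifier to each node and show "type: name" as its label.
        ids[(name, node_type)] = f"n{i}"
        lines.append(f'  n{i} [label="{node_type}: {name}"];')
    for src, label, dst in edges:
        lines.append(f'  {ids[src]} -> {ids[dst]} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


rat = ("EvilRat", "malware")
trojan = ("EvilTrojan", "malware")
ip = ("192.168.0.0", "ioc-ipv4")
dot = to_dot([rat, trojan, ip], [(ip, "indicates", rat), (ip, "indicates", trojan)])
print(dot)
```

The output can be passed to Graphviz (for example with `dot -Tpng`) to draw the graph.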

Overview
  • Consists of a pre-shared graph schema & graph data
  • Schema: domain-specific definition of edges between node types
      • Rarely updated, like an RDB schema
  • Data (Report): plain text with nodes marked up as "[NAME]{TYPE}"
      • Edges automatically extracted according to the schema
[Figure: Sharing a report. Team A: Schema, Report, STIX. Team B: Schema, Report, STIX.]

Example 1: IP with malware
Schema: a table of directed relationships (edge types) from rows (source types) to columns (destination types)
    Source \ Destination | malware   | ioc-ipv4
    malware              | -         | -
    ioc-ipv4             | indicates | -
Report
    [EvilRat]{malware} sends system information to [192.168.0.0]{ioc-ipv4}. The IP address also had been used for C2 server of [EvilTrojan]{malware}.
STIX compatible Graph
    192.168.0.0 (ioc-ipv4) --indicates--> EvilRat (malware)
    192.168.0.0 (ioc-ipv4) --indicates--> EvilTrojan (malware)
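The extraction in Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Raph or rCTI implementation: the regular expression, the dictionary encoding of the schema, and the name `extract_graph` are all hypothetical.

```python
import re
from itertools import permutations

# Matches node markup of the form [NAME]{TYPE}.
NODE_RE = re.compile(r"\[([^\]]+)\]\{([^}]+)\}")

# Schema table from Example 1: (source type, destination type) -> edge type.
SCHEMA = {("ioc-ipv4", "malware"): "indicates"}


def extract_graph(paragraph, schema=SCHEMA):
    """Return the marked-up nodes and the edges the schema defines between them."""
    nodes = []
    for name, node_type in NODE_RE.findall(paragraph):
        if (name, node_type) not in nodes:  # keep first occurrence only
            nodes.append((name, node_type))
    edges = []
    # Every ordered pair of nodes gets an edge if the schema defines one
    # for that pair of node types.
    for src, dst in permutations(nodes, 2):
        label = schema.get((src[1], dst[1]))
        if label is not None:
            edges.append((src, label, dst))
    return nodes, edges


report = (
    "[EvilRat]{malware} sends system information to [192.168.0.0]{ioc-ipv4}. "
    "The IP address also had been used for C2 server of [EvilTrojan]{malware}."
)
nodes, edges = extract_graph(report)
for src, label, dst in edges:
    print(f"{src[0]} ({src[1]}) --{label}--> {dst[0]} ({dst[1]})")
```

Because the schema is keyed by a pair of node types, a lookup returns at most one edge type, so the author never has to write an edge explicitly.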

Constraints
  1. Any pair of node types must have 0 or 1 edge type.
      • To deterministically extract edges
  2. Every pair of nodes in a paragraph has an edge if an edge between their node types is defined in the schema.
      • To extract multiple graph data in a document

How to Make STIX Data?
  1. Extracting graphs from each paragraph in a report
  2. Merging nodes by comparing node names/types
  3. Rule-based conversion to STIX data
Report -> Sub Graphs -> Graph -> STIX
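Steps 1 and 2 can be sketched as follows. The sketch assumes that paragraphs are separated by blank lines and that two nodes are the same when their (name, type) pairs are equal. The function names and the second IP address `10.0.0.1` are hypothetical illustration values, and step 3 (conversion to STIX) is left out.

```python
import re
from itertools import permutations

NODE_RE = re.compile(r"\[([^\]]+)\]\{([^}]+)\}")

# Schema from Example 1: (source type, destination type) -> edge type.
SCHEMA = {("ioc-ipv4", "malware"): "indicates"}


def extract_subgraph(paragraph, schema):
    """Step 1: extract the nodes and schema-defined edges of one paragraph."""
    nodes = set(NODE_RE.findall(paragraph))
    edges = set()
    for src, dst in permutations(nodes, 2):
        label = schema.get((src[1], dst[1]))
        if label is not None:
            edges.add((src, label, dst))
    return nodes, edges


def build_graph(report, schema=SCHEMA):
    """Step 2: merge paragraph subgraphs; equal (name, type) pairs are one node."""
    nodes, edges = set(), set()
    # Assumption: a blank line separates paragraphs.
    for paragraph in re.split(r"\n\s*\n", report.strip()):
        sub_nodes, sub_edges = extract_subgraph(paragraph, schema)
        nodes |= sub_nodes
        edges |= sub_edges
    return nodes, edges


report = """[EvilRat]{malware} sends system information to [192.168.0.0]{ioc-ipv4}.

[EvilTrojan]{malware} contacts [192.168.0.0]{ioc-ipv4} and [10.0.0.1]{ioc-ipv4}."""

nodes, edges = build_graph(report)
for src, label, dst in sorted(edges):
    print(f"{src[0]} ({src[1]}) --{label}--> {dst[0]} ({dst[1]})")
```

Since edges are only created inside a paragraph, `10.0.0.1` is linked to EvilTrojan but not to EvilRat, while `192.168.0.0` is merged into a single node shared by both paragraphs.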

Example 2: A Sample Scenario
[Figure: a sample report consisting of five paragraphs, Par. 1 to Par. 5]

Schema & Rules for STIX 2
  • Schema definition including 52 node types & 200 edge labels
      • SRO relationship_type between two SDO types
      • Additional node types for usability and the language's constraints
  • Examples of conversion rules
      ioc-ipv4:
          label="indicator", pattern="[ipv4-addr:value=${NODE_NAME}]", labels=["unknown"]
      file-sha256:
          label="observed-data", objects={"0": {"type": "file", "hashes": {"SHA-256": "${NODE_NAME}"}}}, first_observed=${CURRENT_TIMESTAMP}, last_observed=${CURRENT_TIMESTAMP}, number_observed=1
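A rule-based conversion of this kind could look like the sketch below, which fills the `${NODE_NAME}` and `${CURRENT_TIMESTAMP}` placeholders of a per-type template. Several points are assumptions: the rule's `label` field is read as the STIX object `type`, the pattern value is single-quoted as STIX patterning requires for string literals, and the result holds only the rule-specific fields rather than a complete STIX object (no `id`, `created`, and so on). The names `RULES`, `fill`, and `convert` are hypothetical.

```python
from datetime import datetime, timezone
from string import Template

# Per-node-type templates modelled on the two example rules.
RULES = {
    "ioc-ipv4": {
        "type": "indicator",
        "pattern": "[ipv4-addr:value = '${NODE_NAME}']",
        "labels": ["unknown"],
    },
    "file-sha256": {
        "type": "observed-data",
        "objects": {"0": {"type": "file", "hashes": {"SHA-256": "${NODE_NAME}"}}},
        "first_observed": "${CURRENT_TIMESTAMP}",
        "last_observed": "${CURRENT_TIMESTAMP}",
        "number_observed": 1,
    },
}


def fill(value, variables):
    """Recursively substitute ${...} placeholders in strings, dicts, and lists."""
    if isinstance(value, str):
        return Template(value).substitute(variables)
    if isinstance(value, dict):
        return {key: fill(item, variables) for key, item in value.items()}
    if isinstance(value, list):
        return [fill(item, variables) for item in value]
    return value  # numbers and other literals are copied unchanged


def convert(name, node_type, now=None):
    """Apply the rule for node_type to a node name."""
    now = now or datetime.now(timezone.utc)
    variables = {
        "NODE_NAME": name,
        "CURRENT_TIMESTAMP": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return fill(RULES[node_type], variables)


indicator = convert("192.168.0.0", "ioc-ipv4")
# Arbitrary fixed timestamp and dummy hash value, for a reproducible example.
observed = convert("0" * 64, "file-sha256", now=datetime(2019, 1, 1, tzinfo=timezone.utc))
print(indicator)
print(observed)
```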

Evaluation of Proposed Language
  1. Comparing editing cost to evaluate usability
      • How easy is it to create structured data?
      • Comparing raw STIX, RDF, DOT, and the proposed language
  2. Measuring STIX data coverage to understand the tradeoff
      • What ratio of STIX data can the proposed language cover?
      • Comparing existing STIX data with data produced from the proposed language

Comparing Editing Cost: Method
  • Comparing editing costs on existing text-based threat reports
      • Proposed language: Levenshtein distance*1 from the original report to the marked-up report
      • STIX, RDF and DOT: cost equals the number of characters, because these expressions are written separately from the text-based report
Example (editing cost: 16)
    Original report: EvilRat malware sends stolen data to 192.168.0.0.
    Proposed language: [EvilRat]{malware} sends stolen data to [192.168.0.0]{ioc-ipv4}.
*1 The minimum number of single-character editing operations (insertion, deletion, or substitution) required to convert one sequence into the other.
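The editing cost of the example can be checked with a short Python sketch. It assumes the standard definition of Levenshtein distance, which counts single-character insertions, deletions, and substitutions; under that assumption it reproduces the cost of 16 for the example sentence. The function below is a textbook dynamic-programming version, not the authors' evaluation code.

```python
def levenshtein(a, b):
    """Standard Levenshtein distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


original = "EvilRat malware sends stolen data to 192.168.0.0."
marked_up = "[EvilRat]{malware} sends stolen data to [192.168.0.0]{ioc-ipv4}."
cost = levenshtein(original, marked_up)
print(cost)  # 16
```

The 16 operations are 15 inserted markup characters plus one substitution of the space between "EvilRat" and "malware" by "]".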

Comparing Editing Cost: Dataset
  • Collecting 15 threat reports on the web and extracting their text
  • Creating structured data using each representation
      • STIX (JSON, YAML)
      • RDF (N-Triples, Turtle)
      • DOT
Stats of Dataset
    Statistic                                | Count
    Avg. text length (number of characters)  | 13,548
    Avg. number of SDOs (≒ number of nodes)  | 13.9
    Avg. number of SROs (≒ number of edges)  | 14.5

Comparing Editing Cost: Result
    Representation  | Avg. Cost | Percentage of Avg. Cost to STIX (JSON)
    STIX (JSON)     | 17509     | 100
    STIX (YAML)     | 14001     | 80
    RDF (N-Triples) | 3239      | 19
    RDF (Turtle)    | 2594      | 15
    DOT             | 1793      | 10
    Proposed Lang   | 327       | 2

STIX Data Coverage: Method & Dataset
  • Method: Comparing data coverage on existing STIX data
      • Coverage: percentage of the STIX objects and object attributes extracted from a report written in the proposed language, relative to those in the existing STIX data
  • Dataset: 3 threat reports on the STIX official site*1
      • APT1, Poison Ivy, IMDDOS
*1 https://oasis-open.github.io/cti-documentation/stix/examples

STIX Data Coverage: Result
    Report     | SDO Coverage | SRO Coverage | Attribute Coverage
    APT1       | 46/48        | 30/30        | 94/422
    Poison Ivy | 45/66        | 53/90        | 107/351
    IMDDOS     | 4/9          | 2/5          | 8/35
  • Uncovered data
      • SDO: marking-definition, complex indicators containing AND/OR
      • SRO: those associated with uncovered SDOs
      • Attribute: created, modified, description, object_marking_refs, labels, and kill_chain_name
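For readers who prefer percentages, the covered/total counts can be converted with a few lines of Python. The sketch assumes the three result rows read APT1, Poison Ivy, and IMDDOS in that order; the dictionary layout and the helper `percent` are hypothetical.

```python
# (covered, total) counts per report and coverage kind, as read from the results.
COVERAGE = {
    "APT1": {"SDO": (46, 48), "SRO": (30, 30), "attribute": (94, 422)},
    "Poison Ivy": {"SDO": (45, 66), "SRO": (53, 90), "attribute": (107, 351)},
    "IMDDOS": {"SDO": (4, 9), "SRO": (2, 5), "attribute": (8, 35)},
}


def percent(covered, total):
    """Coverage as a percentage, rounded to one decimal place."""
    return round(100 * covered / total, 1)


percentages = {
    report: {kind: percent(*counts) for kind, counts in row.items()}
    for report, row in COVERAGE.items()
}
for report, row in percentages.items():
    print(report, row)
```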

Tools: Currently Internal Project :(
  • Raph: A parser of a lightweight markup language for graph description
      • Golang implementation and API
      • Domain-independent graph description
  • rCTI: Raph Language for Cyber Threat Intelligence
      • Golang CLI tool, REST API server, and web editor app
      • CTI domain graph description
      • Exporting STIX 2 data

rCTI REST API sample
[Figure: Input Text and Output STIX]

Integration in Our Team
  • Our Team's Workflow
      • Collecting open source info
      • Analyzing selected events
      • Writing reports with the proposed language
      • Editing STIX data on the web UI
      • Publishing reports
[Figure: New & Experimental Flow. Components: Browser, Collab System for Security Reporting (App, DBs), REST API, rCTI Server, TIPs]

Future Work
  • Language
      • Representing attributes
  • Schema
      • Fixing and expanding the schema for CTI
  • Implementation
      • Resolving equivalence identification of nodes, including those in TIPs
  • Next Step
      • Using marked-up reports for ML-based knowledge extraction

Summary
  • Proposed a lightweight markup language for graph-structured CTI