Automated Knowledge-Based Model Component Generation
Ethan Trewhitt, Stephen Lee-Urban, Ph.D., David Huggins, Elizabeth Whitaker, Ph.D.
Motivation: Test and Evaluation for Assessing Cyber Vulnerabilities
• Test & Evaluation groups assessing the cyber vulnerabilities of their weapons systems face major issues:
  – A large number of platforms must be assessed
  – Completion timelines are compressed, and subject matter experts (SMEs) qualified to conduct these required assessments are in limited supply
  – Skilled humans are needed to provide the deep insight and creative analysis that discovering cyber vulnerabilities requires
  – The manual process is time-consuming and expensive
• Many research approaches have concentrated on fully automating specific analysis techniques
  – These approaches either identify only simple vulnerabilities or fail because of the target system's complexity
Knowledge-based Automatic Risk Assessment (KARA)
• Automates the analysis of system documentation
• Generates connectivity models from relationships extracted from text
• Will synthesize a representation of the observed system's vulnerabilities using open vulnerability information and references (e.g., CVE, Common Vulnerabilities and Exposures)
• Final KARA output: a vulnerability checklist provided in support of the system's risk assessment
KARA Description
• Human-in/on-the-loop system that uses natural language processing (NLP) techniques to examine technical documentation of cyber-physical systems as part of the initial activities of a vulnerability assessment
• Uses syntactic and grammatical analysis to extract the names of system components and determine how they are connected
• Allows the user to modify and correct the model
• Produces a vulnerability list by checking the KARA-built connectivity model against known vulnerability lists
• Presents the operator with a list of potential cyber vulnerabilities for the target system
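The slides say the user can modify and correct the model but do not describe how corrections are represented. As a minimal sketch, assuming the model is held as a set of (component, connector, component) triples, analyst edits could be applied as removals, additions, and component renames (for merging aliases). The function name `apply_corrections` and its parameters are hypothetical, not part of KARA.

```python
from typing import Iterable, Optional

# Hypothetical representation: one extracted relationship per triple.
Triple = tuple[str, str, str]  # (component, connector, component)


def apply_corrections(
    triples: Iterable[Triple],
    remove: Iterable[Triple] = (),
    add: Iterable[Triple] = (),
    rename: Optional[dict[str, str]] = None,
) -> set[Triple]:
    """Apply analyst edits to extracted triples: drop, add, then rename components."""
    rename = rename or {}
    kept = set(triples) - set(remove)
    kept |= set(add)
    # Renaming merges aliases, e.g. two spellings of the same component.
    return {(rename.get(a, a), c, rename.get(b, b)) for a, c, b in kept}
```

Applying renames last means that a rename also normalizes any triples the analyst just added.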
KARA Architecture
[Figure: KARA architecture. Document Corpus → Textual Automated Relationship Abstractor (TARA; GATE Pipeline Components) → Connectivity Relationship Phrases → Model Construction Reasoner (MCR; Graph and Network Construction Components) → System Connectivity Model → Vulnerability Checklist Constructor (VCC; Graph and Network Analysis Components) → Vulnerability Checklist]
KARA Modules
• Textual Automated Relationship Abstractor (TARA) analyzes system documentation and extracts connectivity relationships, which serve as input to the MCR
• Model Construction Reasoner (MCR) creates a representative connectivity model, the System Connectivity Model (SCM)
• Vulnerability Checklist Constructor (VCC) uses domain knowledge about the vulnerabilities of similar systems to analyze connected components in the connectivity model, and creates a vulnerability checklist
System Flow
• TARA extracts the connectivity relationships used as input to the MCR
  – Uses the open-source NLP tool GATE (General Architecture for Text Engineering)
  – Extended to use other open-source NLP components
  – GTRI developed some specialized components for KARA, which are included in our GATE pipeline
• MCR uses data from TARA to create a graphical system connectivity model
• VCC uses the contents of the SCM to search the National Vulnerability Database (NVD) for vulnerabilities relevant to the components and connections of the target system
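The slides do not say how the VCC queries the NVD. A minimal sketch, assuming the public NVD CVE API 2.0 and its `keywordSearch` and `resultsPerPage` query parameters, is to turn each SCM component name into one keyword-search URL. The helper names and the component names are illustrative only, and the sketch only builds URLs; it does not send requests.

```python
from urllib.parse import urlencode

# Base URL of the public NVD CVE API (version 2.0).
NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def nvd_query_url(component_name: str, results_per_page: int = 20) -> str:
    """Build an NVD keyword-search URL for one SCM component name."""
    params = {"keywordSearch": component_name, "resultsPerPage": results_per_page}
    return f"{NVD_CVE_API}?{urlencode(params)}"


def queries_for_components(components: list[str]) -> dict[str, str]:
    """Map each distinct component name to its NVD query URL, sorted by name."""
    return {name: nvd_query_url(name) for name in sorted(set(components))}
```

Plain keyword search on a component name is a coarse filter. A fuller VCC would likely also use product identifiers and the connection triples to rank the results.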
KARA Natural Language Processing (NLP) with GATE
• General Architecture for Text Engineering (GATE) is an open-source natural language processing package
• Uses an NLP “pipeline” that applies each NLP function to the documents in an order specified by the developer
• Includes a named entity extractor that extracts or highlights certain types of entities, such as names, organizations, places, and dates
• GATE is extended through:
  – Domain-specific gazetteers
  – JAPE (Java Annotation Patterns Engine), a custom rule language for specialized pattern-matching functions
  – Specialized modules developed in general-purpose programming languages
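A domain-specific gazetteer is essentially a list of known phrases, each tagged with a type, that is matched against the token stream. The Python sketch below illustrates that idea with a longest-match lookup. It loosely mirrors GATE's Lookup annotations but is not GATE's API, and the gazetteer entries are hypothetical.

```python
from typing import NamedTuple


class Lookup(NamedTuple):
    start: int   # index of the first matched token
    end: int     # index one past the last matched token
    text: str
    major_type: str


# Hypothetical domain gazetteer: lowercase phrase -> entity type.
GAZETTEER = {
    "gps receiver": "component",
    "mission computer": "component",
    "ethernet": "bus",
    "mil-std-1553 bus": "bus",
}


def gazetteer_lookup(tokens: list[str], gazetteer: dict[str, str]) -> list[Lookup]:
    """Longest-match lookup of gazetteer phrases over a token sequence."""
    if not gazetteer:
        return []
    max_len = max(len(phrase.split()) for phrase in gazetteer)
    lowered = [t.lower() for t in tokens]
    matches: list[Lookup] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so a multi-word entry
        # wins over any shorter entry that starts at the same token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            if phrase in gazetteer:
                matches.append(Lookup(i, i + n, " ".join(tokens[i:i + n]), gazetteer[phrase]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return matches
```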
TARA: Natural Language Processing Pipeline (GATE)
[Figure: TARA GATE pipeline stages: Document Reset, English Tokenizer, Sentence Splitter, Part-of-Speech Tagger, Noun Phrase Chunker, Stemmer, Gazetteer, Hyphen Finder (JAPE), Labeler (JAPE), Phraser (JAPE), Acronym Finder (JAPE), Hyphen Exporter, Connection Phrase Exporter, Found Gazetteer List Collector]
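The pipeline includes an Acronym Finder written in JAPE, but the slides do not give its rules. The sketch below approximates the common "Long Form (LF)" heuristic in plain Python. It walks backwards from a parenthesized acronym and consumes one word per letter, while allowing short function words such as "for". This is a stand-in for illustration, not the KARA rule, and the `SKIPPABLE` word list is an assumption.

```python
import re

# Assumed list of short function words that may appear inside a long form
# without contributing a letter, e.g. "General Architecture for Text Engineering".
SKIPPABLE = {"for", "of", "and", "the", "to", "in", "on"}

ACRONYM_RE = re.compile(r"\(([A-Z]{2,})\)")


def find_acronyms(text: str) -> dict[str, str]:
    """Return {acronym: long form} for 'Long Form (LF)' patterns in text."""
    found: dict[str, str] = {}
    for m in ACRONYM_RE.finditer(text):
        acronym = m.group(1)
        words = re.findall(r"[A-Za-z][\w-]*", text[:m.start()])
        letters = list(acronym)
        long_form: list[str] = []
        # Walk backwards from the parenthesis, consuming one acronym letter per word.
        while letters and words:
            word = words.pop()
            if word[0].upper() == letters[-1]:
                letters.pop()
                long_form.insert(0, word)
            elif word.lower() in SKIPPABLE and long_form:
                long_form.insert(0, word)
            else:
                break
        if not letters:  # every letter was matched to a word
            found[acronym] = " ".join(long_form)
    return found
```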
Grammar Patterns: Detecting Triples
• Patterns were identified through manual analysis of documents
• Relationship patterns extracted by TARA:
  – “X connects to Y”: x connects y
  – “connect X to Y”: x connect y
  – “connect X using Y”: not a normal connection relationship, but includes *via* information
  – “X provides link to Y”: x link y (noun connection word)
  – “X senses connection with Y”: x connection y
  – “X will lose connection to Y”: x connection y (negative connection that implies a past connection; noun connector)
  – “X is positioned near Y”: x positioned y (geospatial relationship)
  – “X has something for use with Y”: x use with y
  – “X is configured to work with Y”: x work with y
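TARA matches these patterns over GATE annotations such as part-of-speech tags and noun-phrase chunks. The sketch below is a simplified stand-in that encodes a subset of the patterns as plain regular expressions over a whole sentence and emits (x, connector, y) triples. Treating X and Y as raw text spans, and the component names in the examples, are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    source: str
    connector: str
    target: str


# A subset of the relationship patterns above, as (regex, connector word).
# X and Y are captured as plain text spans here; a real pipeline would use
# noun-phrase chunks instead.
PATTERNS = [
    (re.compile(r"^(?P<x>.+?) connects to (?P<y>.+)$"), "connects"),
    (re.compile(r"^(?P<x>.+?) provides (?:a )?link to (?P<y>.+)$"), "link"),
    (re.compile(r"^(?P<x>.+?) will lose connection to (?P<y>.+)$"), "connection"),
    (re.compile(r"^(?P<x>.+?) is positioned near (?P<y>.+)$"), "positioned"),
    (re.compile(r"^(?P<x>.+?) is configured to work with (?P<y>.+)$"), "work with"),
]

ARTICLE_RE = re.compile(r"^(?:the|a|an)\s+", re.IGNORECASE)


def normalize(phrase: str) -> str:
    """Strip trailing punctuation and a leading article, then lowercase."""
    return ARTICLE_RE.sub("", phrase.strip().rstrip(".;,")).lower()


def extract_triple(sentence: str) -> Optional[Triple]:
    """Return the first (x, connector, y) triple matched by PATTERNS, if any."""
    for pattern, connector in PATTERNS:
        m = pattern.match(sentence.strip())
        if m:
            return Triple(normalize(m.group("x")), connector, normalize(m.group("y")))
    return None
```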
TARA: From Natural Language to Graph Model
• TARA NLP output: GATE performs the NLP connectivity relationship extraction; grammatical phrases are recognized with tags for subject, predicate, and connector words
• Graph connectivity model: connectivity relationship triples of the form (component, connector, component) are created from the NLP output
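The step from triples to a graph model can be sketched as building an adjacency map in which every edge keeps the connector words that produced it. One assumption here is that connections are treated as undirected, so "X connects to Y" also links Y to X. The function name and sample triples are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (component, connector, component)


def build_connectivity_graph(triples: list[Triple]) -> dict[str, dict[str, set[str]]]:
    """Build an undirected graph: node -> neighbor -> set of connector words."""
    graph = defaultdict(lambda: defaultdict(set))
    for source, connector, target in triples:
        if source == target:
            continue  # ignore self-loops produced by extraction noise
        graph[source][target].add(connector)
        graph[target][source].add(connector)
    # Convert to plain dicts so missing keys raise instead of silently growing.
    return {node: dict(nbrs) for node, nbrs in graph.items()}
```

Keeping a set of connector words per edge preserves evidence when several sentences describe the same connection in different ways.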
Model Construction Reasoner (MCR)
MCR Output: System Connectivity Model (Graph)
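The VCC is described as analyzing connected components in the connectivity model, but the slides do not name an algorithm. A minimal sketch, assuming the SCM is available as an undirected adjacency map, groups components with breadth-first search. The sample component names are hypothetical.

```python
from collections import deque


def connected_components(adjacency: dict[str, set[str]]) -> list[set[str]]:
    """Group SCM nodes into connected components using breadth-first search."""
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, set()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components
```

An isolated node, such as a component that the documentation mentions but never connects, forms its own group. Such nodes may point to missing relationships that the user should add.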
Vulnerability Checklist Constructor (VCC)
Purpose of the VCC:
• Identify components with potential vulnerabilities
• Enumerate those vulnerabilities and provide that information as a final report
Objective:
• Provide the user with a concise picture of the system being evaluated and its potential vulnerability profile
• Allow the user to investigate each item on the list
[Figure: VCC data flow, with the labels System Connectivity Model (SCM), Component List, Connection Triples, Vulnerability Seeker, Expert Input, Expert Checklist Refinement, Checklist Builder, Inferred Knowledge, Additional Features, General Knowledge, Broad Component Classes, Likely Vulnerabilities, Vulnerability Checklist Drafts, Vulnerability Database(s), and Vulnerability Checklist]
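The VCC figure lists general knowledge in the form of broad component classes and likely vulnerabilities, which feed a checklist builder. The sketch below shows one way that step could work. Each SCM component is assigned a broad class by keyword, and a checklist draft item is emitted for every class with known likely vulnerabilities. Both mapping tables are hypothetical placeholders; they are not KARA's knowledge base or data from the NVD.

```python
from dataclasses import dataclass, field

# Hypothetical general knowledge: keyword -> broad component class.
COMPONENT_CLASSES = {
    "radio": "wireless communication",
    "receiver": "wireless communication",
    "ethernet": "network interface",
    "port": "maintenance interface",
}

# Hypothetical general knowledge: broad class -> likely vulnerability categories.
LIKELY_VULNERABILITIES = {
    "wireless communication": ["signal jamming", "spoofed input"],
    "network interface": ["unauthenticated network access"],
    "maintenance interface": ["unauthorized local access"],
}


@dataclass
class ChecklistItem:
    component: str
    component_class: str
    vulnerabilities: list[str]
    neighbors: list[str] = field(default_factory=list)


def classify(component: str) -> str:
    """Assign a broad class by whole-word keyword match; 'unclassified' otherwise."""
    for keyword, cls in COMPONENT_CLASSES.items():
        if keyword in component.lower().split():
            return cls
    return "unclassified"


def build_checklist(adjacency: dict[str, set[str]]) -> list[ChecklistItem]:
    """Draft one checklist item per SCM component that has likely vulnerabilities."""
    items = []
    for component in sorted(adjacency):
        cls = classify(component)
        vulns = LIKELY_VULNERABILITIES.get(cls, [])
        if vulns:
            items.append(ChecklistItem(component, cls, list(vulns), sorted(adjacency[component])))
    return items
```

Each draft item records the component's neighbors, so a reviewer refining the checklist can see how a vulnerable component connects to the rest of the system.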
Example Vulnerability Checklist Mockup
The final vulnerability checklist is produced as either an HTML document or a PDF.
Included details:
• Found component list
• Component connectivity model view
• Vulnerability matrix
• Vulnerability list with assessed likelihoods and basic details
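A minimal sketch of the HTML output covers only the vulnerability list. It assumes each row is a (component, vulnerability, likelihood) tuple of strings and escapes every cell, because component names come from extracted document text. The row format and the function name are assumptions, and PDF output is not shown.

```python
from html import escape


def render_checklist_html(title: str, rows: list[tuple[str, str, str]]) -> str:
    """Render (component, vulnerability, likelihood) rows as a minimal HTML page."""
    body_rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(*(escape(cell) for cell in row))
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        "<table>\n"
        "<tr><th>Component</th><th>Vulnerability</th><th>Likelihood</th></tr>\n"
        f"{body_rows}\n"
        "</table>\n"
        "</body></html>\n"
    )
```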
Summary
• Test & Evaluation groups assessing the cyber vulnerabilities of their systems face major issues with a manual process that is time-consuming and expensive
• KARA research aims to mitigate this problem with:
  – A human-in/on-the-loop system that uses natural language processing (NLP) techniques to examine technical documentation of cyber-physical systems as part of the initial activities of a vulnerability assessment
  – Syntactic and grammatical analysis that extracts the names of system components and determines how they are connected
  – The ability for the user to modify and correct the model
  – A vulnerability list constructed by checking the KARA-built connectivity model against known vulnerability lists
  – A list of potential cyber vulnerabilities for the target system, presented to the operator
• KARA aims to minimize the need for human expertise at runtime
  – Nevertheless, some information is best provided by humans