
Big Data & Machine Learning With Applications to Aerospace
NASA Machine Learning Workshop, August 2016
Jaime Carbonell and colleagues
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~jgc

Computer Science and Related Disciplines
[Figure: diagram relating Computer Science to Systems + Theory, Machine Learning, Language Technologies, Computational Biology, Entertainment Technology, Human-Computer Interaction, Robotics, Math and Statistics, Sciences, Engineering, Humanities, and Fine Arts]
3/2/2021, Jaime G. Carbonell, Language Technologies Institute

Computer Science and Related Disciplines
[Figure: the same diagram, with "Scalable Machine Learning" marked as the topic of this presentation]

AI is Becoming Central to the World Economy (Davos 2016)
“The fourth Industrial Revolution” is characterized by:
- “Ubiquitous and mobile internet”,
- “Smaller and more powerful sensors”,
- “Artificial Intelligence”, and
- “Machine Learning”
-- Prof. Klaus Schwab, founder of the Davos World Economic Forum, 2016
Need to focus on content, not hype.

Key Components of AI
- Perception
  - Vision, sonar, lidar, haptics, …
- Action
  - Locomotion, manipulation, …
- Reasoning
  - Planning, goal-oriented behavior, projection, …
- Communication
  - Language, speech, dialog, social nets, …
- Learning
  - Adaptation, reflection, knowledge acquisition, …

Key Components of AI
[Same components as the previous slide (perception, action, reasoning, communication, learning), with callouts marking "This presentation" and "My research"]

Big Data: How Big is Big?
Dimensions of Big Data Analytics:
- Billions++ of entries: terabytes/petabytes of data
- Trillions of potential relations among entries (graphs)
- Millions of attributes per entry (but typically sparse encoding)

AI and the Big-Data “Stack”
- Insights, Alerts, Visualization
- Analytics/AI/Inference: Machine Learning, NLP, data/text mining
- Big-Data Architecture: Spark/GraphLab, NoSQL
- Big-Data “Plumbing”: Cloud/Storage, Resource Allocator
- Data: Sensors, Historical & Normative Data, Knowledge base

Trends in Machine Learning
- “Deep” Learning (DNNs): vision, speech, NLP
- Reinforcement Learning: robotics
- Large-margin methods (SVM): classification
- Graphical models: strong priors, domain knowledge
How to cope with label sparsity?
- (Pro)Active learning: optimizing external help
- Transfer/Multitask learning: related new domains
- Transductive/semi-supervised learning
- (Pro)Active teaching: …coming next?

Machine Learning in a Nutshell
- Training data:
  - Special case:
- Functional space:
- Fitness criterion (a.k.a. loss function):
- Sampling strategy:

Why is Active Learning Important?
- Labeled data volumes ≪ unlabeled data volumes
  - 1.2% of all proteins have known structures
  - < 0.01% of all galaxies in the Sloan Digital Sky Survey have consensus type labels
  - < 0.0001% of all web pages have topic labels
  - ≪ 1E-10% of all internet sessions are labeled as to fraudulence (malware, etc.)
  - < 0.0001% of all financial transactions are investigated w.r.t. fraudulence
- If labeling is costly or limited, select the instances with maximal impact for learning

Sampling Strategies
- Random sampling (preserves distribution)
- Uncertainty sampling (Lewis, 1996; Tong & Koller, 2000)
  - proximity to decision boundary
  - maximal distance to labeled x's
- Density sampling (kNN-inspired; McCallum & Nigam, 2004)
- Representative sampling (Xu et al., 2003)
- Instability sampling: max disagreement (probability-weighted)
  - x's that maximally change the decision boundary
- Ensemble strategies
  - Boosting-like ensemble (Baram, 2003)
  - DUAL (Donmez & Carbonell, 2007)
    - Dynamically switches from density-based to uncertainty-based sampling by estimating the derivative of the expected residual error reduction

Which point to sample?
[Figure: scatter plot; grey = unlabeled, red = class A, brown = class B]

Density-Based Sampling
Pick the centroid of the largest unsampled cluster.
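A minimal sketch of density-based selection, assuming points are numeric tuples and using plain Lloyd's k-means. The function names, the choice of `k`, and the rule "return the unlabeled point nearest the chosen centroid" are illustrative assumptions, not the exact procedure behind the slide.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        # Assign every point to its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters


def density_sample(unlabeled, labeled=(), k=2, seed=0):
    """Return the unlabeled point nearest the centroid of the largest cluster that
    contains no labeled point (falls back to the largest cluster overall)."""
    labeled = set(labeled)
    centers, clusters = kmeans(list(unlabeled) + sorted(labeled), k, seed=seed)
    unsampled = [j for j in range(k) if not labeled.intersection(clusters[j])]
    biggest = max(unsampled or range(k), key=lambda j: len(clusters[j]))
    return min(unlabeled, key=lambda p: math.dist(p, centers[biggest]))
```

Once a point in the big cluster is labeled, that cluster no longer counts as "unsampled", so the next query moves to the largest cluster that has not been touched yet.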

Uncertainty Sampling
Pick the point closest to the decision boundary.
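Two common ways to read "closest to the decision boundary" are sketched below, assuming either a linear classifier with weights `w` and bias `b`, or any model exposing a hypothetical `predict_proba(x)` that returns class probabilities. Both helpers are illustrative, not taken from the cited papers.

```python
import math


def boundary_distance(w, b, x):
    """Unsigned distance from x to the hyperplane w·x + b = 0."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.hypot(*w)


def uncertainty_sample(w, b, unlabeled):
    """Return the unlabeled point closest to the current linear decision boundary."""
    return min(unlabeled, key=lambda x: boundary_distance(w, b, x))


def entropy_sample(predict_proba, unlabeled):
    """Probabilistic variant: pick the point whose predicted label distribution
    has maximal entropy."""
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)
    return max(unlabeled, key=lambda x: entropy(predict_proba(x)))
```

For a binary probabilistic classifier the two criteria agree: maximal entropy is reached where the predicted probability is 0.5, i.e. on the boundary.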

Maximal Diversity Sampling
Pick the point maximally distant from the labeled x's.
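Maximal diversity can be read as "maximize the distance to the nearest labeled point". The sketch below assumes Euclidean distance on numeric tuples; the greedy batch helper (treating each pick as labeled before the next pick) is an illustrative addition.

```python
import math


def diversity_sample(unlabeled, labeled):
    """Return the unlabeled point whose nearest labeled neighbor is farthest away."""
    return max(unlabeled, key=lambda x: min(math.dist(x, q) for q in labeled))


def diversity_batch(unlabeled, labeled, batch_size):
    """Greedy farthest-first batch: each pick is treated as labeled for the next pick."""
    pool, chosen = list(unlabeled), []
    for _ in range(min(batch_size, len(pool))):
        pick = diversity_sample(pool, list(labeled) + chosen)
        pool.remove(pick)
        chosen.append(pick)
    return chosen
```

The greedy update matters for batches: without it, two nearly identical far-away points would both be selected.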

Ensemble-Based Possibilities
- Uncertainty + diversity criteria
- Density + uncertainty criteria
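One way to combine density and uncertainty is to multiply them, so an uncertain point only scores well if it also sits in a dense region. This is a sketch under stated assumptions: binary classification with a hypothetical `prob_a(x)` returning P(class A), and a mean-RBF-similarity density estimate with exponent `beta`.

```python
import math


def binary_entropy(p):
    """Uncertainty of a binary prediction with P(class A) = p."""
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)


def density(x, pool, gamma=1.0):
    """Mean RBF similarity of x to the unlabeled pool: a simple density estimate."""
    return sum(math.exp(-gamma * math.dist(x, u) ** 2) for u in pool) / len(pool)


def density_weighted_uncertainty(pool, prob_a, beta=1.0, gamma=1.0):
    """Pick x maximizing uncertainty(x) * density(x) ** beta (beta=0 gives pure uncertainty)."""
    return max(pool, key=lambda x: binary_entropy(prob_a(x)) * density(x, pool, gamma) ** beta)
```

With `beta=0` an isolated outlier lying exactly on the boundary wins; with `beta=1` the selection moves to a slightly less uncertain point inside a dense cluster, which is usually more informative for the rest of the pool.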

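Maximum Disagreement on a Classifier Committee
How to measure the disagreement:
- Vote entropy: $-\sum_{y} \frac{V(y)}{C} \log \frac{V(y)}{C}$, where $V(y)$ is the number of votes that label $y$ receives and $C$ is the committee size
- KL divergence to the mean: $\frac{1}{C} \sum_{c=1}^{C} D\big(P_c(y \mid x) \,\|\, P_{\mathrm{cons}}(y \mid x)\big)$, where $P_c(y \mid x)$ is classifier $c$'s probability of label $y$ and $P_{\mathrm{cons}}(y \mid x) = \frac{1}{C} \sum_{c=1}^{C} P_c(y \mid x)$ is the consensus probability that $y$ is the correct label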
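Both disagreement measures can be computed directly from committee outputs. A minimal sketch, assuming hard votes are given as a list of labels and soft outputs as one `dict` (label to probability) per committee member; the function names are illustrative.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """votes: one hard label per committee member."""
    C = len(votes)
    return -sum((v / C) * math.log(v / C) for v in Counter(votes).values())


def kl_to_consensus(dists):
    """dists: one label distribution (dict label -> prob) per committee member.
    Returns the mean KL divergence of each member from the consensus distribution."""
    C = len(dists)
    labels = set().union(*dists)
    consensus = {y: sum(d.get(y, 0.0) for d in dists) / C for y in labels}
    return sum(
        sum(p * math.log(p / consensus[y]) for y, p in d.items() if p > 0)
        for d in dists
    ) / C
```

To select a query, score each unlabeled x with one of these measures over the committee's outputs on x and take the maximum. Unanimous committees score 0 under both measures.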

Strategy Selection: A Surprise
There is no universal optimum.
- The optimal operating range differs across AL sampling strategies
- How to get the best of both worlds?
- (Hint: ensemble methods)

How Does DUAL Do Better?
- Runs DWUS (density-weighted uncertainty sampling) until it estimates a cross-over
- Monitors the change in expected error at each iteration to detect when DWUS is stuck in a local minimum
- Uses a mixture model after the cross-over (saturation) point
- Our goal should be to minimize the expected future error
  - If we knew the future error of Uncertainty Sampling (US) to be zero, we would weight the mixture entirely toward US
  - But in practice, we do not know it
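The control flow described above can be sketched as follows. This is a simplified illustration, not the published DUAL estimator: the cross-over test (estimated error improving by less than `delta` for `window` consecutive iterations) and the fixed mixture weight `pi` are assumptions, and `uncertainty` and `density` are hypothetical scoring callables.

```python
def crossed_over(error_history, delta=1e-3, window=2):
    """Hypothetical cross-over test: True once the estimated error has improved by
    less than `delta` for `window` consecutive iterations."""
    if len(error_history) <= window:
        return False
    recent = error_history[-(window + 1):]
    return all(prev - cur < delta for prev, cur in zip(recent, recent[1:]))


def dual_select(pool, uncertainty, density, switched, pi=0.5):
    """Before the cross-over: density-weighted uncertainty sampling (DWUS).
    After it: a pi-weighted mixture that leans on uncertainty as pi -> 1."""
    def score(x):
        if not switched:
            return uncertainty(x) * density(x)
        return pi * uncertainty(x) + (1 - pi) * density(x)
    return max(pool, key=score)
```

In a full loop, one would append the current error estimate to `error_history` after each query and set `switched = crossed_over(error_history)` once, never switching back.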

Results: DUAL vs. DWUS
[Figure: results comparing DUAL and DWUS]

Active Learning is Awesome, but … is it Enough?
[Figure: going beyond traditional active learning (single perfect source, fixed labeling cost, properties fixed over time) toward proactive learning (multiple sources, varying-cost model, differing expertise, answer reluctance, task difficulty, labeling noise, ambiguity, time-varying expertise level), annotated with CIKM '08, JMLR '09, KDD '09, SDM '10]

Active vs. Proactive Learning
- Number of oracles. Active: individual (only one). Proactive: multiple, with different capabilities, costs, and areas of expertise.
- Reliability. Active: infallible (100% right). Proactive: variable across oracles and queries, depending on difficulty, expertise, …
- Reluctance. Active: indefatigable (always answers). Proactive: variable across oracles and queries, depending on workload, certainty, …
- Cost per query. Active: invariant (free or constant). Proactive: variable across oracles and queries, depending on workload, difficulty, …
Note: “Oracle” ∈ {expert, experiment, computation, …}

Scenario: Reluctance
- 2 oracles:
  - Reliable oracle: expensive, but always answers with a correct label
  - Reluctant oracle: cheap, but may not respond to some queries
- Define a utility score as the expected value of information at unit cost

How to estimate the reluctant oracle's answer probability?
- Cluster the unlabeled data using k-means
- Ask the reluctant oracle for the label of each cluster centroid. If:
  - a label is received: increase the estimate for nearby points
  - no label is received: decrease the estimate for nearby points
  - The update indicator equals 1 when a label is received, -1 otherwise
- The number of clusters depends on the clustering budget and the oracle fee
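A sketch of the two pieces above, under stated assumptions: the utility is taken as P(answer) × value / cost, and the centroid update spreads the ±1 indicator to nearby points through an RBF kernel with a fixed step size. The kernel, `step`, `gamma`, and all function names are hypothetical; the slide specifies only the sign rule.

```python
import math


def utility(value, p_answer, cost):
    """Expected value of information per unit cost for one (instance, oracle) pair."""
    return p_answer * value / cost


def update_answer_prob(pool, p_answer, centroid, answered, gamma=1.0, step=0.2):
    """After querying a cluster centroid, raise (answered) or lower (no answer) the
    estimated answer probability of nearby points, weighted by an RBF kernel."""
    h = 1.0 if answered else -1.0  # +1 when a label is received, -1 otherwise
    return [
        min(1.0, max(0.0, p + step * h * math.exp(-gamma * math.dist(x, centroid) ** 2)))
        for x, p in zip(pool, p_answer)
    ]
```

Even after a refusal lowers the estimate near a centroid, the cheap reluctant oracle can still have higher utility than the expensive reliable one, which is exactly the trade-off the utility score is meant to expose.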

What If Oracles Change Over Time?
- So far we learned oracle properties: cost, reliability, reluctance
  - We assumed these properties remain constant
  - What if an oracle improves? Or gets tired?
- Assume gradual change in oracle properties
  - Bound the temporal derivative
  - But what if a previously poor oracle is totally ignored by the optimal strategy (never sampled)?
  - Exploration vs. exploitation tradeoff

Sequential Bayesian Filtering (SDM '10)
- Tracking the states of multiple systems as each evolves over time
- Sequentially arriving observations (noisy labels)
- Goal: estimate the posterior distribution
[Figure: oracle accuracy changing with time t; noisy labels]
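A generic bootstrap particle filter for one oracle's time-varying accuracy illustrates the idea. It assumes each observation is 1 or 0 according to whether the oracle's label was judged correct (for instance against a trusted reference); the bounded random-walk step `drift`, the particle count, and the function name are assumptions, and this is not presented as the SDM '10 model.

```python
import random


def track_accuracy(observations, n_particles=500, drift=0.02, seed=0):
    """Bootstrap particle filter for a time-varying oracle accuracy a_t in [0, 1].

    observations: 1 if the oracle's label was judged correct at step t, else 0.
    Returns the posterior-mean accuracy estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.random() for _ in range(n_particles)]  # uniform prior
    estimates = []
    for obs in observations:
        # Predict: bounded random-walk step (a bounded temporal derivative).
        particles = [min(1.0, max(0.0, a + rng.uniform(-drift, drift))) for a in particles]
        # Update: Bernoulli likelihood of the observation under each particle.
        weights = [a if obs else 1.0 - a for a in particles]
        total = sum(weights)
        if total == 0.0:  # every particle ruled out; fall back to uniform weights
            weights, total = [1.0] * n_particles, float(n_particles)
        estimates.append(sum(w * a for w, a in zip(weights, particles)) / total)
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

The `drift` bound is what lets the estimate follow an oracle that gets better or worse, while still smoothing over individual noisy labels.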

Does Tracking Predictor Accuracy Actually Help in Proactive Learning? (SDM '10)

Proactive Learning in Early Detection of Malware

Boeing-CMU Aerospace Data Analytics Lab
[Photo: Dreamliner maiden flight, just after takeoff, 15 December 2009]

F/A-18 Maintenance Decision Support
Past: reactive; improve flight readiness
- Computer-assisted diagnoses of F/A-18 troubles
  - Statistical learning problem
- Computer-assisted expert finding
  - Statistical recommendation (“collaborative filtering”) problem
- Computer-assisted resolution recommendation
  - Information retrieval problem
- From research prototypes to operational systems
  - Software engineering problem (not done by CMU)

[Figure: information flow from aircraft trouble reports to the classifiers, the recommendation system, and the search engine]

The Task: Get the right plane to the right process, right facility, and right engineers
- Each operation requires selecting one option from many:
  - VSE: 18 categories, 83 subcategories
  - “Application”: 28 categories
  - “Priority”: 5 categories
  - “Engineer”: 48 experts (“categories”)
- We induce one classifier per category
  - Producing probability estimates of correct assignments

Boeing Aerospace ML Challenges (Now: Proactive, Commercial Aviation)
- Optimizing preventive maintenance of parts/assemblies/systems: minimize airplane downtime
- Multimedia time-series prediction
  - Text (maintenance reports)
  - Sensors (onboard, downloaded)
  - Parts networks (demand forecasting)
  - Fleet mix/aging
- Sensor data alone: terabytes in raw form
- Very few serious malfunctions, hence few labels

Active Learning for MT
[Figure: an active learner samples source sentences S from a source-language corpus; an expert translator returns translations T; the (S, T) pairs extend the parallel corpus that the trainer uses to build the MT system's model]

ProActive Crowd Translation
[Figure: ACT framework; sentence selection picks sentences S from a source-language corpus, multiple translators return pairs (S, T1) … (S, Tn), and translation selection chooses among them before the trainer updates the MT system's model]

Active Learning Strategy: Diminishing Density Weighted Diversity Sampling
Experiments:
- Language pair: Spanish-English
- Iterations: 20
- Batch size: 1000 sentences each
- Translation: Moses phrase-based SMT
- Development set: 343 sentences
- Test set: 506 sentences
Graph: X: performance (BLEU); Y: data (thousand words)
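The slide names the strategy but not its formula, so the following is one plausible formulation, labeled as an assumption: each n-gram of a candidate sentence contributes its frequency in the unlabeled pool (density), discounted by `exp(-lam * count)` where count is how often it already appears in labeled data (diminishing returns for diversity). The n-gram order, `lam`, whitespace tokenization, and all names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """All 1..n_max-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def ddwds_scores(unlabeled, labeled, lam=1.0, n_max=2):
    """Score each unlabeled sentence by the average, over its n-grams, of
    (n-gram frequency in the unlabeled pool) * exp(-lam * count in labeled data)."""
    pool = Counter(g for s in unlabeled for g in ngrams(s.split(), n_max))
    total = sum(pool.values()) or 1
    seen = Counter(g for s in labeled for g in ngrams(s.split(), n_max))
    scores = []
    for s in unlabeled:
        grams = ngrams(s.split(), n_max)
        value = sum(pool[g] / total * math.exp(-lam * seen[g]) for g in grams)
        scores.append(value / len(grams) if grams else 0.0)
    return scores


def select_batch(unlabeled, labeled, batch_size, lam=1.0):
    """Greedy batch selection: each chosen sentence counts as labeled for later picks,
    so repeated n-grams yield diminishing returns."""
    remaining, chosen = list(unlabeled), []
    for _ in range(min(batch_size, len(remaining))):
        scores = ddwds_scores(remaining, list(labeled) + chosen, lam)
        best = max(range(len(remaining)), key=scores.__getitem__)
        chosen.append(remaining.pop(best))
    return chosen
```

With nothing labeled, frequent phrasings win (density); once a phrasing has been translated, its weight decays and sentences with unseen n-grams move up (diversity).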

Translation Selection from Amazon Mechanical Turk (AMT)
- Crowds beat experts
- Translator reliability
- Translation selection

MT via LSTM (DNNs + Sequence)
[Figure: LSTM sequence model emitting “I'd like a beer” one token at a time and ending with STOP, shown with its attention history]
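The attention step in such a model can be sketched in isolation: at each output token, the decoder state is compared with every encoder state, the scores are normalized with a softmax, and the weighted sum of encoder states becomes the context vector. This sketch uses dot-product scoring as a simplification (the scoring function of the depicted system is not stated) and toy vectors in place of real LSTM states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Dot-product attention: weight each encoder state by its similarity to the
    current decoder state and return (weights, context vector)."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context
```

Stacking the weight vectors from successive decoding steps gives the kind of "attention history" shown on the slide.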

PROTEINS (borrowed from Judith Klein-Seetharaman)
Sequence → Structure → Function
Primary sequence:
MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY VTVQHKKLRT
PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG GEIALWSLVV LAIERYVVVC
KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN ESFVIYMFVV
HFIIPLIVIF FCYGQLVFTV KEAAAQQQES ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG
SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA
Folding → 3D structure → complex function within a network of proteins (normal)

PROTEINS
[Same sequence → structure → function figure as the previous slide, annotated: Cancer: unchecked cellular replication]

Computational Virology via PPIs
- Degree distribution / hub analysis / disease checking
- Graph module analysis (from bi-clustering study)
- Protein-family-based graph patterns: receptors / subclasses / ligands / etc.

HIV-1 Host Protein Interactions
HIV-1 depends on the cellular machinery in every aspect of its life cycle.
[Figure: HIV-1 life cycle stages: fusion, reverse transcription, transcription, budding, maturation (Peterlin and Trono, Nature Reviews Immunology, 2003)]

PPIs: Protein-Protein Interactions
- The cell machinery is run by the proteins
  - Enzymatic activities, replication, translation, transport, signaling, structural
- Proteins interact with each other to perform these functions
  - Through physical contact
  - Indirectly in a protein complex
  - Indirectly in a pathway
[Figure source: http://www.cellsignal.com/reference/pathway/Apoptosis_Overview.html]

Estimating Expert Labeling Accuracies
- Solve this through expectation maximization (EM)
- Assume the experts are conditionally independent given the true label
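A minimal EM sketch of this idea, assuming binary labels, every expert labeling every item, and a single accuracy per expert (a "one-coin" simplification of Dawid-Skene-style models). The initial accuracy, the clipping constant, and the function name are assumptions.

```python
def em_expert_accuracy(labels, n_iter=50, init_acc=0.7, eps=1e-6):
    """EM for binary labels from several experts, assuming experts are conditionally
    independent given the true label and each has one accuracy.

    labels[i][j] in {0, 1}: label from expert j on item i (every expert labels every item).
    Returns (posterior P(true label = 1) per item, estimated accuracy per expert)."""
    n_items, n_experts = len(labels), len(labels[0])
    acc = [init_acc] * n_experts
    prior = 0.5
    post = []
    for _ in range(n_iter):
        # E-step: posterior of the true label given current accuracies and prior.
        post = []
        for row in labels:
            like1, like0 = prior, 1.0 - prior
            for a, lab in zip(acc, row):
                like1 *= a if lab == 1 else 1.0 - a
                like0 *= a if lab == 0 else 1.0 - a
            post.append(like1 / (like1 + like0))
        # M-step: accuracy = expected fraction of items the expert labeled correctly.
        acc = [
            min(1 - eps, max(eps, sum(q if row[j] == 1 else 1 - q
                                      for q, row in zip(post, labels)) / n_items))
            for j in range(n_experts)
        ]
        prior = min(1 - eps, max(eps, sum(post) / n_items))
    return post, acc
```

Starting accuracies above 0.5 break the label-swap symmetry; the clipping keeps both likelihoods strictly positive when an expert looks perfect.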

Refined Interactome
[Figure: refined protein interaction network]
- Solid line: probability of being a direct interaction ≥ 0.5
- Dashed line: probability of being a direct interaction < 0.5
- Edge thickness indicates confidence in the interaction

Transfer/Multi-Task Learning
- Basic idea: map invariant properties from similar, previously learned tasks
- Challenges: What to retain? How to modify?
- History:
  - Transformational/Derivational Analogy (1980s)
  - Case-Based Reasoning (1980s-1990s)
  - “Modern” Transfer/Multi-Task Learning (2000s)
- New focus: beyond transferring priors & features
  - Regularizers to maximize transfer
  - Structural biases

Host-Pathogen Interactions: The Multitask Landscape
[Figure: pathogens (bacteria: Firmicutes such as B. anthracis; Enterobacter such as Y. pestis and S. typhi) and hosts (vertebrates: H. sapiens, M. musculus; plants: A. thaliana; protists), linked by homologous proteins due to common ancestors]

Common Biological Pathways
[Figure: the “Glucose Transport Pathway”]

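Multi-task Objective
For m tasks with parameters $w_1, \dots, w_m$:
1. Minimize empirical error
2. Enforce the commonality hypothesis
3. Prevent overfitting
$\min_{w_1,\dots,w_m} \underbrace{\sum_{t=1}^{m} L_t(w_t)}_{\text{empirical loss}} + \lambda \underbrace{R_{\mathrm{path}}(w_1,\dots,w_m)}_{\text{pathway regularizer}} + \mu \underbrace{\sum_{t=1}^{m} \lVert w_t \rVert_2^2}_{L_2\ \text{regularizer}}$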
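A small sketch of the three-part objective (empirical loss, commonality regularizer, L2 regularizer), assuming logistic loss for every task. The slide does not spell out the pathway regularizer, so this stand-in couples tasks by penalizing each task's distance from the mean parameter vector (mean-regularized multi-task learning); `lam`, `mu`, `lr`, and the function name are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_multitask(tasks, dim, lam=0.1, mu=0.01, lr=0.5, epochs=300):
    """tasks: list of (X, y) with X a list of feature lists and y in {0, 1}.
    Minimizes  sum_t mean_logloss_t(w_t) + lam * sum_t ||w_t - w_bar||^2 + mu * sum_t ||w_t||^2
    by full-batch gradient descent. The mean-coupling term stands in for the
    pathway regularizer, whose exact form is not given here."""
    W = [[0.0] * dim for _ in tasks]
    for _ in range(epochs):
        w_bar = [sum(w[d] for w in W) / len(W) for d in range(dim)]
        new_W = []
        for w, (X, y) in zip(W, tasks):
            grad = [0.0] * dim
            # Gradient of the mean logistic loss for this task.
            for x, label in zip(X, y):
                err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - label
                for d in range(dim):
                    grad[d] += err * x[d] / len(X)
            # Gradients of the coupling term and the L2 term.
            for d in range(dim):
                grad[d] += 2 * lam * (w[d] - w_bar[d]) + 2 * mu * w[d]
            new_W.append([w[d] - lr * grad[d] for d in range(dim)])
        W = new_W
    return W
```

Raising `lam` pulls the task-specific weight vectors toward each other; `lam=0` recovers independent, L2-regularized per-task models.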

Kernel Mean Matching (KMM)
- KMM allows us to “soft select” source examples
- It reweights source examples to make them look similar to target examples, by minimizing the maximum mean discrepancy (MMD)
- Kernel: spectrum RBF kernel based on protein-sequence n-grams
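A minimal KMM sketch, assuming an RBF kernel over n-gram count ("spectrum") vectors as one reading of "spectrum RBF kernel". It solves the standard KMM quadratic program, minimize ½βᵀKβ − κᵀβ with 0 ≤ βᵢ ≤ B and |Σβᵢ − n_s| ≤ n_s·ε, by projected gradient descent instead of a QP solver. `B`, `eps`, the n-gram order, and all names are assumptions, and the toy strings below are not real proteins.

```python
import math
from collections import Counter


def spectrum(seq, n=2):
    """n-gram count vector of a sequence (the 'spectrum' features)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def rbf_spectrum_kernel(a, b, n=2, gamma=1.0):
    """RBF kernel on spectrum features: exp(-gamma * ||phi(a) - phi(b)||^2)."""
    fa, fb = spectrum(a, n), spectrum(b, n)
    return math.exp(-gamma * sum((fa[g] - fb[g]) ** 2 for g in set(fa) | set(fb)))


def project(v, B, lo, hi):
    """Euclidean projection onto {0 <= b_i <= B, lo <= sum(b) <= hi} (bisection on a shift)."""
    def shifted(nu):
        return [min(B, max(0.0, x + nu)) for x in v]
    s = sum(shifted(0.0))
    if lo <= s <= hi:
        return shifted(0.0)
    target = lo if s < lo else hi
    left, right = -max(v) - B, B - min(v)  # sums: 0 at left, len(v) * B at right
    for _ in range(100):
        mid = (left + right) / 2
        if sum(shifted(mid)) < target:
            left = mid
        else:
            right = mid
    return shifted((left + right) / 2)


def kmm_weights(source, target, kernel, B=10.0, eps=0.1, iters=500):
    """Minimize 0.5 * b'Kb - kappa'b subject to 0 <= b_i <= B and
    |sum(b) - n_s| <= n_s * eps, by projected gradient descent."""
    ns, nt = len(source), len(target)
    K = [[kernel(a, b) for b in source] for a in source]
    kappa = [ns / nt * sum(kernel(a, t) for t in target) for a in source]
    beta = [1.0] * ns
    lr = 1.0 / ns  # K's largest eigenvalue is at most ns when kernel values lie in [0, 1]
    for _ in range(iters):
        grad = [sum(K[i][j] * beta[j] for j in range(ns)) - kappa[i] for i in range(ns)]
        beta = project([b - lr * g for b, g in zip(beta, grad)], B, ns * (1 - eps), ns * (1 + eps))
    return beta
```

Source sequences that share n-grams with the target get large weights, dissimilar ones are down-weighted; the weights can then be passed as per-example weights to the source-task learner.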

AUC-PR on Source-to-Source Tasks
Train on 80%, test on 20%; precision and recall measured on positives.

Parting Thoughts
- Multi-task/Transfer Learning
  - It really works! (referral nets, recommendation systems, proteomics, prerequisite graphs, …)
  - The fanciest method may not be the best (e.g., KMM)
- Proactive Learning
  - It too really works! (MT, malware detection, proteomics, recommendation systems)
  - Somehow the ML field has not yet embraced it
- Both cope with label sparsity, which is necessary no matter which baseline ML method is used

THANK YOU!

The Big-Data “Stack”
- Alerts, Visualization
- Analytics Algorithms: Machine Learning, Pattern Detection
- Big-Data Architecture: Hadoop/H-Table, Asynch/Pegasus
- Big-Data “Plumbing”: Cloud/Storage, Resource Allocator
- Data: Sensors, Historical & Normative Data, Knowledge base

LTI COG (aka Agent) Architecture

Used deep learning (LDSTA model trained on Yahoo! Answers data to match questions with answer-bearing sentences)
