Verification and Validation of Agent-based and Equation-based Simulations
Verification and Validation of Agent-based and Equation-based Simulations and Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome
Ryan C. Kennedy
Department of Computer Science and Engineering
University of Notre Dame
1/21/2022
Verification and Validation of Agent-based and Equation-based Simulations
Overview
- Introduction
  - Motivation
  - Concepts of Verification and Validation
  - Research Objectives and Methods
- Case Study I
  - An Agent-based Scientific Model
- Case Study II
  - An Equation-based Economic Model
- Conclusion
- Future Work
Motivation
- NSF Blue Ribbon Panel (February 2006): “New theory and methods are needed for handling stochastic models and for developing meaningful and efficient approaches to the quantification of uncertainties. As they stand now, verification, validation, and uncertainty quantification are challenging and necessary research areas that must be actively pursued.”
  *Oden: “Simulation-Based Engineering Science: Revolutionizing Engineering Science through Simulation”
- Dr. Richard W. Amos
  - Deputy to the Commanding General, U.S. Army Aviation and Missile Command (AMCOM)
  - Previously the Director of the System Simulation and Development Directorate in the Aviation and Missile Research, Development and Engineering Center (AMRDEC)
- Verification and Validation
  - 10-15% of total cost of model development, but often overlooked in overall lifecycle
Model Verification & Validation (V & V)
- V & V
  - Verification: solve the model right
  - Validation: solve the right model
- The cost and value of V & V influence confidence in the model
- Want optimal cost-effectiveness of V & V
*Adapted from Sargent: “Verification and Validation of Simulation Models”
Verification and Validation Process
*Adapted from Sargent: “Verification and Validation of Simulation Models” and Huang: “Agent-Based Scientific Simulation”
Applicable Verification and Validation Methods
*Balci: “Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice” lists more than 75 methods
V & V: Subjective Analysis
- Examples of V & V Techniques
  - Face Validity
    - Animation
    - Graphical Representation
  - Turing Test
  - Internal Validity
  - Tracing
  - Black-Box Testing
V & V: Quantitative Analysis
- Examples of V & V Techniques
  - Docking (Model-to-Model Comparison)
  - Historical Data Validation
  - Sensitivity Analysis/Parameter Variability
  - Prediction Validation
What and How
- Research objective
  - Perform V & V on distinct models and identify the more cost-effective techniques
- How
  - Two very different projects as case studies
  - Evaluate and adapt the formalized V & V techniques in industrial and system engineering
Case Study I: An Agent-based Scientific Model
- NSF-funded interdisciplinary project
  - Understanding the evolution and heterogeneous structure of Natural Organic Matter (NOM)
  - E-science example
  - Chemists, biologists, ecologists, and computer scientists
- Agent-based stochastic model
- Web-based simulation model
Case Study I: NOM
- What is NOM?
  - Heterogeneous mixture of molecules in terrestrial and aquatic ecosystems
- Why study NOM?
  - Plays a crucial role in the evolution of soils, the transport of pollutants, and the global carbon cycle
  - Understanding NOM helps us better understand natural ecosystems
  - Hard to study in the laboratory
Case Study I: The Conceptual Model I
- Agents
  - A large number of molecules
    - Heterogeneous properties
      - Elemental composition
      - Molecular weight
      - Characteristic functional groups
  - Behaviors
    - Transport through soil pores (spatial mobility)
    - Chemical reactions: first order and second order
    - Sorption
Case Study I: The Conceptual Model II
- Stochastic Model
  - Individual behaviors and interactions are stochastically determined by:
    - Internal attributes
      - Molecular structure
      - State (adsorbed, desorbed, reacted, etc.)
    - External conditions
      - Environment (pH, light intensity, etc.)
      - Proximity to other molecules
    - Length of time step, Δt
- Space
  - 2D Grid Structure
- Emergent properties
  - Distribution of molecular properties over time
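The stochastic update described on this slide can be sketched roughly as follows. This is a minimal illustration, not the NOM simulator: the event rates, the dictionary layout of a molecule, the 10% weight loss per reaction, and the function names `step` and `run` are all assumptions made for the example. It only shows how state, a time step Δt, and a 2D grid can jointly drive per-molecule random behavior, and how a distribution of molecular weights emerges.

```python
import random

# Hypothetical per-unit-time event rates; not taken from the NOM model.
RATES = {"move": 0.5, "react": 0.1, "adsorb": 0.05}


def step(molecule, grid_size, dt, rng):
    """Advance one molecule by one time step of length dt on a 2D grid.

    rate * dt is used as an event probability, which is only a reasonable
    approximation while rate * dt stays well below 1.
    """
    x, y = molecule["pos"]
    # Internal state affects behavior: adsorbed molecules do not move here.
    if molecule["state"] != "adsorbed" and rng.random() < RATES["move"] * dt:
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        molecule["pos"] = ((x + dx) % grid_size, (y + dy) % grid_size)
    if rng.random() < RATES["react"] * dt:
        # Illustrative first-order reaction: the molecule loses 10% of its weight.
        molecule["weight"] *= 0.9
    if molecule["state"] == "desorbed" and rng.random() < RATES["adsorb"] * dt:
        molecule["state"] = "adsorbed"
    return molecule


def run(n_molecules=100, grid_size=20, dt=1.0, steps=50, seed=0):
    """Run the sketch and return the sorted molecular weights (emergent property)."""
    rng = random.Random(seed)
    molecules = [
        {"pos": (rng.randrange(grid_size), rng.randrange(grid_size)),
         "state": "desorbed",
         "weight": rng.uniform(200.0, 2000.0)}
        for _ in range(n_molecules)
    ]
    for _ in range(steps):
        for m in molecules:
            step(m, grid_size, dt, rng)
    return sorted(m["weight"] for m in molecules)
```

Fixing the seed makes a run repeatable, while varying it across replications is one way to examine the run-to-run variability that internal validity is concerned with.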
Case Study I: Implementations
Case Study I: Face Validity
Case Study I: Internal Validity I
Case Study I: Internal Validity II
Case Study I: Docking I
- Compare the model with a validated one
- Compare the model with a non-validated one
- Different implementations
  - Different programming languages
  - Different packages
- Different modeling approaches
  - Agent-based approach vs. equation-based approach
- Powerful method
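For stochastic models, docking usually means comparing output distributions rather than single numbers. The slides do not say which statistic was used, so the following is only one plausible sketch: a two-sample Kolmogorov-Smirnov statistic, computed with the standard library, applied to the outputs of two implementations. The function name `ks_statistic` and the sample values in the usage note are illustrative assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs.
    Because both CDFs are step functions, the gap only needs to be
    checked at the observed data points.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

A value near 0 suggests the two implementations produce similar distributions for that output, and a value near 1 suggests they disagree. For example, `ks_statistic([1, 2], [2, 3])` gives 0.5. Deciding what gap is acceptable would need a significance threshold, which is left out here.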
Case Study I: Docking II
Features | AlphaStep | No-FlowReaction
Developing Group | University of New Mexico, Department of Chemistry | University of Notre Dame, Computer Science and Engineering
Programming language | Pascal | Java (Sun JDK 1.4.2)
Platforms | Delphi 6, Windows | Red Hat Linux cluster
Running mode | Standalone | Web based, standalone
Simulation package | None | Repast toolkit
Animation | None | Yes
Spatial representation | None | 2D grid
Second order reaction | Randomly pick one from list | Choose the nearest neighbor
First order with split | Add to list | Find empty cell nearby
Case Study I: Docking III
Case Study I: Docking IV
Case Study I: Docking V
Case Study II: An Economic Model
- Interdisciplinary project
  - Initially written in Matlab within the Department of Finance
  - Converted to C++ by computer scientists
- Equation-based system
- Concerned with identifying ideal economic variables, such as debt, money growth, and tax rate
Case Study II: The Conceptual Model
- Equation-based system
- Nonlinear projection methods used to solve Ramsey problems in a stochastic money economy
- Goal is to generate the best social welfare for a given economy
- Motivation
Case Study II: Face Verification
 | Lagrange Multiplier | Labor | Money Growth | Tax Rate | Cash Good | Credit Good
Matlab | 0.138 | 0.309 | -0.009 | 0.188 | 0.486 | 0.621
C++ | 0.138 | 0.309 | -0.009 | 0.188 | 0.486 | 0.621
Steady State | 0.138 | 0.309 | -0.009 | 0.188 | 0.485 | 0.620
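The face verification on this slide amounts to checking that the three rows agree to within rounding. A small sketch of that check follows. The values are copied from the slide. The helper name `max_abs_diff` and the tolerance of 0.0015 are assumptions chosen for illustration.

```python
# Rounded results from the slide, in the order the values appear in each row.
matlab = [0.138, 0.309, -0.009, 0.188, 0.486, 0.621]
cpp = [0.138, 0.309, -0.009, 0.188, 0.486, 0.621]
steady_state = [0.138, 0.309, -0.009, 0.188, 0.485, 0.620]

TOLERANCE = 0.0015  # hypothetical: a bit more than one unit in the third decimal


def max_abs_diff(xs, ys):
    """Largest absolute element-wise difference between two equal-length rows."""
    return max(abs(x - y) for x, y in zip(xs, ys))


print("Matlab vs C++:         ", max_abs_diff(matlab, cpp))
print("Matlab vs steady state:", max_abs_diff(matlab, steady_state) < TOLERANCE)
```

At three decimal places the Matlab and C++ rows are identical, and both differ from the steady-state row by about 0.001 in the last two columns.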
Case Study II: Tracing
- Matlab:
  it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, m -0.0092286139, t 0.1881024991, h 0.3093668925 cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442
  it 45, af 2.64653e-08, rc 0, timer 11.0, l 0.1382704643, m -0.0092286175, t 0.1881024947, h 0.3093668931 cc1 0.4861695553, cc2 0.6212795120, rl 1.0092221442
- C++:
  it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 t: 0.188252 h: 0.309338 cc1: 0.486205 cc2: 0.621244 rl: -0.65888
  it: 45 af: 0.00144784 rc: 0 l: 0.138401 m: -0.00937062 t: 0.188239 h: 0.30934 cc1: 0.486208 cc2: 0.621241 rl: -0.665511
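Tracing becomes more systematic when the two traces are compared variable by variable at the same iteration. The sketch below does this for iteration 45, using the values printed on the slide. The tolerance and the helper name `compare_traces` are assumptions, and the slide does not explain what each variable means, so the code treats them as opaque names.

```python
# Values at iteration 45, copied from the two traces on the slide.
matlab = {"l": 0.1382704643, "m": -0.0092286175, "t": 0.1881024947,
          "h": 0.3093668931, "cc1": 0.4861695553, "cc2": 0.6212795120,
          "rl": 1.0092221442}
cpp = {"l": 0.138401, "m": -0.00937062, "t": 0.188239, "h": 0.30934,
       "cc1": 0.486208, "cc2": 0.621241, "rl": -0.665511}

TOLERANCE = 1e-3  # hypothetical threshold for "close enough"


def compare_traces(reference, candidate, tolerance):
    """Return per-variable absolute differences and the names exceeding tolerance."""
    diffs = {name: abs(reference[name] - candidate[name]) for name in reference}
    flagged = [name for name, d in diffs.items() if d > tolerance]
    return diffs, flagged


diffs, flagged = compare_traces(matlab, cpp, TOLERANCE)
for name, d in diffs.items():
    print(f"{name:>3}: |diff| = {d:.3e}")
print("exceeds tolerance:", flagged)
```

With this threshold only `rl` is flagged, and the remaining variables agree to within about 1.5e-4. The two traces also report very different `af` values at the same iteration number, so some of the gap may reflect different convergence states rather than a defect.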
Case Study II: Docking
Features | Matlab | C++
Developing Group | University of Notre Dame, Department of Finance | University of Notre Dame, Computer Science and Engineering
Language | High-Level | Lower-Level
Compiler | Interpreted | GNU Compiler
Good For | Prototyping | Speed
Platforms | Linux, Windows | Linux
Running mode | Standalone |
Packages | LAPACK, etc… | STL, GSL
Variables | Implicit | Declared
Case Study II: Performance
 | 5 Iterations | | 500 Iterations
Matlab | 58 s | 568 s | 8872 s
C++ | 2 s | | 176 s
Speedup | 29x | 33.4x | 50.4x
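The speedup figures on this slide are the ratio of Matlab runtime to C++ runtime. The short sketch below reproduces the two ratios that can be checked directly from the slide. One C++ runtime is missing from the extracted slide. Dividing 568 s by the reported 33.4x suggests roughly 17 s, which is an inference from the slide's numbers, not a reported measurement. The function name `speedup` is illustrative.

```python
def speedup(t_reference, t_new):
    """Speedup of the new implementation relative to the reference one."""
    return t_reference / t_new


# Ratios that can be checked against the slide (runtimes in seconds).
print(round(speedup(58, 2), 1))      # 5-iteration run
print(round(speedup(8872, 176), 1))  # 500-iteration run

# Inferred, not reported: C++ runtime implied by the 33.4x speedup on the 568 s run.
implied_cpp_seconds = 568 / 33.4
print(round(implied_cpp_seconds, 1))
```

The speedup grows with the number of iterations, which is consistent with a fixed startup cost making up a smaller share of the longer runs, although the slide itself does not state a cause.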
Summary & Conclusion
- Applied V & V techniques to distinct case studies to increase model confidence
- Some techniques are more cost-effective than others
Cost and effectiveness by technique, Agent-based (Stochastic) vs. Equation-based (Deterministic):
Face Validation/Verification Low Very Good Turing Test Low Very Good Low Good Internal Validity Moderate Very Good n/a Tracing Moderate Fair Moderate Excellent Low Good Low Very Good Docking Moderate Very Good Moderate Good Historical Data Verification Moderate Very Good Sensitivity Analysis Moderate Good Prediction Validation Moderate Good Moderate Fair Black-Box Testing
Future Work
- More in-depth survey of V & V methods
- More rigorous quantitative methods
- Compare simulation results against empirical data
- Invalidation Testing
- More general and formalized V & V process model
Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome
Overview
- Introduction
  - Motivation
  - Basic Biological Concepts
  - Bioinformatics
  - Aedes aegypti
- Transposable Elements
- Approaches to Identifying Transposable Elements
- Conclusion
- Future Work
Motivation
- The bioinformatics field is rapidly growing
  - Computer scientists can help advance its study
- A better understanding of the biology of organisms would be helpful to scientists
  - Transposable elements can be useful tools to scientists
  - Computer scientists can help biologists develop advanced techniques to find transposable elements
Biological Foundations
- All cells contain DNA, RNA, and protein molecules
- DNA
  - Composed of four nucleotides
  - Building block of life
- RNA
  - Carries genetic information from DNA throughout a cell
- Protein
  - Laborer of the cell
- Central Dogma of Molecular Biology: DNA → RNA → protein
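The central dogma (DNA is transcribed to RNA, which is translated to protein) can be illustrated in a few lines. The sketch below is deliberately simplified: it uses a partial codon table, treats the input as the coding strand, and ignores splicing and reading-frame detection. The names `CODONS`, `transcribe`, and `translate` are illustrative, and the example sequence is made up.

```python
# Partial codon table for illustration only; the full genetic code has 64 codons.
CODONS = {
    "AUG": "M", "GCC": "A", "UUU": "F", "GGA": "G",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand):
    """DNA coding strand -> mRNA: replace thymine (T) with uracil (U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS.get(mrna[i:i + 3], "X")  # X marks codons not in the table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, "->", translate(mrna))
```

Because translation reads fixed three-letter codons, inserting or deleting a single base shifts every codon after it, which is why frame shifts are hard for protein-level searches to handle.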
Bioinformatics
- Collective study of numerous fields and techniques to solve biological problems
- Focused on the study of DNA and its underlying characteristics
- Computer science lends itself well to bioinformatics
Bioinformatics Research Topics
- Genome Annotation
  - Assigning biological meaning to regions of a sequence
- Sequence Alignment
  - Comparing two or more sequences
- Sequencing
  - Finding the structure of a given sequence
- Genome Assembly
  - Assembling many short sequences of DNA
Bioinformatics Tools
- Perl
  - BioPerl
- BLAST
  - Popular alignment tool
- Hidden Markov Model
- Clustal X
- Phylogenetic Tree
  - Relationships between sequences
- Bioinformatics Collaboratories
  - NCBI, Ensembl, VectorBase
Aedes aegypti
- Tropical mosquito
- Vector for dengue and yellow fever viruses
- Its unannotated genome was recently released
- Much larger genome than that of other mosquitoes
Transposable Elements
- Often referred to as “jumping genes”
- Can make up large portions of a genome
- Can transfer genetic material
- Useful when performing evolutionary studies
- Typically divided into Class I, Class II, and Class III elements
Transposons
- Class II transposable elements
- Divided into many families
  - piggyBac, Tc1, pogo, mariner, P element
- Typical structure of a transposon:
Typical Approach
- BLAST known transposons against a new genome
- Good for identifying known or similar transposons in new genomes
- Does not account for sequence variations
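The limitation in the last bullet can be seen with a much simpler stand-in for similarity search. The sketch below is not BLAST and does not use any BLAST API. It slides a known query along a genome and reports the best ungapped percent identity, which is enough to show how a divergent copy scores lower and can fall below a cutoff. The function name `best_ungapped_identity` and the toy sequences are assumptions.

```python
def best_ungapped_identity(query, genome):
    """Slide query along genome and return (best identity, start position).

    Identity is the fraction of matching positions with no gaps allowed,
    so insertions or deletions in the genome copy are not handled at all.
    """
    best = (0.0, -1)
    for start in range(len(genome) - len(query) + 1):
        window = genome[start:start + len(query)]
        matches = sum(q == g for q, g in zip(query, window))
        identity = matches / len(query)
        if identity > best[0]:
            best = (identity, start)
    return best


print(best_ungapped_identity("ACGT", "TTACGTTT"))  # exact copy present
print(best_ungapped_identity("ACGT", "TTACCTTT"))  # copy with one substitution
```

A search that requires, say, 90% identity would find the first copy and miss the second. Profile-based methods such as hidden Markov models are one way to tolerate this kind of variation.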
Approach I
- Focused on identifying P elements
- Utilized multiple tools and scripts
- Able to identify previously unknown transposons
- Clustal X and the HMMER suite allowed us to perform a more thorough search
- Cannot account for frame shifts
Approach II
- Used for five families of transposons
- Utilized GeneWise
- Did not search for new transposons
Hybrid Approach: A Transposable Element Discovery Methodology
- Proposed approach
- Utilize the better aspects of the first two approaches
- Can be used for all families described in this study
Phylogenetic Tree
- mariner family
- Clustered clades indicate close relationships
Summary & Conclusion
- Found a reasonable number of transposons
- Utilized novel approaches to finding transposons
  - First such study using this type of approach on the Aedes aegypti genome
- Proposed a hybrid approach
TE | Number
piggyBac | 12
Tc1 | 72
pogo | 50
mari
Future Work
- Utilize hybrid approach
- Automate process
- Comparison of transposable elements found in Aedes aegypti and Anopheles gambiae
Questions or Comments?