Model-Driven Test Design
Jeff Offutt
Professor, Software Engineering
George Mason University, Fairfax, VA, USA
www.cs.gmu.edu/~offutt/
offutt@gmu.edu

OUTLINE
1. Why Do We Test?
2. Poor Testing Has Real Costs
3. Model-Driven Test Design
4. Industrial Software Problems
5. Testing Solutions that Work
   1. Web Applications are Different
   2. Input Validation Testing
   3. Bypass Testing
6. Coming Changes in How We Test Software

Here! Test This!
My first "professional" job: a big software program
• MicroSteff – a big software system for the Mac, V.1.5.1
• An "MF 2-HD" 1.44 MB Verbatim DataLife diskette
• A stack of computer printouts—and no documentation

Cost of Testing
You're going to spend at least half of your development budget on testing, whether you want to or not
• In the real world, testing is the principal post-design activity
• Restricting early testing usually increases cost
• Extensive hardware-software integration requires more testing

Part 1: Why Test?
If you don't know why you're conducting a test, it won't be very helpful
• Written test objectives and requirements are rare
• What are your planned coverage levels?
• How much testing is enough?
• Common objective – spend the budget …

Why Test?
If you don't start planning for each test when the functional requirements are formed, you'll never know why you're conducting the test
• 1980: "The software shall be easily maintainable"
• Threshold reliability requirements?
• What fact is each test trying to verify?
• Requirements definition teams should include testers!

Cost of Not Testing
Program managers often say: "Testing is too expensive."
• Not testing is even more expensive
• Planning for testing after development is prohibitively expensive
• A test station for circuit boards costs half a million dollars …
• Software test tools cost less than $10,000!

Testing & Government Procurement
• A common government model is to develop systems
  – Usually by procuring components
  – Integrating them into a fully working system
  – Software is merely one component
• This model has many advantages …
• But some disadvantages …
  – Little control over the quality of components
  – Lots and lots of really bad software
• 21st-century software is more than a "component"
  – Software is the brains that defines a system's core behavior
  – Government must impose testing requirements

OUTLINE
1. Why Do We Test?
2. Poor Testing Has Real Costs
3. Model-Driven Test Design
4. Industrial Software Problems
5. Testing Solutions that Work
   1. Web Applications are Different
   2. Input Validation Testing
   3. Bypass Testing
6. Coming Changes in How We Test Software

Software Testing—Academic View
• 1970s and 1980s: Academics looked almost exclusively at unit testing
  – Meanwhile industry & government focused almost exclusively on system testing
• 1990s: Some academics looked at system testing, some at integration testing
  – Growth of OO put complexity in the interconnections
• 2000s: Academics are trying to move our rich collection of ideas into practice
  – Reliability requirements in industry & government are increasing exponentially

Academics and Practitioners
• Academics focus on coverage criteria with strong bases in theory—quantitative techniques
  – Industry has focused on human-driven, domain-knowledge-based, qualitative techniques
• Practitioners say "criteria-based coverage is too expensive"
  – Academics say "human-based testing is more expensive and less effective"
Practice is going through a revolution in what testing means to the success of software products

My Personal Evolution
• In the '80s, my PhD work was on unit testing
  – That was all I saw in testing
• In the '90s I recognized that OO put most of the complexity into software connections
  – Integration testing
• I woke up in the '00s …
  – Web apps need to be very reliable, and system testing is often appropriate
  – Agitar and Certess convinced me that we must relax theory to put our ideas into practice
Testers ain't mathematicians!

Tech Transition in the 1990s
Cartoon: "They're teaching a new way of plowing over at the Grange tonight – you going?"  "Naw – I already don't plow as good as I know how …"
"Knowing is not enough, we must apply. Willing is not enough, we must do." — Goethe

Failures in Production Software
• NASA's Mars lander, September 1999, crashed due to a units integration fault—over $50 million US!
• Huge losses due to web application failures
  – Financial services: $6.5 million per hour
  – Credit card sales applications: $2.4 million per hour
• In Dec 2006, amazon.com's BOGO offer turned into a double discount
• 2007: Symantec says that most security vulnerabilities are due to faulty software
• Stronger testing could solve most of these problems
World-wide monetary loss due to poor software is staggering

Testing in the 21st Century
• We are going through a dramatic time of change
• Software defines behavior
  – network routers, finance, switching networks, other infrastructure
• Today's software market:
  – is much bigger
  – is more competitive
  – has more users
• The way we use software continues to expand
• Software testing theory is very advanced
• Yet practice continues to lag
Industry & government organizations are going through a revolution in what testing means to the success of software products

Testing in the 21st Century
• The web offers a new deployment platform
  – Very competitive and available to more users
• Enterprise applications mean bigger programs and more users
• Embedded software is ubiquitous … check your pockets!
• Paradoxically, free software increases our expectations!
• Security is now all about software faults
  – Secure software is reliable software
• Agile processes put enormous pressure on testers and programmers to test better
There is no theory, little research, and little practical knowledge about how to test integrated software systems

How to Improve Testing?
• We need more and better software tools
  – A stunning increase in available tools in the last 10 years!
• We need to adopt practices and techniques that lead to more efficient and effective testing
  – More education
  – Different management and organizational strategies
• Testing / QA teams need to specialize more
  – This same trend happened for development in the 1990s
• Testing / QA teams need more technical expertise
  – Developer expertise has been increasing dramatically

OUTLINE
1. Why Do We Test?
2. Poor Testing Has Real Costs
3. Model-Driven Test Design
4. Industrial Software Problems
5. Testing Solutions that Work
   1. Web Applications are Different
   2. Input Validation Testing
   3. Bypass Testing
6. Coming Changes in How We Test Software

Test Design in Context
• Test design is the process of designing input values that will effectively test software
• Test design is one of several activities for testing software
  – The most mathematical
  – The most technically challenging
• This process is based on my textbook with Ammann, Introduction to Software Testing
  – http://www.cs.gmu.edu/~offutt/softwaretest/

Types of Test Activities
• Testing can be broken up into four general types of activities
  1. Test Design
     1a. Criteria-based
     1b. Human-based
  2. Automation
  3. Execution
  4. Evaluation
• Each type of activity requires different skills, background knowledge, education, and training
• No reasonable software development organization uses the same people for requirements, design, implementation, integration, and configuration control
Why do test organizations still use the same people for all four test activities?? This clearly wastes resources

Summary of Test Activities

1a. Design (criteria-based)   Design test values to satisfy engineering goals
                              Requires knowledge of discrete math, programming, and testing
1b. Design (human-based)      Design test values from domain knowledge and intuition
                              Requires knowledge of the domain, UI, and testing
2.  Automation                Embed test values into executable scripts
                              Requires knowledge of scripting
3.  Execution                 Run tests on the software and record the results
                              Requires very little knowledge
4.  Evaluation                Evaluate results of testing, report to developers
                              Requires domain knowledge

• These four general test activities are quite different
• It is a poor use of resources to use people inappropriately
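To make the split between design and automation concrete, here is a minimal sketch (my own toy example, not from the slides) of how a criteria-designed value and a human-designed value each end up as an automated JUnit script; the Triangle class and its classify method are hypothetical.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical class under test, included so the sketch is self-contained.
class Triangle {
    static String classify(int a, int b, int c) {
        if (a <= 0 || b <= 0 || c <= 0) return "invalid";
        if (a == b && b == c) return "equilateral";
        if (a == b || b == c || a == c) return "isosceles";
        return "scalene";
    }
}

public class TriangleTest {

    // Activity 1a (criteria-based design): branch coverage of classify()
    // demands a test where all three sides are equal.
    // Activity 2 (automation): the designed value is embedded in a JUnit script.
    @Test
    public void equilateralBranch() {
        assertEquals("equilateral", Triangle.classify(3, 3, 3));
    }

    // Activity 1b (human-based design): domain knowledge says a zero-length
    // side is a likely trouble spot, so a tester adds it by intuition.
    @Test
    public void zeroSideIsRejected() {
        assertEquals("invalid", Triangle.classify(0, 4, 5));
    }
}

Activities 3 and 4 are then routine: a test runner executes the scripts and someone with domain knowledge interprets any failures.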

Model-Driven Test Design – Steps
[Diagram: the test design flow crosses two abstraction levels.
Design abstraction level: software artifacts are analyzed mathematically into a model / structure; a coverage criterion generates test requirements, which are refined into test specifications.
Implementation abstraction level: domain analysis turns the test specifications into input values (plus prefix, postfix, and expected values), which are automated into test scripts, executed to produce test results, and evaluated as pass / fail, with feedback to the earlier steps.]
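A minimal sketch of this flow on a toy method (my own example, not from the slides), using edge coverage as the criterion; the method, the graph numbering, and the test value are illustrative assumptions.

public class IndexOfMin {

    // Software artifact (implementation level).
    public static int indexOfMin(int[] a) {
        int best = 0;
        for (int i = 1; i < a.length; i++) {   // node 1: loop test
            if (a[i] < a[best]) {               // node 2: comparison
                best = i;                       // node 3: new minimum
            }
        }
        return best;                            // node 4: exit
    }

    /* Model / structure (design level): the control flow graph of indexOfMin
     *   node 1 = loop test, node 2 = comparison, node 3 = new minimum, node 4 = return
     *   edges: 1->2, 1->4, 2->3, 2->1, 3->1
     *
     * Criterion (edge coverage) -> test requirements: tour every edge.
     * Refined test spec: an array whose minimum sits in the middle exercises
     * every edge in a single execution.
     *
     * Input value and expected result (implementation level):
     */
    public static void main(String[] args) {
        int[] testInput = {3, 1, 2};   // designed input value
        int expected = 1;              // expected result: index of the 1
        int actual = indexOfMin(testInput);
        // Execution + evaluation: report pass / fail.
        System.out.println(actual == expected ? "pass" : "fail");
    }
}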

MDTD – Activities
[Diagram: the same flow labeled by activity. Test design works at the design abstraction level ("here be math"): software artifact -> model / structure -> test requirements -> refined requirements / test specs. Test automation turns input values into test cases and test scripts; test execution produces test results; test evaluation reports pass / fail.]
Raising our abstraction level makes test design MUCH easier

Using MDTD in Practice
• This approach lets one test designer do the math
• Then traditional testers and programmers can do their parts
  – Find values
  – Automate the tests
  – Run the tests
  – Evaluate the tests
Testers ain't mathematicians!

OUTLINE
1. Why Do We Test?
2. Poor Testing Has Real Costs
3. Model-Driven Test Design
4. Industrial Software Problems
5. Testing Solutions that Work
   1. Web Applications are Different
   2. Input Validation Testing
   3. Bypass Testing
6. Coming Changes in How We Test Software

GMU's Software Engineering Lab
• At GMU's Software Engineering Lab, we are committed to finding practical solutions to real problems
  – Useful research in making better software
• We are inventing new testing solutions
  – Web applications and web services
  – Critical software
• We are investigating the current state of the practice and comparing it with available techniques

Mismatch in Needs and Goals
• Industry & contractors want simple and easy testing
  – Testers with no background in computing or math
• Universities are graduating scientists
  – Industry needs engineers
• Testing needs to be done more rigorously
• Agile processes put lots of demands on testing
  – Programmers have to do unit testing – with no training, education, or tools!
  – Tests are key components of functional requirements – but who builds those tests?
Bottom line—lots of crappy software

How to Improve Testing?
• Testers need more and better software tools
• Testers need to adopt practices and techniques that lead to more efficient and effective testing
  – More education
  – Different management and organizational strategies
• Testing / QA teams need more technical expertise
  – Developer expertise has been increasing dramatically
• Testing / QA teams need to specialize more
  – This same trend happened for development in the 1990s

Quality of Industry Tools
• A recent evaluation of three industrial automatic unit test data generators:
  – JCrasher, TestGen, JUB
  – Each generates tests for Java classes
  – Evaluated on the basis of mutants killed
• Compared with two test criteria
  – Random test generation (special-purpose tool)
  – Edge coverage criterion (satisfied by hand)
• Eight Java classes
  – 61 methods, 534 LOC, 1070 faults (seeded by mutation)
— Shuang Wang and Jeff Offutt, Comparison of Unit-Level Automated Test Generation Tools, Mutation 2009
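For readers new to mutation analysis: a mutant is a copy of the program with one small syntactic change, and a test kills the mutant when the mutant's behavior differs from the original's. A minimal, made-up sketch (not taken from the study):

public class MutationSketch {

    // Original method under test.
    static boolean isPositive(int x) {
        return x > 0;
    }

    // Mutant: the relational operator > is changed to >=.
    static boolean isPositiveMutant(int x) {
        return x >= 0;
    }

    public static void main(String[] args) {
        // x = 5 does NOT kill the mutant: both versions return true.
        System.out.println(isPositive(5) == isPositiveMutant(5));   // true  -> mutant lives

        // x = 0 kills the mutant: the original returns false, the mutant returns true.
        System.out.println(isPositive(0) == isPositiveMutant(0));   // false -> mutant killed
    }
}

A test set's mutation score is simply the fraction of such mutants it kills, which is why it is a convenient yardstick for comparing test generators.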

Unit-Level ATDG Results
[Bar chart: percentage of mutants killed by tests from JCrasher, TestGen, JUB, hand-generated edge coverage (EC), and random test generation.]
These tools essentially generate random values!

Quality of Criteria-Based Tests
• In another study, we compared four test criteria
  – Edge-pair, All-uses, Prime path, Mutation
  – Generated tests for Java classes
  – Evaluated on the basis of finding hand-seeded faults
• Twenty-nine Java packages
  – 51 classes, 174 methods, 2909 LOC
• Eighty-eight faults
— Nan Li, Upsorn Praphamontripong and Jeff Offutt, An Experimental Comparison of Four Unit Test Criteria: Mutation, Edge-Pair, All-uses and Prime Path Coverage, Mutation 2009

Criteria-Based Test Results
[Bar chart: faults found and number of tests (normalized) for Edge-Pair, All-Uses, Prime Path, and Mutation coverage.]
Researchers have invented very powerful techniques

Industry and Research Tool Gap
• We cannot compare these two studies directly
• However, we can compare the conclusions:
  – Industrial test data generators are ineffective
  – Edge coverage is much better than the tests the tools generated
  – Edge coverage is by far the weakest criterion
• The biggest challenge was hand generation of tests
• Software companies need to test better
• And luckily, we have lots of room for improvement!

Four Roadblocks to Adoption
1. Lack of test education
   – Bill Gates says half of MS engineers are testers, and programmers spend half their time testing
   – Number of UG CS programs in the US that require testing: 0
   – Number of MS CS programs in the US that require testing: 0
   – Number of UG testing classes in the US: ~10
2. Necessity to change process
   – Adoption of many test techniques and tools requires changes in the development process
   – This is very expensive for most software companies
3. Usability of tools
   – Many testing tools require the user to know the underlying theory to use them
   – Do we need to understand an internal combustion engine to drive? Do we need to understand parsing and code generation to use a compiler?
4. Weak and ineffective tools
   – Most test tools don't do much – but most users do not realize they could be better
   – Few tools solve the key technical problem – generating test values automatically

OUTLINE
1. Why Do We Test?
2. Poor Testing Has Real Costs
3. Model-Driven Test Design
4. Industrial Software Problems
5. Testing Solutions that Work
   1. Web Applications are Different
   2. Input Validation Testing
   3. Bypass Testing
6. Coming Changes in How We Test Software

General Problems with Web Apps
• The web offers a new deployment platform
  – Very competitive and available to more users
  – Web apps are distributed
  – Web apps must be highly reliable
• Web applications are heterogeneous, dynamic, and must satisfy very high quality attributes
• Use of the web is hindered by low-quality web sites and applications
• Web applications need to be built better and tested more
  – Most software faults are introduced during maintenance
• Testing is difficult because the web uses new and novel technologies

Technical Web App Issues
1. Software components are extremely loosely coupled
   – HTTP is stateless
   – Coupled through the Internet – separated by space
   – Coupled to diverse hardware and software applications
2. Potential control flows change dynamically
   – User control – back buttons, URL rewriting, refresh, caching
   – Server control – redirect, forward, include, event listeners
3. State management is completely different
   – HTTP is stateless and the software is distributed
   – Traditional object-oriented scopes are not available
   – Page, request, session, and application scopes take their place (see the sketch below)
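A minimal sketch (assuming the standard Java Servlet API) of how state that would normally live in object-oriented scopes is instead parked in request, session, or application scope; the servlet and attribute names are made up for illustration.

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet.  HTTP itself is stateless, so any value that must
// survive beyond one method call has to be stored in one of these scopes.
public class ScopeDemoServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        // Request scope: lives only for this one HTTP request.
        request.setAttribute("requestTime", System.currentTimeMillis());

        // Session scope: lives across requests from the same user
        // (tracked via a cookie or URL rewriting).
        request.getSession().setAttribute("userName", request.getParameter("user"));

        // Application scope: shared by every user of the deployed web app.
        getServletContext().setAttribute("lastVisitor", request.getRemoteAddr());

        response.getWriter().println("scopes set");
    }
}

Each scope is a distinct place where state can be lost, overwritten, or leak between users, which is exactly why testing these connections is hard.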

Input Space Grammars
Input space: the set of allowable inputs to the software
• The input space can be described in many ways
  – User manuals
  – Unix man pages
  – Method signatures / collections of method preconditions
  – A language
• Most input spaces can be described as grammars
• Grammars are usually not provided, but creating them is a valuable service by the tester
  – Errors will often be found simply by creating the grammar (see the sketch below)
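A small, made-up illustration of writing down an input grammar and checking inputs against it; the "deposit" transaction format and field names are assumptions for illustration, not from the slides.

import java.util.regex.Pattern;

// Tester-written grammar for a hypothetical "deposit" transaction line:
//
//   deposit ::= "DEP" SP account SP amount
//   account ::= digit digit digit digit digit digit      (exactly 6 digits)
//   amount  ::= digit+ "." digit digit
//   digit   ::= "0" | "1" | ... | "9"
//   SP      ::= " "
public class DepositGrammar {
    private static final Pattern DEPOSIT =
            Pattern.compile("DEP \\d{6} \\d+\\.\\d{2}");

    static boolean isValid(String line) {
        return DEPOSIT.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("DEP 123456 100.00"));  // true  – in the grammar
        System.out.println(isValid("DEP 12345 100.0"));    // false – short account, one decimal digit
    }
}

Even on an example this small, writing the grammar forces questions (Are leading zeros allowed? Is there a maximum amount?) that often expose errors before a single test is run.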

Using Input Grammars
• Software should reject or handle invalid data
• Programs often do this incorrectly
• Some programs (rashly) assume all input data is correct
• Even if it works today …
  – What about after the program goes through some maintenance changes?
  – What about if the component is reused in a new program?
• Consequences can be severe …
  – The database can be corrupted
  – Users are not satisfied
  – Most security vulnerabilities are due to unhandled exceptions … from invalid data

Validating Web App Inputs
Input validation: deciding if input values can be processed by the software
• Before starting to process inputs, wisely written programs check that the inputs are valid
• How should a program recognize invalid inputs?
• What should a program do with invalid inputs?
• If the input space is described as a grammar, a parser can check for validity automatically
  – This is very rare
  – It is easy to write input checkers – but also easy to make mistakes

Representing Input Domains
[Diagram: three input domains – desired inputs (goal domain), described inputs (specified domain), and accepted inputs (implemented domain).]

Representing Input Domains
• Goal domains are often irregular
• Goal domain for credit cards†
  – First digit is the Major Industry Identifier (bank, government, …)
  – First 6 digits and the length specify the issuer
  – Final digit is a "check digit"
  – Other digits identify a specific account
• Common specified domain
  – First digit is in { 3, 4, 5, 6 } (travel and banking)
  – Length is between 13 and 16
• Common implemented domain
  – All digits are numeric
† More details: http://www.merriampark.com/anatomycc.htm
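The check digit is what makes the goal domain irregular. A minimal sketch of validating it with the Luhn algorithm, the usual credit-card check-digit scheme (using Luhn here is my assumption; the slide only says a check digit exists):

// Minimal Luhn check-digit validation.  Assumes the input is already all digits
// (the common implemented domain); the check separates "looks like a card number"
// from "is in the goal domain".
public class LuhnCheck {

    static boolean luhnValid(String number) {
        int sum = 0;
        boolean doubleIt = false;                 // every second digit from the right
        for (int i = number.length() - 1; i >= 0; i--) {
            int d = number.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(luhnValid("4111111111111111"));  // true:  a well-known test number
        System.out.println(luhnValid("4111111111111112"));  // false: all digits numeric, but not in the goal domain
    }
}

The second value passes the common implemented check ("all digits are numeric") and even the specified check (starts with 4, length 16), yet is outside the goal domain – exactly the gap the next slide highlights.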

Representing Input Domains
[Diagram: the goal domain, specified domain, and implemented domain overlap only partially; the region where they disagree is a rich source of software errors.]

Web Application Input Validation
[Diagram: the client checks data before sending it to the server, which holds sensitive data. Bad data can corrupt the database, crash the server, or cause security violations. Malicious data can "bypass" the client-side data checking and reach the server directly.]
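A minimal sketch of what "bypassing" means in practice: instead of submitting the HTML form (whose client-side JavaScript would reject the value), a test sends the raw HTTP request directly to the server. The URL and parameter name are made up for illustration.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Bypass-testing probe: POST a value the client-side check would have rejected.
public class BypassProbe {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/app/updateProfile");   // hypothetical target
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

        // The form's JavaScript limits "age" to digits; we send something else entirely.
        String body = "age=<script>alert(1)</script>";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // If the server trusted the client-side check, the response (or the
        // back-end state) will reveal a failure.
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}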

Bypass Testing Results
[Figure: example bypass-testing results]
— Vasileios Papadimitriou, Masters thesis, Automating Bypass Testing for Web Applications, GMU 2006

Theory to Practice—Bypass Testing
• Inventions from scientists are slow to move into practice
• We wanted to investigate whether the obstacles are:
  1. Technical difficulties of applying the ideas in practice
  2. Social barriers
  3. Business constraints
• We tried to transition bypass testing to the research arm of Avaya, Avaya Research Labs
— Offutt, Wang and Ordille, An Industrial Case Study of Bypass Testing on Web Applications, ICST 2008

Avaya Bypass Testing Results
• Six screens were tested
• Tests are invalid inputs – exceptions are expected
• Effects on the back end were not checked
  – Failure analysis was based on response screens

Web Screen             Tests   Failing Tests   Unique Failures
Points of Contact        42         23              12
Time Profile             53         23              23
Notification Profile     34         12               6
Notification Filter      26         16               7
Change PIN                5          1               1
Create Account           24         17              14
TOTAL                   184         92              63

A 33% "efficiency" rate is spectacular!

OUTLINE
1. Why Do We Test?
2. Poor Testing Has Real Costs
3. Model-Driven Test Design
4. Industrial Software Problems
5. Testing Solutions that Work
   1. Web Applications are Different
   2. Input Validation Testing
   3. Bypass Testing
6. Coming Changes in How We Test Software

Needs From Researchers
1. Isolate: Invent processes and techniques that isolate the theory from most test practitioners
2. Disguise: Discover engineering techniques, standards, and frameworks that disguise the theory
3. Embed: Put theoretical ideas directly into tools
4. Experiment: Demonstrate the economic value of criteria-based testing and ATDG
   – Which criteria should be used, and when?
   – When does the extra effort pay off?
5. Integrate high-end testing with development

Needs From Educators
1. Disguise theory from engineers in classes
2. Omit theory when it is not needed
3. Restructure the curriculum to teach more than test design and theory
   – Test automation
   – Test evaluation
   – Human-based testing
   – Test-driven development

Changes in Practice
1. Reorganize test and QA teams to make effective use of individual abilities
   – One math-head can support many testers
2. Retrain test and QA teams
   – Use a process like MDTD
   – Learn more of the concepts in testing
3. Encourage researchers to embed and isolate
   – We are very responsive to research grants
4. Get involved in curricular design efforts through industrial advisory boards

Future of Software Testing
1. Increased specialization in testing teams will lead to more efficient and effective testing
2. Testing and QA teams will have more technical expertise
3. Developers will have more knowledge about testing and more motivation to test better
4. Agile processes put testing first—putting pressure on both testers and developers to test better
5. Testing and security are starting to merge
6. We will develop new ways to test connections within software-based systems

Contact
Jeff Offutt
offutt@gmu.edu
http://cs.gmu.edu/~offutt/