Software quality assurance: a broader perspective
• Testing is not the only aspect of software quality assurance
• Some terminology
  • Validation -- assuring that the software corresponds to specifications
    • Are we building the right product?
  • Verification -- proving that the software behaves correctly
    • Are we building the product right?
    • Usually a form of validation
  • Inspections and testing are examples of validation
  • Program verification is an example of validation
• The general goal is not to let faults become defects
  • If a fault appears in released software, it is called a defect

What is software quality?
• The answer depends on who you are
  • A user
    • Functionality - functional requirements must be implemented correctly
    • Portability - the system can be used with different hardware and OSes
    • Performance - performance requirements must be satisfied
    • Efficiency - the system uses the hardware and system resources efficiently
    • Reliability - low defect count
    • Robustness - high resistance to user mistakes
  • A developer
    • Understandability - design and implementation are easy to understand and well-documented
    • Testability - easy to test
    • Modifiability - easy to modify

Cost of software faults and defects
• Cost to the user
  • Unavailable functionality
  • Unsatisfactory performance
  • Depending on the nature of the system, financial loss or lost earnings
• Cost to the developer/maintainer
  • Processing bug reports
  • Bug fixes or re-engineering
  • Reinstall support
  • Potential lost future business

How easy is it to remove a fault or defect?
• Depends on where the fault occurs
  • The earlier in the lifecycle a fault is introduced, the more difficult it is to fix
• Depends on the stage at which the fault is noticed
  • The later in the lifecycle the fault is noticed, the more difficult it is to find and fix
    • Fixing a fault at the design stage is 3-6 times more expensive than fixing a fault at the requirements stage
    • Fixing a defect is 40-1000 times more expensive than fixing a fault at the requirements stage
  • Fixing defects costs more than fixing faults

What does software quality assurance involve?
• Testing
• Software inspections and reviews
  • Requirements specifications
  • Design specifications
  • Code
  • Test suites
  • Documentation
• Formal verification
• Light-weight automated validation and verification techniques
  • Next lecture
• Process improvements based on the project experience

Correctness
• a product is functionally correct if it satisfies all the functional requirement specifications
  • correctness is a mathematical property
  • requires a specification of intent
• a product is behaviorally correct if it satisfies all the specified behavioral requirements
  • difficult to prove poorly-quantified qualities such as user-friendliness

Reliability
• measures the dependability of a product
  • the probability that a product will perform as expected
  • sometimes stated as a property of time, e.g., mean time to failure
• Reliability vs. Correctness
  • reliability is relative, while functional correctness is absolute
  • given a "correct" specification, a correct product is reliable, but not necessarily vice versa
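To make "reliability stated as a property of time" concrete, here is a minimal sketch that estimates mean time to failure from observed failure-free intervals. The second function additionally assumes a constant failure rate (an exponential model, which the slides do not specify); the function names and the sample data are hypothetical.

```python
import math

def mean_time_to_failure(intervals_hours):
    """Average of the observed operating times between failures."""
    if not intervals_hours:
        raise ValueError("need at least one observed interval")
    return sum(intervals_hours) / len(intervals_hours)

def survival_probability(mttf_hours, mission_hours):
    """P(no failure during the mission), ASSUMING a constant failure rate 1/MTTF."""
    return math.exp(-mission_hours / mttf_hours)

observed = [120.0, 80.0, 100.0]  # hypothetical hours between failures
mttf = mean_time_to_failure(observed)
print(f"MTTF = {mttf:.1f} h")
print(f"P(survive 10 h) = {survival_probability(mttf, 10.0):.3f}")
```

The result is a probability rather than a yes/no verdict, which is the sense in which reliability is relative while functional correctness is absolute.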

Robustness
• behaves "reasonably" even in circumstances that were not expected
  • making a system robust more than doubles development costs
  • a system that is correct may not be robust, and vice versa

Why do software inspections?
• Reading code (or an analysis document, or a design document) is so boring. Isn’t testing enough?
  • No.
    • Although testing plans have to be created throughout software development, testing cannot start until implementation
    • A test that exhibits a failure has to be investigated: a labor-intensive debugging process
• Faults can be identified and eliminated early in development through informal code inspections
  • Several studies reported that code reviews can be cheaper and more effective than testing

Reviews, Inspections, and Walkthroughs
• Formal reviews
  • author or one reviewer leads a presentation of the product
  • review is driven by presentation, issues raised
• Walkthroughs
  • usually informal reviews of source code
  • step-by-step, line-by-line review
• Inspections
  • list of criteria drives review
  • properties not limited to error correction

Review methods
• Fagan inspections
  • formal, multi-stage process
  • significant background & preparation
  • led by moderator
• Active design reviews
  • also called "phased inspections"
  • several brief reviews rather than one large review
• Cleanroom
  • formal review process
  • plus statistically based testing

Fagan Inspections
• 3-5 participants
• 5-stage process with significant preparation

Fagan Inspections
• participants (3 to 5 people)
  • MODERATOR - responsible for organizing, scheduling, distributing materials, and leading the session
  • AUTHOR - responsible for explaining the product
  • SCRIBE - responsible for recording bugs found
  • PLANNER or DESIGNER - author from a previous step in the software lifecycle
  • USER REPRESENTATIVE - to relate the product to what the user wants
  • PEERS OF THE AUTHOR - perhaps more experienced, perhaps less
  • APPRENTICE - an observer who is there mostly to learn

Fagan Inspection Process
• Planning
  • done by author(s)
    • Prepare documents and an overview
      • explain content to the inspectors
  • done by moderator
    • Gather materials and ensure that they meet entry criteria
    • Arrange for participants
      • assign them roles
      • ensure their training
    • Arrange meeting

Fagan Inspection Process (cont.)
• Preparation
  • Participants study material
• Inspection
  • Find/report faults (do not discuss alternative solutions)
• Rework
  • Author fixes all faults
• Follow-Up
  • Team certifies faults fixed and no new faults introduced

Fagan Inspection: general guidelines
• Distribute material ahead of time
• Use a written checklist of what should be considered
  • e.g., functional testing guidelines
• Criticize the product, not the author

People Resource versus Schedule
[Figure: people resources plotted against schedule (planning, requirements, design, coding, testing, ship), comparing a project with inspections to a project without inspections]

Some Experimental Results
• Using software inspections has repeatedly been shown to be cost effective
• Increases front-end costs
  • ~15% increase to development cost
• Decreases overall cost
• Productivity numbers for the Fagan method
  • Number of source code statements that can be covered per hour of overview: ~500
  • Number of source code statements participants can read through per hour of preparation: ~125
  • Number of source code statements that can be inspected per hour of meeting: ~90-125
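The productivity figures above can be turned into a rough effort estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes every participant attends the overview, prepares, and attends the meeting, and it picks 100 statements per hour for the meeting as one value inside the quoted 90-125 range. The function name and the example sizes are hypothetical.

```python
def fagan_effort_hours(statements, participants,
                       overview_rate=500, prep_rate=125, meeting_rate=100):
    """Estimate person-hours per inspection stage.

    Rates are source statements per hour. ASSUMES every participant
    takes part in the overview, the preparation, and the meeting.
    """
    stages = {
        "overview": statements / overview_rate * participants,
        "preparation": statements / prep_rate * participants,
        "meeting": statements / meeting_rate * participants,
    }
    stages["total"] = sum(stages.values())
    return stages

estimate = fagan_effort_hours(statements=1000, participants=4)
for stage, hours in estimate.items():
    print(f"{stage:12s} {hours:6.1f} person-hours")
```

For 1000 statements and 4 participants this gives 8, 32, and 40 person-hours for the three stages, 80 in total, which gives a feel for the front-end cost that inspections add.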

An IBM study
• doubled number of lines of code produced per person
  • some of this due to inspection process
• reduced faults by 2/3
• found 60-90% of the faults
• found faults close to when they are introduced
  • helps reduce cost

Why are inspections effective?
• knowing the product will be scrutinized causes developers to produce a better product
• having others scrutinize a product increases the probability that faults will be found
• walkthroughs and reviews are not as formal as inspections, but appear to also be effective
  • hard to get empirical results

What are the deficiencies?
• focus on error detection
  • what about other "ilities" -- usability, portability, etc.?
• not applied consistently/rigorously
  • inspection shows statistical improvement
  • but cannot ensure quality
• human intensive and often makes ineffective use of human resources
  • e.g., skilled software engineer reviewing coding standards, spelling, etc.
  • No automated support

Experimental Evaluation
• There have been many studies that have demonstrated the effectiveness of inspections
• Recent studies try to determine what aspects of inspections are effective
  • Provide insight into
    • Ways to improve the process
    • Ways to reduce the cost
  • E.g., "Understanding the Sources of Variation in Software Inspections", Adam A. Porter, Harvey Siy, Audris Mockus, Lawrence G. Votta

The Lucent study
• Lucent compiler project for the 5ESS telephone switching system
  • 55K new lines; 10K reused lines
• 6 developers; 5 other professionals
  • At least 5 yrs. experience
  • Inspection training
• Modify process
  • Measure effect on number of defects
  • Measure effect on inspection interval
    • E.g., start of inspection to end
  • People effort

Hypotheses
• Large teams ==>
  • No increase in defects found
  • Increase interval
• Multiple-session inspections ==>
  • Increase in defects found
  • Increase interval
• Correcting defects between sessions ==>
  • Increase in defects found
  • Increase interval
    • Terminated this process early since it was too costly

Results from the experiment
• Team size did not impact effectiveness
  • Can use a small team w/o jeopardizing the effectiveness
• Number of sessions did not impact effectiveness
  • Can use one session
• Repairs between sessions did not improve defect detection but did increase interval
• Use single-session inspections with small teams

Static Analysis
• Attempt to determine information about a system without executing it
  • test data independent
  • With testing, we only know how a program works on the executed test data
  • With static analysis, we know certain facts about the system (for all test data!)
• Generally, refers to automated analysis methods

Major static analysis approaches
• Dependence analysis
• Data flow analysis
• Symbolic execution
• Formal verification
• Static concurrency analysis
  • Reachability analysis
  • Flow equations
  • Data flow analysis

Some types of information that can be computed automatically
• Unreachable code
• Unused variable definitions
• Uninitialized variables
• Constant variable values
• Pointer aliasing
  • Side-effects!
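As a small illustration of one item on this list, the sketch below finds unused variable definitions in a piece of Python source without running it, using the standard `ast` module. It is deliberately crude and flow-insensitive: a name counts as unused if it is assigned somewhere and read nowhere in the whole module. Real data flow analysis is considerably more precise; the function name and the sample program are hypothetical.

```python
import ast

def unused_assignments(source):
    """Return names that are assigned somewhere but never read.

    Flow-insensitive and scope-insensitive: a simplification for illustration.
    """
    tree = ast.parse(source)
    stored, loaded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return sorted(stored - loaded)

example = """
x = 1
y = 2
z = x + 1
print(z)
"""
print(unused_assignments(example))  # ['y']
```

The sample program is only parsed, never executed, so the answer holds for every possible run of it.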

Why do many software engineers claim that C sucks?
• Because it leaves the programmers with too much freedom to make mistakes
  • No array bound checks
  • Pointer arithmetic allowed
  • No type checks
  • …
• Later-generation programming languages restrict many “dangerous” things at the level of compilation
• What do I do if I have to use C?
  • Use LINT
    • Static analyzer that does “sanity checks”

How can we extend quality assurance into maintenance?
• Capture and replay
  • E.g., the JRapture tool
    • A. Podgurski and J. Steven, Case Western University
    • In-field executions of Java programs are captured and saved, so that error traces are preserved
  • Has to be very efficient so as not to incur significant overheads!
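The capture-and-replay idea can be sketched in a few lines. This is not JRapture and does not reflect its design or API; it is a hypothetical Python decorator that records each call's arguments and outcome, plus a helper that re-runs the recorded calls against a function and reports any differences. All names here are made up for illustration.

```python
import functools

def capture(log):
    """Decorator factory: append (name, args, kwargs, outcome) to `log` on every call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                log.append((func.__name__, args, kwargs, ("raised", type(exc).__name__)))
                raise
            log.append((func.__name__, args, kwargs, ("returned", result)))
            return result
        return wrapper
    return decorate

def replay(func, log):
    """Re-run the recorded calls on `func`; return the calls whose outcome differs."""
    mismatches = []
    for name, args, kwargs, expected in log:
        try:
            outcome = ("returned", func(*args, **kwargs))
        except Exception as exc:
            outcome = ("raised", type(exc).__name__)
        if outcome != expected:
            mismatches.append((name, args, kwargs, expected, outcome))
    return mismatches

trace = []

@capture(trace)
def divide(a, b):
    return a / b

divide(6, 3)
try:
    divide(1, 0)          # the failing in-field execution is preserved in the trace
except ZeroDivisionError:
    pass

# Replay against the unwrapped function so that replaying does not extend the trace.
print(replay(divide.__wrapped__, trace))  # []
```

Even this toy version appends to a log on every call, which hints at why overhead is the main engineering concern for in-field capture.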

SQA group
• SQA stands for Software Quality Assurance
• May include developers or be independent from project development activities
• An SQA group is responsible for quality assurance activities:
  • Planning - the target level of quality, evaluations, and reviews to be done, procedures for error reporting and tracking, documentation to be produced, and type of feedback
  • Oversight - make sure that the process and the plan include adequate quality assurance activities and that these activities are followed during the project
  • Analysis - review the ongoing quality assurance activities, possibly take measurements
  • Reporting and record keeping - record quality concerns and deviations from the plan, report to upper management

Software Development Today
• Why do we have this structure?
[Figure: three roles: Decision Maker, Programmer, Tester]

Typical Scenario (1)
• Programmer: “I’m done.”
• Tester: “It doesn’t compile, @#$% it!”
• Decision Maker: “OK, calm down. We’ll slip the schedule. Try again.”

Typical Scenario (2)
• Programmer: “I’m done.”
• Tester: “It doesn’t install!”
• Decision Maker: “Now remember, we’re all in this together. Try again.”

Typical Scenario (3)
• Programmer: “I’m done.”
• Tester: “It does the wrong thing in half the tests.”
• Programmer: “No, half of your tests are wrong!”
• Decision Maker: “Let’s have a meeting to straighten out the spec.”

Typical Scenario (4)
• Programmer: “I’m done.”
• Tester: “It still fails some tests we agreed on.”
• Decision Maker: “Try again, but please hurry up!”

Typical Scenario (5)
• Programmer: “I’m done.”
• Tester: “Yes, it’s done!”
• Decision Maker: “Oops, the world has changed. Here’s the new spec.”

Cleanroom: S/W development process
• Mills, Harlan D., Michael Dyer, and Richard C. Linger
• Originally proposed by H. Mills in the early 80’s
• H. Mills had previously proposed the chief programmer team concept

Major contributions
• Incremental development plan
  • Instead of a pure waterfall model
  • Incrementally develop subsystems
• Use formal models during specification and design
  • Structured specifications
  • State machine models
• Developers use informal verification instead of testing
• Independent, statistically based testing
  • Based on usage scenarios derived from state machine models

Cleanroom

Black box
• stimulus history -> response
• The black box is a functional mapping of all possible stimulus histories to all possible responses.
• The black box mapping may be expressed in symbolic notation or in the natural language of the problem domain.
  • S = the set of all possible system stimuli
  • S* = the set of all possible stimulus histories
  • R = the set of all possible system responses
  • BB: S* -> R

State Box
• current stimulus, old state -> response, new state
• State is the encapsulation of stimulus history: one identifies the stimuli that need to be saved, and invents state variables to hold them
• With state in place, the black box is re-expressed as the transition
  • s(current), old state -> response, new state
  • S = the set of all possible system stimuli
  • T = the set of all state data
  • R = the set of all possible system responses
  • SB: S x T -> R x T

Clear Box
• The clear box is the full procedural design, specifying both data flow and control flow.
• New black boxes may be created to encapsulate lower-level functions.
• CB: S x T -> R x T, with implementation of state update and response production.
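The three box views can be illustrated on a toy system. The sketch below assumes a hypothetical accumulator whose stimuli are integers and whose response is the running total: the black box maps a whole stimulus history to a response (S* -> R), the state box maps a stimulus and an old state to a response and a new state (S x T -> R x T), and the body of the state box function plays the role of the clear box procedure. None of these names come from the Cleanroom literature.

```python
def black_box(history):
    """BB: S* -> R. Response to the latest stimulus, given the whole history."""
    return sum(history)

def state_box(stimulus, state):
    """SB: S x T -> R x T. The state T holds the running total.

    The two statements below are the clear box view: the procedure that
    implements the state update and the response production.
    """
    new_state = state + stimulus
    return new_state, new_state  # (response, new state)

def run_state_box(history, initial_state=0):
    """Feed a stimulus history through the state box one stimulus at a time."""
    state = initial_state
    responses = []
    for stimulus in history:
        response, state = state_box(stimulus, state)
        responses.append(response)
    return responses

history = [3, 4, 5]
black_box_responses = [black_box(history[: i + 1]) for i in range(len(history))]
assert black_box_responses == run_state_box(history)
print(black_box_responses)  # [3, 7, 12]
```

The assertion checks, for this one history, the property that refinement is meant to preserve: the state box must produce the same responses as the black box it was derived from.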

Verification
• ensure that a software design is a correct implementation of its specification
• team verification of correctness takes the place of individual unit testing
• benefits
  • intellectual control of the process
  • motivates developers to deliver error-free code
  • verification is a form of peer review
  • each person assumes responsibility for and derives a sense of ownership in the evolving product
• every person must agree that the work is correct before it is accepted -> successes are ultimately team successes, and failures are team failures

Verification
• team applies a set of correctness questions
• correctness is established
  • by group consensus if it is obvious
  • by formal proof techniques if it is not

Cleanroom

Usage specification

Markov Analysis
• number of statistically typical (i.e., likely) usage paths through the software
• long-run occupancy (i.e., percentage of total usage time) in each state
• expected number of events in a test case
• expected number of test cases before a given usage state occurs
• expected number of events between any two states
• expected minimum number of test cases required to cover all states in the model
• expected minimum number of test cases required to cover all transitions in the model
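One of these quantities, the expected number of events in a test case, can be computed directly from the usage model. The sketch below assumes a small hypothetical usage model (the states and probabilities are invented) and solves the standard absorbing-chain equations E[s] = 1 + sum over successors of p * E[next], with E[Termination] = 0, by simple repeated substitution rather than by a linear solver.

```python
# Hypothetical usage model: state -> list of (next state, probability).
USAGE_MODEL = {
    "Invocation": [("MainMenu", 1.0)],
    "MainMenu": [("Edit", 0.6), ("View", 0.3), ("Termination", 0.1)],
    "Edit": [("MainMenu", 0.8), ("Edit", 0.2)],
    "View": [("MainMenu", 1.0)],
}

def expected_events_to_termination(model, end="Termination",
                                   tol=1e-12, max_sweeps=100_000):
    """Expected number of transitions from each state until `end` is reached."""
    expected = {state: 0.0 for state in model}
    expected[end] = 0.0  # the terminal state needs no further events
    for _ in range(max_sweeps):
        largest_change = 0.0
        for state, transitions in model.items():
            updated = 1.0 + sum(p * expected[nxt] for nxt, p in transitions)
            largest_change = max(largest_change, abs(updated - expected[state]))
            expected[state] = updated
        if largest_change < tol:
            break
    return expected

lengths = expected_events_to_termination(USAGE_MODEL)
print(f"expected events per test case: {lengths['Invocation']:.2f}")
```

For this particular model the equations can also be solved by hand, giving 21.5 events from Invocation, which is the kind of figure used to project a test schedule.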

Markov Analysis
• prune the specification
• gauge complexity
• focus verification efforts
• identify the likelihood of given events
• project the test schedule
• ascertain the (affordable) upper bound on inferences about reliability

Statistical Testing
• Generation of Test Cases
  • usage model -> test cases; may be automatically generated
  • each test case is a random walk through the usage model, invocation -> termination
  • test cases constitute a "script" for use in testing. They may be applied by human testers, or used as input to an automated test tool.
• Stopping Criterion for Testing
  • goals (e.g., target level of estimated reliability) are achieved
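Generating test cases as random walks can be sketched as follows. The usage model here is hypothetical (invented states and probabilities), and the seed is fixed only to make the run repeatable; the generator simply walks from the invocation state to the termination state, choosing each transition according to its usage probability.

```python
import random

# Hypothetical usage model: state -> list of (next state, probability).
USAGE_MODEL = {
    "Invocation": [("MainMenu", 1.0)],
    "MainMenu": [("Edit", 0.6), ("View", 0.3), ("Termination", 0.1)],
    "Edit": [("MainMenu", 0.8), ("Edit", 0.2)],
    "View": [("MainMenu", 1.0)],
}

def generate_test_case(model, rng, start="Invocation", end="Termination"):
    """One random walk through the usage model, from `start` to `end`."""
    path = [start]
    while path[-1] != end:
        successors, weights = zip(*model[path[-1]])
        path.append(rng.choices(successors, weights=weights, k=1)[0])
    return path

rng = random.Random(42)  # fixed seed so the generated suite is repeatable
suite = [generate_test_case(USAGE_MODEL, rng) for _ in range(5)]
for case in suite:
    print(" -> ".join(case))
```

Each printed path is a script of stimuli that a human tester or an automated tool could apply; because paths are drawn with the usage probabilities, frequently used behaviour is tested most often.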

Statistical Hypothesis Testing

Experimental evaluation of cleanroom
• Selby, R. W., V. R. Basili, and F. T. Baker, "Cleanroom Software Development: An Empirical Evaluation," IEEE Transactions on Software Engineering, September 1987, pp. 1027-1037

Experimental design
• 15 three-person teams developed the same software system
  • 88-2300 LOCs
  • 10 teams used cleanroom
  • 5 teams used ad hoc techniques

Experimental results (in a nutshell)
• Cleanroom
  • 6 of the 10 cleanroom teams completed ~90% of the project
  • Met requirements better
  • Had more operational test cases
  • Met milestones (compared to only 2 of the traditional teams)
  • 86% missed traditional testing and debugging
  • 81% claimed they would use the technique again

Experience

Concluding remarks
• Some form of careful manual inspection seems to improve the quality of a s/w system and to improve productivity
  • Not clear if the benefits of cleanroom are from the inspection aspects of the process or other aspects or some combination
• When deadlines are tight, it is very hard to commit the resources for such a labor-intensive task
• Some automated support could help to reduce the manual effort involved
  • Would this be effective or counter-productive?