Towards Common Standards for Studies of Software Engineering Tools and Tool Features

Timothy C. Lethbridge
University of Ottawa

Premise: It is desirable to guide researchers studying SE tools
- Proposal: Create an inventory of practices to guide such studies
- Researchers could then create papers that would be
  - More comparable
  - More easily reviewable
  - More indexable

Types of Evaluation Commonly Found in Tools Papers
  - a) None - just a description
  - b) Includes rationale
  - c) Demonstration of adoption
  - d) Anecdotes and lessons learned
  - e) Informal studies - includes descriptive stats
  - f) Formal experiments involving students
  - g) Formal experiments involving practitioners
- Case studies papers:
  - Some combination of b-e
- Experimental papers:
  - f and g
  - but beware of overconfidence in results

Inventory of Measures
- The following are purely examples that might be found in such an inventory
  - M1. Time taken to perform a given task.
  - M2. Amount of a given task completed correctly in a fixed time.
    - The fixed time might depend on the task.
  - M3. Errors made in a given task.
  - M4. Subjective answers on a scale to specific questions:
    - (Questions to be listed in the inventory)
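As an illustration of how measures M1 to M4 might be recorded so that studies stay comparable, here is a minimal Python sketch. The `Trial` record layout, its field names, and the `summarize` helper are assumptions made for this example and are not part of the proposed inventory.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Trial:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    seconds_taken: float      # M1: time taken to perform the task
    fraction_correct: float   # M2: share of the task done correctly in the fixed time (0.0 to 1.0)
    errors: int               # M3: errors made in the task
    # M4: answers on a scale, keyed by a question id from the inventory
    ratings: dict[str, int] = field(default_factory=dict)


def summarize(trials: list[Trial]) -> dict[str, float]:
    """Average the quantitative measures M1 to M3 over a list of trials."""
    return {
        "M1_mean_seconds": mean(t.seconds_taken for t in trials),
        "M2_mean_fraction_correct": mean(t.fraction_correct for t in trials),
        "M3_mean_errors": mean(t.errors for t in trials),
    }
```

Reporting every study with the same keys is one way the resulting papers could become easier to compare and index.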

Inventory of study types
- ST1. Usability evaluation of a specific feature or tool implementation.
  - Helps ensure that results from other study types are not confounded purely by poor usability.
- Provides evidence for these research questions:
  - Q1a To what extent is the feature or tool usable?
    - Measures: M1, M2 and M3 (compared against a threshold).
  - Q1b What usability defects are present and which ones should be repaired? (qualitative)
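The comparison of M1, M2 and M3 against a threshold for Q1a could be sketched as follows. The function name, its parameters, and the threshold values in the usage note are hypothetical; the slides do not say what the thresholds should be.

```python
def usability_check(m1_seconds, m2_fraction_correct, m3_errors, *,
                    max_seconds, min_fraction_correct, max_errors):
    """Report, per measure, whether an observed value meets its threshold.

    All thresholds are supplied by the researcher; none come from the slides.
    """
    return {
        "M1": m1_seconds <= max_seconds,                       # fast enough
        "M2": m2_fraction_correct >= min_fraction_correct,     # enough completed correctly
        "M3": m3_errors <= max_errors,                         # few enough errors
    }
```

For example, `usability_check(90, 0.8, 1, max_seconds=120, min_fraction_correct=0.9, max_errors=2)` flags M2 as below threshold while M1 and M3 pass, which would suggest looking for usability defects (Q1b) before running other study types.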

Study types - continued
- ST2. Comparison of a small number of different feature implementations, each providing roughly the same functionality.
- Provides evidence for these research questions:
  - Q2a What is the best user interface for a certain feature?
    - Measures: M1, M2, M3, M4 (measured separately for each implementation)
  - Q2b What comments do users have about each implementation? (qualitative)

Study types - continued
- ST3. Comparison of two alternative feature sets that achieve roughly the same goal, but in different ways.
- Provides evidence for these research questions:
  - Q3 What is the 'best' functionality for a certain task?
    - Measures: M1, M2, M3, M4
      - Measured separately for each feature set

Study types - continued
- ST4. Comparison of presence and absence of a feature (or of a small feature set) in a tool.
- Provides evidence for these research questions:
  - Q4a Is the feature worth including in a final tool set?
    - Measures: M1, M2, M3 (measured separately for a tool with presence or absence of the features)
  - Q4b What benefits are provided by the feature? (qualitative)
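Measuring a quantity separately with and without a feature leads naturally to a two-group comparison. The sketch below computes the difference in means and Welch's t statistic for one measure such as M1. Welch's statistic is swapped in here as a common choice and is not prescribed by the slides; the function name and the sample values in the usage note are hypothetical. It yields no p-value, and small samples call for the caution about overconfidence noted earlier.

```python
from math import sqrt
from statistics import mean, variance


def compare_conditions(with_feature, without_feature):
    """Return (difference in means, Welch's t statistic) for one measure.

    Each argument is a list of per-participant values, e.g. M1 task times.
    Each group needs at least two values for the sample variance.
    """
    diff = mean(with_feature) - mean(without_feature)
    # Standard error without assuming equal variances in the two groups
    se = sqrt(variance(with_feature) / len(with_feature)
              + variance(without_feature) / len(without_feature))
    return diff, diff / se
```

With task times of `[10, 12, 14]` seconds with the feature and `[20, 22, 24]` without, the difference in means is -10 seconds, meaning participants were faster with the feature in this made-up sample.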

Study types - continued
- ST5. Determination of which specific combinations of features are most useful as the context varies.
- Provides evidence for these research questions:
  - Q5 Which features should be available in a given tool so the tool can be used in a variety of contexts?
    - Measures: M1, M2, M3, M4a, M4c
      - Measured as the feature sets and contexts are varied in different combinations

Study types - continued
- ST6. Comparison of entire tools
  - Incorporating sets of features
  - Less abstract than ST3
- Provides evidence for these research questions:
  - Q6 Which of several tools is best used for a given task?
    - Measures: M1, M2, M3, M4
      - Measured separately for each tool
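One way to make such an inventory indexable is to hold it as plain data. The sketch below encodes the study types, research questions and measures listed in these slides. The dictionary layout and the `study_types_using` helper are assumptions for illustration, and qualitative questions are given an empty measure list.

```python
# Study type -> research question -> measures, as listed in the slides.
INVENTORY = {
    "ST1": {"Q1a": ["M1", "M2", "M3"], "Q1b": []},
    "ST2": {"Q2a": ["M1", "M2", "M3", "M4"], "Q2b": []},
    "ST3": {"Q3": ["M1", "M2", "M3", "M4"]},
    "ST4": {"Q4a": ["M1", "M2", "M3"], "Q4b": []},
    "ST5": {"Q5": ["M1", "M2", "M3", "M4a", "M4c"]},
    "ST6": {"Q6": ["M1", "M2", "M3", "M4"]},
}


def study_types_using(measure: str) -> list[str]:
    """List study types with a question that uses the measure.

    Matching is by prefix, so "M4" also matches the sub-items M4a and M4c.
    """
    return [
        study_type
        for study_type, questions in INVENTORY.items()
        if any(m.startswith(measure)
               for measures in questions.values()
               for m in measures)
    ]
```

For instance, `study_types_using("M4")` returns the study types that collect subjective answers, which is the kind of lookup a reviewer or indexer might want.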