Source Code Exploration with Google
Denys Poshyvanyk, Maksym Petrenko, Andrian Marcus, Xinrong Xie, Dapeng Liu
Wayne State University
Presented by: Roli Shrivastava

HISTORY
  • Global Regular Expression Print (g/re/p)
  • File searches in existing Integrated Development Environments (IDEs)
  • Both are based on regular expression matching
Limitations of grep and IDE searches
  • Support only specific development or maintenance tasks
  • Not in the mainstream of software development practice
  • Case sensitive
  • Limited interaction with potential users

MOTIVATION
  • To understand large and new parts of software systems
  • People search code for:
      – Concept location in source code
      – Impact analysis
      – Change propagation
      – Debugging
      – Comprehension of software in general
  • To support these tasks, fast and accurate tools and techniques are needed

PROPOSAL OF PAPER
  • New approach to source code exploration
  • Integration of Google Desktop Search (GDS) + IBM’s Eclipse development environment
  • Known as Google Eclipse Search (GES)

EXISTING APPROACH
  • Searching based on Information Retrieval (IR) indexing techniques
  • IR allows formulation of queries with multiple words
  • More popular than regular expression matching
Problems:
  • Computational efficiency
  • Online re-indexing of the software
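
IR-style indexing with multi-word queries can be illustrated with a minimal inverted-index sketch. This is not how GDS or any tool named in these slides is implemented; the file names, file contents, tokenizer, and the `build_index` and `search` helpers are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, split on any non-alphanumeric character.
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def build_index(files):
    # Map each term to the set of file names that contain it.
    index = defaultdict(set)
    for name, text in files.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    # Return the files that contain every query term (implicit AND).
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical file contents, for illustration only.
files = {
    "Shape.java": "class Shape { int width; int height; }",
    "Edge.java": "class Edge { void draw() {} }",
}
index = build_index(files)
print(sorted(search(index, "shape width")))
```

Because the index is built once and each query is a set intersection, lookups stay cheap; the cost moves to keeping the index up to date, which is the re-indexing problem listed above.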

GES
  • Allows developers to search software projects in a manner similar to searching the internet or their own desktops
  • Searches within projects or working sets of files
  • Uses natural language queries
  • Combines the advantages of GDS with Eclipse’s extensibility
  • Is based on IR indexing techniques
  • The idea is also integrated with MS Visual Studio
  • Uses GDS to index and search source code files and project files
  • Is as efficient as GDS
  • Re-indexes as the search space changes

Problems with GDS as a standalone tool?

LIMITATIONS OF GDS!!
  • GDS is not a project-specific search
      – Searches files in the entire system
  • Needs an internet browser
  • Awkward!!
      – User has to switch between the IDE and the browser
  • Solution is definitely GES

GDS + ECLIPSE!!!
  • On-the-fly preprocessing and indexing of the context
  • Continual indexing
      – Maintains and updates current location changes
      – Accurate results
  • Immediate response to queries
  • History of searches
  • Advanced search options
      – Project-specific search
  • Sorting of the results
      – Relevance
      – Dates
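
Continual indexing can be sketched as re-reading only the files whose modification time has changed since the last pass. The slides do not describe how GES or GDS detect changes, so the mtime comparison, the `reindex_changed` helper, and the example file are assumptions made for illustration. Deleted files are not handled in this sketch.

```python
import os
import tempfile


def reindex_changed(root, seen, index):
    """Re-read only files whose modification time differs from the last pass.

    seen maps path -> last observed mtime (ns); index maps path -> set of words.
    Returns the list of paths that were (re)indexed.
    """
    updated = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime_ns
            if seen.get(path) != mtime:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    index[path] = set(fh.read().lower().split())
                seen[path] = mtime
                updated.append(path)
    return updated


# Hypothetical project directory with a single file.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "Example.java")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("class Example {}")
    seen, index = {}, {}
    first = reindex_changed(root, seen, index)   # new file gets indexed
    second = reindex_changed(root, seen, index)  # unchanged, so nothing is re-read
    print(len(first), len(second))
```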

ADVANTAGES
  • Features specific to IR-based searching
      – Multiple-term queries
      – Natural language queries
      – Boolean operators
      – Ranking of search results
  • Scalability and high reliability of a proven search engine (i.e., GDS)
      – Important for massive file repositories, such as large-scale software systems
  • Display of and access to the search results within the Eclipse IDE
      – Its native interfaces provide direct links between the search results and the actual source code in the editor
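
Ranking of search results can be illustrated with a simple TF-IDF score. The slides do not say how GDS ranks its results, so this scoring formula is an assumed stand-in, and the `rank` helper and document contents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query):
    # docs maps name -> text. Returns names with a positive score, best first.
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    scores = {}
    for name, tf in counts.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in counts.values() if term in c)
            if df:
                score += tf[term] * math.log(1 + n / df)
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda k: (-scores[k], k))


# Hypothetical documents, for illustration only.
docs = {
    "A.java": "edge edge graph",
    "B.java": "edge node",
    "C.java": "node node",
}
print(rank(docs, "edge graph"))
```

Terms that occur in few files weigh more than terms that occur everywhere, so a file matching the rarer query word is pushed toward the top of the list.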

SYSTEM REQUIREMENTS
To run GES, you will need:
  • Eclipse SDK 3.2 or higher
  • Google Desktop Search (GDS) 2.0 or higher
  • Java Runtime Environment (JRE) 1.5 or higher

GES DESIGN & IMPLEMENTATION
  • GES is similar to File Search in Eclipse
  • Type a query into the GES dialog box
  • Specify the scope of the search
      – Workspace
      – Selected resources
      – Enclosing projects
      – Working sets
  • After the query is run, the results are displayed in the GES Search Results tab
  • Results can be explored by browsing in the editor

GES SCREEN SHOT

SCREEN SHOT

PILOT CASE STUDY
  • Performed on Violet (http://www.horstmann.com/violet/)
  • Violet is a cross-platform UML editor written in Java
  • Has 65 classes, 448 methods, and 9,000 LOC
  • Approach: a request for a new feature
  • Goal: “introduce a user-defined arrow type for the class diagram”

QUERIES FOR PCS-I
  • Q2: “arrow class diagram”
      – Oops… did not return any matches
  • Q3: “edge class diagrams”
      – Worked

RESULTS
11 files as search results:
  • UseCaseDiagramGraph
  • StateDiagramGraph
  • SequenceDiagramGraph
  • StateTransitionEdge
  • ObjectDiagramGraph
  • NoteNode
  • ObjectNode
  • FieldNode
  • ImplicitParameterNode
  • ClassDiagramGraph
  • CallNode

ANALYSIS OF RESULTS
  • ClassDiagramGraph had the relevant result
  • To verify this finding:
      – The ‘draw’ and ‘getPath’ methods in ‘ArrowHead’ were modified
      – Related methods in the ArrowHeadEditor file were also modified successfully

GES vs. FILE SEARCH
  • Problem: “concept location task” in Violet
  • Goal: “to locate the place in the source code which specifies the width of the class diagrams”
  • File: “value saved in DEFAULT_WIDTH variable”

GES BEHAVIOR
  • Q1: “default width”
      – “Bingo” in the first step itself!!

FILE SEARCH BEHAVIOR
  • Q1: “default width”
      – “Oops!!! No results”
  • Q2: “default”
      – “Yes… hmm, closer”
  • Q3: “width”
      – “Yes… much closer”

Can FILE SEARCH be made BETTER??
  • In this particular case, “Default *Width” would have worked fine
      – It gives the same result as GES on the first attempt
  • Drawback: to construct such expressions,
      – The programmer needs additional information about the identifiers
      – It is impractical to construct such complex expressions all the time (this was a relatively simple expression)
      – What happens if the expression is more complex?
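
The difference between the two search styles on this task can be sketched as follows. The tokenization shown (splitting on non-alphanumeric characters) is an assumption about how an IR-style engine might treat `DEFAULT_WIDTH`, not a description of GDS, and the regular expression `default.*width` is an illustrative pattern rather than the exact expression from the study. The helper names are hypothetical.

```python
import re

IDENTIFIER = "DEFAULT_WIDTH"


def literal_match(query, text):
    # Literal, case-insensitive search for the query string as typed.
    return re.search(re.escape(query), text, re.IGNORECASE) is not None


def split_terms(s):
    # Assumed IR-style tokenization: lowercase, split on non-alphanumerics.
    return {t for t in re.split(r"[^A-Za-z0-9]+", s.lower()) if t}


def token_match(query, text):
    # Every query term must appear among the terms of the text.
    return split_terms(query) <= split_terms(text)


# The identifier has "_" where the query has a space, so the literal search fails.
print(literal_match("default width", IDENTIFIER))
# A hand-written pattern finds it, but only if the searcher anticipates the separator.
print(re.search(r"default.*width", IDENTIFIER, re.IGNORECASE) is not None)
# Term-based matching finds it from the plain two-word query.
print(token_match("default width", IDENTIFIER))
```

The pattern-based search works only once the searcher already knows roughly how the identifier is written, which is the drawback listed above.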

FILE SEARCH vs. GES RESULTS
  • The File Search query had to be modified to get to the result
      – The results were narrowed down by refining the search query
  • GES gave results on the first query
  • GES is faster than File Search
  • With GES, fewer LOC have to be investigated
  • GES returns a ranked list of results
  • Developers learn relevant information faster with GES than with File Search

STILL NOT SURE!!
  • The authors say: “This study has a proof-of-concept role, we do not generalize these conclusions”.
  • A more detailed case study is needed to extend the results

OTHER CASE STUDIES
  • Needed a bigger project than Violet
  • Queries were run on
      – P4 2.8 GHz with 1 GB of RAM
      – GES plug-in
      – File Search in Eclipse 3
  • Art of Illusion: 3D modeling studio
      – Written in Java
      – Has 442 classes, 20 interfaces, 100,838 LOC
  • Eclipse version 3.1 + complete sources
      – 20,000 files
      – 2 million LOC

METHODOLOGY
  • 10 queries were run on each system
  • The average response time was measured for GES and File Search
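
The measurement described above could be sketched as a small timing harness. The slides do not say how response times were recorded, so the `average_response_time` helper, the use of a wall-clock timer, and the placeholder search function are assumptions for illustration only.

```python
import time
from statistics import mean


def average_response_time(search_fn, queries):
    # Time each query once with a high-resolution clock and average the results.
    if not queries:
        raise ValueError("at least one query is required")
    timings = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Placeholder search function that only records the queries it receives.
queries = [f"query {i}" for i in range(10)]
calls = []
avg = average_response_time(calls.append, queries)
print(f"average response time: {avg:.6f} s over {len(calls)} queries")
```

Running the same query list through each search tool and comparing the two averages gives one number per tool per system.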

COMPARING THE RESULTS

DERIVED RESULTS!!!
  • GES is more effective in terms of response time
  • GES scales up very well with the size of the search space

LIMITATIONS
  • GES uses GDS’s background indexing
      – Runs only when the user’s computer is idle
  • The user has to wait for the (re)indexing of the files
  • None of the GDS APIs handles this issue

Q: Is this really an issue?
A: As this is a one-time step, it only affects the first search on a software system.

CONCLUSION
  • Integrating GDS into Eclipse
      – Improves source code searching
      – Produces an easier-to-adopt approach
  • GES allows searches in
      – All the source code
      – Associated documentation
  • Faster than File Search
  • Queries do not take into account the format of the identifiers in the source code

RELATED WORK
  • JIRiSS: an Eclipse plug-in for source code exploration (Information Retrieval based Software Search for Java)
    http://mercury.cs.wayne.edu/~vip/publications/Poshyvanyk.ICPC.2006.JIRiSS.pdf
  • JIRiSS includes other advanced features
      – Automatically generated software vocabulary
      – Advanced query formulation options, including spell-checking as well as fragment-based search
  • Information Retrieval, a book by C. J. van Rijsbergen
    http://www.dcs.gla.ac.uk/Keith/Preface.html

DISCUSSION AND QUESTIONS?
