- Slides: 20

Example application: source code analysis
- 125 file types
- 8029 files
- 4689 non-Java
- 1112 svn revisions

Querying Software Artefacts
[Diagram. Labels: source code, build scripts, config files, web pages, databases, spreadsheets, bug reports, version history; parsers; software repository; query engine; IDE plugin, dashboard, Excel add-in; developer, manager, analyst]

The problem
- design a query language and engine for accessing a vast repository of different types of source artefact
- libraries of queries: tailor the framework to different types of artefact

Tough problem!
Dozens of attempts, in industry and academia, since 1984: databases, Prolog, domain-specific query languages
Difficulties:
- does not scale
- efficient queries extremely hard to write
- specific to one kind of source artefact
18 man-years of research at the University of Oxford, 1996-2005, to discover the ingredients of a solution
15 man-years to implement an industrial product
3 patents pending, several more in the pipeline

SemmleCode: the power of .QL

The query language .QL
- Object-oriented, for creating libraries of queries
- Recursive queries, as in logic programming
- Syntax familiar to Java and SQL developers
- Runs on top of any traditional relational database
- Syntax highlighting, error checking and auto-completion

How it works
[Diagram. Labels: java / jar, XML files; RDBMS; .QL library, .QL query; bytecode for search; Semmle optimiser; template for RDBMS; procedural SQL]

Demo
The source we shall explore:
- Alfresco: Enterprise Content Management
- Spring: Java/JEE Application Framework
- Builds on Tomcat, JBoss, …
Vital statistics:
- 50553 Java methods
- 6647 Java types
- 516 XML files
Demo parts:
- out-of-the-box
- writing your own queries
- querying XML config files

Using SemmleCode out-of-the-box
115 pre-packaged queries
- Find common bug patterns: e.g. compareTo/equals, cloning, serialisation, internationalization
- Compute metrics: 42 different metrics, including Robert Martin’s package metrics
- Examine dependencies: e.g. cyclic package dependencies
Visualization: pie charts, bar charts, tables, graphs, warnings/errors
- easy navigation to source
- exportable for generating reports

Writing queries of your own: select

    from Method m
    where m.fromSource()
      and m.hasName("compareTo")
      and not m.getDeclaringType().getAMethod().hasName("equals")
    select m, "missing equals?"

In general:

    from <variable-declarations>
    where <conditions>
    select <results>
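To make the reading of the compareTo query above concrete, here is a minimal Python sketch of the same from-where-select shape over toy data. The `Method` record, the `METHODS` list and the helper names are hypothetical stand-ins for the repository, not part of SemmleCode or .QL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    # Hypothetical record; real .QL queries run against the repository.
    name: str
    declaring_type: str
    from_source: bool


# Toy data, invented for illustration only.
METHODS = [
    Method("compareTo", "Version", True),
    Method("equals", "Version", True),
    Method("compareTo", "Money", True),
    Method("compareTo", "LibraryThing", False),
]


def methods_of(type_name):
    """All methods declared by the given type (models getAMethod())."""
    return [m for m in METHODS if m.declaring_type == type_name]


def missing_equals():
    return [
        (m, "missing equals?")                  # select m, "missing equals?"
        for m in METHODS                        # from Method m
        if m.from_source                        # where m.fromSource()
        and m.name == "compareTo"               # and m.hasName("compareTo")
        and not any(o.name == "equals"          # and not ...getAMethod().hasName("equals")
                    for o in methods_of(m.declaring_type))
    ]


results = missing_equals()
```

In this toy data only `Money` declares `compareTo` in source without also declaring `equals`, so it is the only type reported.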

Writing queries of your own: aggregates

    select sum(CompilationUnit cu | cu.fromSource() | cu.getNumberOfLinesOfCode())

In general: agg(T1 x1, …, Tn xn | condition | expr)
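The general aggregate form has three parts: declared variables, a condition, and an expression to combine. A rough Python analogue, assuming a hypothetical `agg` helper and made-up compilation-unit data, might look like this:

```python
# Toy data, invented for illustration: (file name, from source?, lines of code).
COMPILATION_UNITS = [
    ("A.java", True, 120),
    ("B.java", True, 80),
    ("lib/C.class", False, 300),
]


def agg(combine, items, condition, expr):
    """Hypothetical helper modelling agg(T x | condition | expr)."""
    return combine(expr(x) for x in items if condition(x))


# sum(CompilationUnit cu | cu.fromSource() | cu.getNumberOfLinesOfCode())
total_loc = agg(sum, COMPILATION_UNITS, lambda cu: cu[1], lambda cu: cu[2])

# The same shape works for other aggregates, e.g. max.
largest = agg(max, COMPILATION_UNITS, lambda cu: cu[1], lambda cu: cu[2])
```

Because the aggregate is just an expression, it can be nested inside another condition or aggregate without a separate grouping step.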

Writing queries of your own: recursion

    from RefType s, RefType t, RefType it
    where it.hasName("PasswordInputTag")
      and it.hasSupertype*(s)
      and it.hasSupertype*(t)
      and t.hasSupertype(s)
    select t, s

In general, can write recursive predicate definitions
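The `hasSupertype*` chaining in the query above can be read as a reflexive transitive closure of the direct-supertype relation. The sketch below models that reading in Python; the type hierarchy in `SUPERTYPE` is invented for illustration and is not taken from Alfresco or Spring.

```python
# Hypothetical direct-supertype relation: type -> set of direct supertypes.
SUPERTYPE = {
    "PasswordInputTag": {"InputTag"},
    "InputTag": {"TagSupport"},
    "TagSupport": {"Object"},
    "Unrelated": {"Object"},
    "Object": set(),
}


def supertypes_star(t):
    """Reflexive transitive closure: t itself plus everything reachable."""
    seen = {t}
    stack = [t]
    while stack:
        cur = stack.pop()
        for s in SUPERTYPE.get(cur, ()):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def hierarchy_edges(root):
    """Pairs (t, s) where both lie above root and s is a direct supertype of t."""
    reach = supertypes_star(root)
    return sorted(
        (t, s)
        for t in reach                  # it.hasSupertype*(t)
        for s in SUPERTYPE.get(t, ())   # t.hasSupertype(s)
        if s in reach                   # it.hasSupertype*(s)
    )


edges = hierarchy_edges("PasswordInputTag")
```

The result is the set of edges of the inheritance graph above the chosen type; `Unrelated` is excluded because it is not reachable from `PasswordInputTag`.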

Queries in .QL
- from-where-select: autocompletion, typechecking, emptiness tests
- aggregates: arbitrary nesting, no group-by needed
- recursion: implicit with chaining; or explicit

Defining new classes in .QL

    class ClassAttribute extends XMLAttribute {
      ClassAttribute() { this.getName() = "class" }
      string getClassName() { this.getValue() = result }
      RefType getType() { result.getQualifiedName() = this.getClassName() }
      predicate noType() { not exists(this.getType()) }
    }

    from ClassAttribute ca
    where ca.noType()
      and ca.getClassName().matches("org.alfresco%")
    select ca, ca.getClassName() + " not found"

Classes in .QL
classes are logical properties
- “constructor” specifies the characteristic property
methods
- body is a relation between this, result and parameters
- more than one result allowed
predicates
- methods without a result
- body is a relation between this and parameters
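One way to picture "classes are logical properties" is to model a class as a boolean test, a method as a generator that may yield zero, one or several results, and a predicate as a method with no result. The Python sketch below applies that reading to the ClassAttribute example; the data and all function names are hypothetical, and `startswith` is used as an assumed equivalent of the `"org.alfresco%"` pattern.

```python
from typing import Iterator, NamedTuple


class XMLAttribute(NamedTuple):
    name: str
    value: str


# Toy data, invented for illustration only.
ATTRIBUTES = [
    XMLAttribute("class", "org.alfresco.Foo"),
    XMLAttribute("id", "x"),
    XMLAttribute("class", "org.alfresco.Missing"),
]
KNOWN_TYPES = {"org.alfresco.Foo"}


def is_class_attribute(a: XMLAttribute) -> bool:
    """The 'constructor': the characteristic property of the class."""
    return a.name == "class"


def get_type(a: XMLAttribute) -> Iterator[str]:
    """A method as a relation: yields every matching result, possibly none."""
    if a.value in KNOWN_TYPES:
        yield a.value


def no_type(a: XMLAttribute) -> bool:
    """A predicate: a method without a result (not exists(this.getType()))."""
    return not any(True for _ in get_type(a))


not_found = [
    a.value + " not found"
    for a in ATTRIBUTES
    if is_class_attribute(a)
    and no_type(a)
    and a.value.startswith("org.alfresco")
]
```

Modelling methods as generators is what allows "more than one result": a caller simply iterates over whatever the relation yields.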

The key points of .QL
- designed for creating libraries of queries
- classes are predicates
- inheritance is implication
- nondeterministic expressions
- recursion with super-simple semantics
- syntax familiar to SQL and Java programmers
- excellent error checking and IDE integration

Concluding remarks 19/07/2007 Semmle Ltd. © 2007

Couldn’t you use LINQ instead of .QL?
- Different design goals: ORM versus libraries of queries
- LINQ does not provide recursion
- LINQ cannot do the optimisations across multiple queries that are key to efficiency in .QL
“Fortunately, there is light in the darkness. Based on decades of programming language research, the brilliant team at Semmle has created an elegant, industrial strength object-oriented query language called .QL with full support for recursive queries and aggregation… .QL has all the requisites to become a runaway success.” (Erik Meijer, Creator of LINQ, Microsoft)

Too good to be true?
Jeff Ullman, 1991: It is not possible for a query language to be seriously logical and seriously object-oriented at the same time.
The key breakthroughs are Semmle’s proprietary technology:
- design of .QL
- optimisations on “bytecode for search”

Wrapping up
- Java is not enough: source code analysis tools must process a multitude of artefacts
- libraries of queries: a means to achieve such heterogeneous tools
- .QL: object-oriented queries over trees and graphs, made fast and easy