Framework for plagiarism detection in Java code
Anastas Misev
Institute of Informatics
Faculty of Natural Science and Mathematics
University Ss Cyril and Methodius
Skopje, Macedonia
anastas@ii.edu.mk

Agenda
  • Introduction
  • Basic idea
  • Open framework
  • Implementation
  • Future work
  • Questions and discussion

Introduction
  • More assignments, in line with current trends (Bologna declaration, …)
  • More students
      • 100% increase in our Institute in this academic year
  • Artifacts are easily accessible over the Internet
  • Plagiarism takes little or no effort, especially with source code

A few words on plagiarism
  • Simple plagiarism
      • Copy-paste (with some changes to spacing and comments)
      • Plagiarism with renaming
          • Methods, fields, classes
      • Reordering of the code (that does not affect the final state)
      • Addition of redundant lines of code

A few words on plagiarism (2)
  • Advanced plagiarism
      • Changing the control structures
      • Mixing several sources
      • Mixing one's own and others' code
  • Drawing the line!
      • It can be very hard
      • Objective vs. subjective

Detection methods
  • Attribute counting
      • Used in the earliest tools
      • Counting operators and operands
  • Structure metrics
      • Compare the structure
      • Usage of tokens
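
To make attribute counting concrete, here is a minimal sketch in Java. It is not taken from any of the tools discussed; the class name `AttributeCounter`, the crude regular-expression tokenizer, and the small keyword list are all assumptions made for illustration. Identifiers and integer literals are counted as operands, while operator symbols and the listed keywords are counted as operators.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of attribute counting; not part of the framework.
class AttributeCounter {
    // Crude tokenizer (assumption): words, integer literals, or runs of operator characters.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|[-+*/%=<>!&|]+");

    // Small, incomplete keyword list (assumption); keywords are counted as operators.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "int", "double", "boolean", "if", "else", "for", "while", "return", "new"));

    /** Returns a two-element array: {operator count, operand count}. */
    static int[] count(String source) {
        int operators = 0;
        int operands = 0;
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            char first = token.charAt(0);
            boolean wordLike = Character.isLetterOrDigit(first) || first == '_';
            if (wordLike && !KEYWORDS.contains(token)) {
                operands++;   // identifier or literal
            } else {
                operators++;  // operator symbol or keyword
            }
        }
        return new int[] {operators, operands};
    }

    /** Two sources have the same profile when both counts match. */
    static boolean sameProfile(String a, String b) {
        return Arrays.equals(count(a), count(b));
    }

    public static void main(String[] args) {
        String original = "int x = a + b;";
        String renamed = "int total = first + second;";
        System.out.println(Arrays.toString(count(original)));
        System.out.println(sameProfile(original, renamed));
    }
}
```

Because only counts are compared, renaming identifiers leaves the profile unchanged, but so does any unrelated program that happens to have the same counts. That weakness is one reason structure metrics compare the structure of the code instead.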

Available tools
  • Sim
      • Uses dynamic programming to compare tokens from the source
  • Yap
      • Uses only specific tokens that reflect the structure
      • Longest common subsequence

Available tools (2)
  • MOSS
      • Available as a service to teachers over the Internet
      • Important features include
          • Insensitivity to spaces and tabs
          • Noise suppression
          • Location independence
  • SID
      • Simple system

Open framework
  • An implementation done as a diploma thesis by D. Aleksovski
  • Java-based, open framework
  • Initial purpose: analyze Java code
  • Allows easy extension
      • New analyzers
      • New comparators

The architecture
  • Two basic elements
      • Analyzer
      • Comparator
  • Analyzer: lexical and syntactical analysis of the code
      • Language specific
      • Produces the syntax tree and stores it in the database
      • Based on ANTLR
  • Comparator: compares elements
      • Can be used to compare code, trees, fingerprints, …
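
The analyzer/comparator split can be sketched as two small Java interfaces. The names `Analyzer`, `CodeComparator`, and `compare`, and their signatures, are assumptions for illustration and not the framework's actual API. The toy analyzer splits on whitespace instead of building an ANTLR syntax tree, and the toy comparator uses Jaccard overlap of token sets as a stand-in measure.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two-element architecture; all names are assumed.
class ArchitectureSketch {
    /** Language-specific step: turns source text into a representation R. */
    interface Analyzer<R> {
        R analyze(String source);
    }

    /** Compares two representations; 0.0 means nothing shared, 1.0 means identical. */
    interface CodeComparator<R> {
        double similarity(R a, R b);
    }

    // Toy analyzer (assumption): whitespace tokens stand in for a real syntax tree.
    static final Analyzer<List<String>> WHITESPACE_ANALYZER =
        source -> Arrays.asList(source.trim().split("\\s+"));

    // Toy comparator (assumption): Jaccard overlap of the two token sets.
    static final CodeComparator<List<String>> JACCARD = (a, b) -> {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return union.isEmpty() ? 1.0 : (double) common.size() / union.size();
    };

    /** Any analyzer can be paired with any comparator over the same representation. */
    static <R> double compare(Analyzer<R> analyzer, CodeComparator<R> comparator,
                              String source1, String source2) {
        return comparator.similarity(analyzer.analyze(source1), analyzer.analyze(source2));
    }

    public static void main(String[] args) {
        System.out.println(compare(WHITESPACE_ANALYZER, JACCARD, "a b c", "a b d"));
    }
}
```

Keeping the representation type generic is one way to let the same comparator contract cover code, trees, or fingerprints, as the slide describes.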

The database

Operations
Comparing sources (system, database, module)
  1. If the database contains a fingerprint for file 1, go to 4
  2. Call computeFingerprint(file 1)
  3. Store the fingerprint f1 into the database
  4. If the database contains a fingerprint for file 2, go to 7
  5. Call computeFingerprint(file 2)
  6. Store the fingerprint f2 into the database
  7. Forward the fingerprints to the comparator
  8. Call computeSimilarity(f1, f2)
  9. Store the values into the database
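
The nine steps above can be sketched in Java as follows. This is an illustration under stated assumptions, not the framework's code: in-memory maps stand in for the database, the class name `ComparisonWorkflow` and all signatures are hypothetical, and the fingerprint and similarity functions are passed in from outside.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of the "comparing sources" operation; names are assumed.
class ComparisonWorkflow {
    // In-memory maps stand in for the real database (assumption).
    private final Map<String, String> fingerprints = new HashMap<>();
    private final Map<String, Double> similarities = new HashMap<>();

    private final Function<String, String> computeFingerprint;
    private final BiFunction<String, String, Double> computeSimilarity;

    // Counts fingerprint computations, only to make the caching visible.
    int fingerprintCalls = 0;

    ComparisonWorkflow(Function<String, String> computeFingerprint,
                       BiFunction<String, String, Double> computeSimilarity) {
        this.computeFingerprint = computeFingerprint;
        this.computeSimilarity = computeSimilarity;
    }

    // Steps 1-3 (and 4-6): reuse a stored fingerprint, otherwise compute and store it.
    private String fingerprintFor(String fileName, String source) {
        String fingerprint = fingerprints.get(fileName);
        if (fingerprint == null) {
            fingerprintCalls++;
            fingerprint = computeFingerprint.apply(source);
            fingerprints.put(fileName, fingerprint);
        }
        return fingerprint;
    }

    double compare(String name1, String source1, String name2, String source2) {
        String f1 = fingerprintFor(name1, source1);
        String f2 = fingerprintFor(name2, source2);
        // Steps 7-8: hand both fingerprints to the comparator.
        double similarity = computeSimilarity.apply(f1, f2);
        // Step 9: store the result.
        similarities.put(name1 + "|" + name2, similarity);
        return similarity;
    }

    public static void main(String[] args) {
        // Toy fingerprint: the source with whitespace removed. Toy similarity: exact match.
        ComparisonWorkflow workflow = new ComparisonWorkflow(
            s -> s.replaceAll("\\s+", ""),
            (a, b) -> a.equals(b) ? 1.0 : 0.0);
        System.out.println(workflow.compare("A.java", "int x;", "B.java", "int  x ;"));
        System.out.println(workflow.compare("A.java", "int x;", "C.java", "int y;"));
        System.out.println(workflow.fingerprintCalls);
    }
}
```

Storing fingerprints means each file is analyzed once, however many pairwise comparisons it takes part in. This matters because comparing every submission against every other grows quadratically with the number of students.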

Extensions
  • Two different modules developed to test the framework
  • Simple module, basic features
      • Can only detect basic plagiarism
      • Compares the structure of the syntax tree
  • Advanced module
      • Produces a fingerprint of the syntax tree
      • Measures the longest common subsequence of the two fingerprints
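
The longest-common-subsequence measure used by the advanced module can be sketched with the standard dynamic-programming table. The slides do not specify how a fingerprint is encoded or how the LCS length becomes a score, so this sketch assumes a fingerprint is a list of node-type names and normalizes by the longer fingerprint; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of LCS-based fingerprint comparison; encoding and scoring are assumed.
class LcsSimilarity {
    /** Length of the longest common subsequence of two token lists. */
    static int lcsLength(List<String> a, List<String> b) {
        // dp[i][j] = LCS length of the first i tokens of a and the first j tokens of b.
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    /** LCS length divided by the longer fingerprint's length (assumed normalization). */
    static double similarity(List<String> a, List<String> b) {
        int longer = Math.max(a.size(), b.size());
        return longer == 0 ? 1.0 : (double) lcsLength(a, b) / longer;
    }

    public static void main(String[] args) {
        List<String> f1 = Arrays.asList("CLASS", "METHOD", "IF", "ASSIGN", "RETURN");
        List<String> f2 = Arrays.asList("CLASS", "METHOD", "ASSIGN", "ASSIGN", "RETURN");
        System.out.println(lcsLength(f1, f2));
        System.out.println(similarity(f1, f2));
    }
}
```

A subsequence does not have to be contiguous, so inserting redundant statements lowers the score only slightly. That is what makes this measure more robust than a plain comparison against the padding form of simple plagiarism described earlier.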

Screen shots

Screen shots (2)

Initial results

Future work
  • Support for additional languages
  • New and advanced comparators and analyzers
  • Web and web service interfaces
  • Integration into
      • Moodle
      • Eclipse

Questions and discussion