Framework for plagiarism detection in Java code
- Slides: 18
Framework for plagiarism detection in Java code
Anastas Misev
Institute of Informatics, Faculty of Natural Science and Mathematics
University Ss Cyril and Methodius, Skopje, Macedonia
anastas@ii.edu.mk
Agenda
- Introduction
- Basic idea
- Open framework
- Implementation
- Future work
- Questions and discussion
Introduction
- Increased number of assignments according to current trends (Bologna declaration, …)
- Increased number of students
  - 100% increase in our Institute in this academic year
- Accessibility of artifacts over the Internet
- Little or zero effort in plagiarism, especially in source code
A few words on plagiarism
- Simple plagiarism
  - Copy-paste (with some modification of spacing and comments)
  - Plagiarism with renaming
    - Methods, fields, classes
  - Reordering of the code (that does not affect the final state)
  - Addition of redundant lines of code
A few words on plagiarism (2)
- Advanced plagiarism
  - Changing of the control structures
  - Mixing of several sources
  - Mixing of own and others' code
- Drawing the line!
  - It can be very hard
  - Objective vs. subjective
Detection methods
- Attribute counting
  - Used in the earliest tools
  - Counting operators and operands
- Structure metrics
  - Compare the structure
  - Usage of tokens
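
To make attribute counting concrete, here is a minimal Java sketch that counts operators and operands in a code fragment. The class name `AttributeCounter`, the simplified token pattern, and the choice to treat keywords as operands are all assumptions made for illustration; this is not the method of any particular tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the presented framework.
class AttributeCounter {
    // Simplified tokens: identifiers/keywords, integer literals, and a handful of operators.
    // Punctuation such as ';' or parentheses is ignored.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z_][A-Za-z0-9_]*|\\d+|==|!=|<=|>=|&&|\\|\\||[-+*/%=<>!]");

    /** Returns a two-element array: {operatorCount, operandCount}. */
    static int[] count(String source) {
        int operators = 0;
        int operands = 0;
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            char first = m.group().charAt(0);
            if (Character.isLetterOrDigit(first) || first == '_') {
                operands++;   // identifier, keyword, or literal (assumption: all count as operands)
            } else {
                operators++;
            }
        }
        return new int[] {operators, operands};
    }
}
```

Two fragments that differ only in identifier names, such as `a = b + 1;` and `total = price + 42;`, yield the same counts, which is why this family of metrics survives renaming but says little about structure.
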
Available tools
- Sim
  - Uses dynamic programming to compare tokens from the source
- YAP
  - Uses only specific tokens that reflect the structure
  - Longest common subsequence
Available tools (2)
- MOSS
  - Available as a service to teachers over the Internet
  - Important features include
    - Insensitivity to spaces and tabs
    - Noise suppression
    - Location independence
- SID
  - Simple system
Open framework
- An implementation done as a diploma thesis by D. Aleksovski
- Java based, open framework
- Initial purpose: analyze Java code
- Allows easy extension
  - New analyzers
  - New comparators
The architecture
- Two basic elements
  - Analyzer
  - Comparator
- Analyzer: lexical and syntactic analysis of the code
  - Language specific
  - Produces the syntax tree and stores it in the database
  - Based on ANTLR
- Comparator: compares elements
  - Can be used to compare code, trees, fingerprints, …
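
The split into an analyzer and a comparator could be expressed as a pair of small Java interfaces. The sketch below is only an illustration: the names `CodeAnalyzer`, `CodeComparator`, `WhitespaceTokenAnalyzer`, and `PositionalMatchComparator` are hypothetical, and the toy analyzer splits on whitespace where the real framework builds a syntax tree with ANTLR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical extension points; R is whatever representation an analyzer produces.
interface CodeAnalyzer<R> {
    R analyze(String source);
}

interface CodeComparator<R> {
    /** Returns a similarity score between 0.0 and 1.0. */
    double similarity(R a, R b);
}

// Toy analyzer: whitespace-separated tokens stand in for a real syntax tree.
class WhitespaceTokenAnalyzer implements CodeAnalyzer<List<String>> {
    @Override
    public List<String> analyze(String source) {
        String trimmed = source.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}

// Toy comparator: fraction of positions holding the same token.
class PositionalMatchComparator implements CodeComparator<List<String>> {
    @Override
    public double similarity(List<String> a, List<String> b) {
        int longer = Math.max(a.size(), b.size());
        if (longer == 0) {
            return 1.0; // two empty inputs are treated as identical
        }
        int shorter = Math.min(a.size(), b.size());
        int matches = 0;
        for (int i = 0; i < shorter; i++) {
            if (a.get(i).equals(b.get(i))) {
                matches++;
            }
        }
        return (double) matches / longer;
    }
}
```

Because the comparator depends only on the representation type, a new language would need only a new analyzer, and a new detection technique only a new comparator.
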
The database
Operations: comparing sources
(Diagram labels: System, Database, Module)
1. If the database contains a fingerprint for file 1, go to step 4
2. Call computeFingerprint(file1)
3. Store the fingerprint f1 in the database
4. If the database contains a fingerprint for file 2, go to step 7
5. Call computeFingerprint(file2)
6. Store the fingerprint f2 in the database
7. Forward the fingerprints to the comparator
8. Call computeSimilarity(f1, f2)
9. Store the values in the database
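
The nine steps above amount to a compute-once, reuse-afterwards cache around the analyzer, followed by one comparator call. A minimal sketch follows, assuming in-memory maps in place of the real database; the class name `ComparisonService` and its fields are hypothetical, while `computeFingerprint` and `computeSimilarity` mirror the names on the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch; HashMaps stand in for the framework's database.
class ComparisonService<F> {
    private final Map<String, F> fingerprints = new HashMap<>();
    private final Map<String, Double> results = new HashMap<>();
    private final Function<String, F> computeFingerprint;
    private final ToDoubleBiFunction<F, F> computeSimilarity;
    int fingerprintComputations = 0; // exposed only to make the caching visible

    ComparisonService(Function<String, F> computeFingerprint,
                      ToDoubleBiFunction<F, F> computeSimilarity) {
        this.computeFingerprint = computeFingerprint;
        this.computeSimilarity = computeSimilarity;
    }

    // Steps 1-3 (and 4-6): reuse a stored fingerprint, otherwise compute and store it.
    private F fingerprintFor(String fileName, String source) {
        F cached = fingerprints.get(fileName);
        if (cached == null) {
            cached = computeFingerprint.apply(source);
            fingerprintComputations++;
            fingerprints.put(fileName, cached);
        }
        return cached;
    }

    double compare(String name1, String source1, String name2, String source2) {
        F f1 = fingerprintFor(name1, source1);
        F f2 = fingerprintFor(name2, source2);
        double similarity = computeSimilarity.applyAsDouble(f1, f2); // steps 7-8
        results.put(name1 + "|" + name2, similarity);                // step 9
        return similarity;
    }

    Double storedResult(String name1, String name2) {
        return results.get(name1 + "|" + name2);
    }
}
```

When one submission is compared against many others, its fingerprint is computed only on the first comparison, which is the main benefit of storing fingerprints rather than recomputing them.
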
Extensions
- Two different modules developed to test the framework
- Simple module, basic features
  - Can only detect basic plagiarism
  - Compares the structure of the syntax tree
- Advanced module
  - Produces a fingerprint of the syntax tree
  - Measures the longest common subsequence of the two fingerprints
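
The longest common subsequence used by the advanced module can be computed with the standard dynamic-programming table. The sketch below assumes a fingerprint is a list of node-type labels and normalizes the LCS length by the longer fingerprint; both the representation and the normalization are assumptions, since the slides do not specify them, and the class name `FingerprintLcs` is hypothetical.

```java
import java.util.List;

// Hypothetical helper, not the framework's actual advanced module.
class FingerprintLcs {
    /** Length of the longest common subsequence of two fingerprints. */
    static <T> int lcsLength(List<T> a, List<T> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    /** Assumed normalization: LCS length divided by the longer fingerprint's length. */
    static <T> double similarity(List<T> a, List<T> b) {
        int longer = Math.max(a.size(), b.size());
        if (longer == 0) {
            return 1.0;
        }
        return (double) lcsLength(a, b) / longer;
    }
}
```

Because a subsequence does not have to be contiguous, inserting a redundant statement into a copied method lengthens one fingerprint but leaves the common subsequence intact, so the score stays high.
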
Screen shots
Screen shots (2)
Initial results
Future work
- Support for additional languages
- New and advanced comparators and analyzers
- Web and web service interfaces
- Integration into
  - Moodle
  - Eclipse
Questions and discussion