Question:

Implement in Java.

Imagine that you are building an online plagiarism checker, which allows teachers in the land of Edutopia to submit papers written by their students and check if any of those students have copied whole sections from a set, D, of documents written in the Edutopian language that you have collected from the Internet. You have at your disposal a parser, P, that can take any document, d, and separate it into a sequence of its n words in their given order (with duplicates included) in O(n) time. You also have a perfect hash function, h, that maps any Edutopian word to an integer in the range from 1 to 1,000,000, with no collisions, in constant time.

It is considered an act of plagiarism if any student uses a sequence of m words (in their given order) from a document in D, where m is a parameter set by parliament. Describe a system whereby you can read in an Edutopian document, d, of n words, and test if it contains an act of plagiarism. Your system should process the set of documents in D in expected time proportional to their total length, which is done just once. Then, your system should be able to process any given document, d, of n words, in expected O(n + m) time (not O(nm) time!) to detect a possible act of plagiarism.

Create a Java implementation, CheckPlagiarism.java, that takes three command-line arguments: the corpus filename, the target filename, and the length of the match sequence.

The corpus file has the format

<document id>:<document text>

For example:

134:"The quick brown fox jumped over the lazy dog"

145: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc pellentesque turpis lorem, at convallis massa euismod quis. Cras blandit rutrum lacus tempor suscipit. Vestibulum in sagittis sem. Vestibulum id gravida felis. Morbi venenatis interdum purus a tincidunt. Aenean vel maximus magna."
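The two sample lines suggest that each corpus entry is a document id, a colon, and the document text in double quotes, with optional whitespace after the colon. A minimal parsing sketch under that assumption follows; the class name CorpusLine and the choice to split at the first colon are illustrative, not part of the assignment.

```java
class CorpusLine {
    // Splits one corpus line at its first colon into {document id, document text}.
    // Assumes the id itself contains no colon (hypothetical, based on the samples).
    static String[] parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' in corpus line: " + line);
        }
        String id = line.substring(0, colon).trim();
        String text = line.substring(colon + 1).trim();
        // Drop one pair of surrounding double quotes, if present.
        if (text.length() >= 2 && text.startsWith("\"") && text.endsWith("\"")) {
            text = text.substring(1, text.length() - 1);
        }
        return new String[] {id, text};
    }
}
```

Splitting at the first colon only means that colons inside the document text are preserved.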

The target file contains the text to be checked.

For example:

The quick brown fox ate its breakfast slowly

The length is the minimum number of consecutive matching words required to prove plagiarism, for example 3.

With the above example, your program should say "Plagiarized from 134". If the target were "The quick black fox ate its breakfast slowly" then it should print "Not Plagiarized". You may use standard implementations for string processing and hash tables.
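One way to approach the stated bounds is a Rabin-Karp style rolling fingerprint over every window of m consecutive words. The sketch below is a minimal illustration under several assumptions: a HashMap that hands out word ids stands in for the perfect hash h, splitting on whitespace stands in for the parser P, matching is case-sensitive with punctuation left attached to words, and BASE, MOD, and all method names are illustrative choices. It uses the range form of Arrays.equals, so it needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

class CheckPlagiarism {
    private static final long BASE = 1_000_003L;     // assumed larger than any word id
    private static final long MOD = 1_000_000_007L;  // prime; BASE * MOD fits in a long
    private final int m;
    private final Map<String, Integer> wordIds = new HashMap<>();  // stand-in for h
    private final List<String> docIds = new ArrayList<>();
    private final List<int[]> docs = new ArrayList<>();
    private final Map<Long, List<int[]>> index = new HashMap<>();  // fingerprint -> {doc, start}

    CheckPlagiarism(int m) {
        if (m < 1) throw new IllegalArgumentException("m must be at least 1");
        this.m = m;
    }

    // Stand-in for P followed by h: whitespace-separated words become integer ids.
    private int[] encode(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new int[0];
        String[] words = trimmed.split("\\s+");
        int[] ids = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            Integer id = wordIds.get(words[i]);
            if (id == null) { id = wordIds.size() + 1; wordIds.put(words[i], id); }
            ids[i] = id;
        }
        return ids;
    }

    // Fingerprint of every window of m ids, each derived from the previous one in O(1).
    private long[] fingerprints(int[] ids) {
        int count = ids.length - m + 1;
        if (count <= 0) return new long[0];
        long high = 1;  // BASE^(m-1) mod MOD
        for (int i = 1; i < m; i++) high = high * BASE % MOD;
        long[] out = new long[count];
        long h = 0;
        for (int i = 0; i < m; i++) h = (h * BASE + ids[i]) % MOD;
        out[0] = h;
        for (int s = 1; s < count; s++) {
            h = (h - ids[s - 1] * high % MOD + MOD) % MOD;  // drop the word leaving the window
            h = (h * BASE + ids[s + m - 1]) % MOD;          // append the word entering it
            out[s] = h;
        }
        return out;
    }

    // Preprocessing step: index every m-word window of one corpus document.
    void addDocument(String id, String text) {
        int[] ids = encode(text);
        int doc = docs.size();
        docIds.add(id);
        docs.add(ids);
        long[] fps = fingerprints(ids);
        for (int s = 0; s < fps.length; s++) {
            index.computeIfAbsent(fps[s], k -> new ArrayList<>()).add(new int[] {doc, s});
        }
    }

    // Returns the id of a corpus document sharing m consecutive words with text, or null.
    String check(String text) {
        int[] ids = encode(text);
        long[] fps = fingerprints(ids);
        for (int s = 0; s < fps.length; s++) {
            for (int[] hit : index.getOrDefault(fps[s], Collections.emptyList())) {
                int[] src = docs.get(hit[0]);
                // Compare the actual words so a fingerprint collision is never reported.
                if (Arrays.equals(src, hit[1], hit[1] + m, ids, s, s + m)) return docIds.get(hit[0]);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java CheckPlagiarism <corpus> <target> <m>");
            return;
        }
        CheckPlagiarism checker = new CheckPlagiarism(Integer.parseInt(args[2]));
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;  // skip blank or malformed lines
            String text = line.substring(colon + 1).trim().replaceAll("^\"|\"$", "");
            checker.addDocument(line.substring(0, colon).trim(), text);
        }
        String target = String.join(" ", Files.readAllLines(Paths.get(args[1])));
        String source = checker.check(target);
        System.out.println(source == null ? "Not Plagiarized" : "Plagiarized from " + source);
    }
}
```

Indexing touches each corpus word a constant number of times, and checking a target computes all of its fingerprints in one pass, paying the O(m) word-by-word comparison only when a fingerprint is found in the index. Under the assumptions above it could be run as `java CheckPlagiarism corpus.txt target.txt 3`, where the file names are placeholders.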
