Competition Seeks Best Approaches To Detecting Plagiarism

timothy posted more than 5 years ago | from the upsetting-the-market-in-online-term-papers dept.

Education 289

marpot writes "Does your school/university check your homeworks/theses for plagiarism? Nowadays, probably Yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we conduct a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism which contains plagiarism with varying degrees of obfuscation, and translation plagiarism from Spanish or German source documents. A random plagiarist was employed who attempts to obfuscate his plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation."
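The "random plagiarist" described in the summary can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions: the function name `obfuscate`, the filler vocabulary, and the three-word shuffle window are hypothetical and are not taken from the competition's actual corpus generator.

```python
import random

def obfuscate(text, n_ops=5, vocabulary=("the", "a", "of", "and"), seed=None):
    """Apply n_ops random word-level operations to text (hypothetical sketch)."""
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return text
    for _ in range(n_ops):
        op = rng.choice(("shuffle", "delete", "insert", "replace"))
        if op == "shuffle":
            # Shuffle a small window of up to three adjacent words.
            start = rng.randrange(len(words))
            window = words[start:start + 3]
            rng.shuffle(window)
            words[start:start + 3] = window
        elif op == "delete" and len(words) > 1:
            # Never delete the last remaining word.
            del words[rng.randrange(len(words))]
        elif op == "insert":
            words.insert(rng.randint(0, len(words)), rng.choice(vocabulary))
        elif op == "replace":
            words[rng.randrange(len(words))] = rng.choice(vocabulary)
    return " ".join(words)

source = "little is known about plagiarism detection accuracy"
print(obfuscate(source, n_ops=6, seed=1))
```

Raising `n_ops` corresponds loosely to the "varying degrees of obfuscation" the summary mentions: the more operations applied, the less the output resembles the source.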

289 comments

Take a big fat dump on your teacher's desk (-1, Troll)

Anonymous Coward | more than 5 years ago | (#27746089)

Take a big fat dump on your teacher's desk and tell her that plagiarism smells better than shit.

Insightful fact... (3, Funny)

telchine (719345) | more than 5 years ago | (#27746101)

Here's an insightful fact related to this article:

Little is known about plagiarism detection accuracy

Re:Insightful fact... (3, Informative)

gnick (1211984) | more than 5 years ago | (#27746241)

But a lot of faith is put in it. I've got a friend that works at the University of Phoenix. We caught up not long ago and he was singing praises about how you just dump a paper into this tool he uses and it instantly tells you the exact percentage of plagiarism content in the student's paper. Too high == disciplinary action - Apparently without even bothering tracking sources or verifying specific plagiarized sections.

Of course, this all came to me second hand - I've not used the tools myself.

Re:Insightful fact... (4, Interesting)

Erwos (553607) | more than 5 years ago | (#27746365)

The tools are fairly good, but, in my experience, they'll always report 3-7% or so of your paper as plagiarized, just because it's pretty difficult to write about _anything_ without unknowingly using previously written words. I would _hope_ that anyone who would pursue disciplinary action from such a tool's results would at least take a look to see if the sections being flagged are consequential.

I have no idea how good they are with catching paraphrasing, though... it strikes me that the semi-intelligent plagiarizers would be doing that more than a straight copy and paste. There's also the "acceptable vs unacceptable" distinction to be made.

Re:Insightful fact... (2, Interesting)

mathx314 (1365325) | more than 5 years ago | (#27746587)

It can be substantially higher than that as well. In high school I wrote a five-page paper about A Tale of Two Cities and, it being a book by Dickens, included a few lengthy quotes. Since it wasn't a terribly long paper and I had lengthy quotes, I got somewhere around 20% plagiarized. Fortunately my teacher was smart enough to check before accusing me, but I remember hearing some talk from a later English teacher that the department was considering a 10% cutoff, above which you received disciplinary action regardless of the circumstances.

Re:Insightful fact... (3, Informative)

samcan (1349105) | more than 5 years ago | (#27746947)

Forget 20%, I had a rough draft with as high as 61%! The particular service we used in high school was Turnitin.com, and a research paper I wrote for high school had an appendix with a copy of the 1805 Treaty of Tripoli (as a help for the teacher)...the website flagged that as 18% plagiarized, from some random Bell Atlantic user's website.

Excluding that, the site would flag random sentences, and would flag part of a sentence as plagiarized, skip a word or two, and then say the rest of the sentence was plagiarized from the same source!

An example is shown below (words in bold are supposedly plagiarized from one source, words in italics from another):

Thus, the Founding Fathers wanted to create a government that was stable, and protected the rights of the people.

Another example from a paper on the Russo-German war of 1941:

They propose that German troops push all the way to the outskirts of Moscow, causing Joseph Stalin to abandon the city. While escaping, his train is destroyed by German planes, removing all significant leadership to the Red Army.

In another paper, when I quoted an article, I listed the title of the article in-text. Turnitin reported that the title of the article was plagiarism...of the article I was citing!

Turnitin.com has "features" for excluding the quoted text, and excluding the bibliography, but as I use LaTeX, and like to use block quotes, the usefulness of these features is questionable.

In my opinion, Turnitin.com is a joke.

Re:Insightful fact... (4, Interesting)

BillCable (1464383) | more than 5 years ago | (#27746603)

My wife teaches for Phoenix. Probably 90% of the plagiarism she sees is from students copying and pasting whole papers word-for-word from random cheat sites. Occasionally she'll get someone who fails to properly quote sources, but that's very much the minority. For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating. They're just hoping they get away with it.

Re:Insightful fact... (4, Insightful)

El_Muerte_TDS (592157) | more than 5 years ago | (#27746703)

For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating.

How would you know? The best cheaters won't be caught, but that doesn't mean they're not cheaters.

Re:Insightful fact... (3, Insightful)

johnsonav (1098915) | more than 5 years ago | (#27746813)

The best cheaters won't be caught, but that doesn't mean they're not cheaters.

Sufficiently advanced cheating is indistinguishable from original work.

How can you know that everyone isn't cheating? Do you give up? Or, try and pick the low-hanging fruit?

Re:Insightful fact... (1)

BillCable (1464383) | more than 5 years ago | (#27746863)

I said "for the most part." It'd actually be a lot more effort to cheat and do enough to get away with it, than it would to just write the paper correctly. The people who are cheating seem to be doing it out of laziness or desperation. They run out of time to complete the assignment, so they Google something and use whatever pops up.

Re:Insightful fact... (2, Interesting)

SerpentMage (13390) | more than 5 years ago | (#27746917)

I think that this is very dangerous...

Let me tell you about a situation. I was a speaker until recently. And around 98 I was giving a talk on technology X. Another speaker who was from the company who created the technology also gave a talk on technology X. Me and this other speaker knew each other, but we did not converse.

Oddly our two talks were VERY VERY similar. He in a private manner accused me of copying his slide deck. Since he was a more well known speaker and I a newbie it seemed all logical.

It was only when a good friend of mine who also worked at the company jumped in and said, "Naa, he would not do that."

Then when my good friend came later to talk to me he asked, "you did not copy, right?"

Answer was a definite NO! I did not copy. We just happened to be thinking along the same lines and came up with a VERY VERY similar slide deck.

In other words a fluke! And this is why I hate statistics and numbers without a thought behind it.

Re:Insightful fact... (0, Redundant)

fuzzyfuzzyfungus (1223518) | more than 5 years ago | (#27746367)

It isn't faith! It's hard fact. How could something as precise as numbers possibly be misleading or fail to accurately represent the world? I bet that there is an appeals process, where you request three whole extra decimal places, as well. Luxury, I tell you.

Re:Insightful fact... (3, Insightful)

eln (21727) | more than 5 years ago | (#27746375)

That sort of thing is just unfair. In my opinion, plagiarism is indeed a heinous crime in an academic setting because it goes against everything the pursuit of academics is supposed to be about. Given that, the punishment should be severe.

However, since the punishment for plagiarism should be severe, there should be great care to investigate it properly. If you can show a preponderance of evidence that not only is a paper plagiarized, but you can accurately identify the source(s) from which each plagiarized section of it was copied, then the student should be expelled after the first offense. If you can't come up with that evidence, though, you should not be punishing the student.

I thought professors had legions of grad students to ferret this sort of thing out, why do they need these programs? Trusting a decision that could permanently impact a student's entire life to a computer program seems careless and dangerous.

Re:Insightful fact... (1)

Erwos (553607) | more than 5 years ago | (#27746521)

I don't really get what you're saying. If the program is showing 35%+ of the paper as plagiarized, that's pretty much a preponderance of evidence right there. The program will tell you where the plagiarism is from, too, if it's anything like what I used.

Re:Insightful fact... (2, Insightful)

Deagol (323173) | more than 5 years ago | (#27746669)

I think that the objection here comes from the lack of transparency of the product being used. You input a paper, and you get a percentage answer. You're not given a list of papers/sources that registered a match (it would seem, anyway -- I don't know), thus you cannot verify the claims of the machine. Of course, being proprietary systems, I highly doubt that the vendor will allow inspection of the methods of detection or the database.

The point is, that 35% means *nothing* useful without the exact context it was generated in.

As we've seen with black-box voting machines, black-box web filters, and black-box breathalyzers, I suspect we'll see many lawsuits about black-box plagiarism detectors. After all, such a program can adversely affect one's long-term future, so the system better damned well be transparent and close to infallible (at least as much as the human-based method of detection).

Re:Insightful fact... (1)

Erwos (553607) | more than 5 years ago | (#27746733)

The app I used not only told you what the plagiarized source was, but also gave you the passage that was plagiarized from. So your objection is irrelevant. In fact, I specifically addressed it in the post you're replying to.

These detectors are not black boxes at all.

Re:Insightful fact... (2, Informative)

bob.appleyard (1030756) | more than 5 years ago | (#27746799)

When I was at university, one of the lecturers showed us the plagiarism detection tool. Sure, it gave you a percentage, but it also gave you some output showing the passages in the text vs. what the program thought those passages had been taken from. He showed that most of the things that the tool had detected there were inconsequential, on the paper he was using for the demonstration.

Re:Insightful fact... (1)

Fulcrum of Evil (560260) | more than 5 years ago | (#27746679)

The program needs to justify its accusation - if 35% of the paper is plagiarized, it should be able to provide some lengthy passages from some other site that match the paper.

Re:Insightful fact... (1)

Lord Pillage (815466) | more than 5 years ago | (#27746775)

The problem is if it says that 35% was copied from another location, as long as the work is cited it technically isn't plagiarism since the student wouldn't be passing the work off as their own. They should probably receive a poor grade for lack of actual work, but as long as they don't claim it as their own, the text is not plagiarised.

Re:Insightful fact... (1)

Rob the Bold (788862) | more than 5 years ago | (#27746857)

I don't really get what you're saying. If the program is showing 35%+ of the paper as plagiarized, that's pretty much a preponderance of evidence right there. The program will tell you where the plagiarism is from, too, if it's anything like what I used.

You raise a good point. If the computer says it's plagiarism, then it is. Assuming that plagiarism is defined as, "what the program catches".

Re:Insightful fact... (4, Insightful)

bcrowell (177657) | more than 5 years ago | (#27746837)

In my opinion, plagiarism is indeed a heinous crime in an academic setting because it goes against everything the pursuit of academics is supposed to be about. Given that, the punishment should be severe. [...] the student should be expelled after the first offense

I teach physics at a community college, and although I don't assign the kind of term papers you'd see in an English course, I do grade homework, lab writeups, and exams, and plagiarism is an issue that comes up. My school's policy is that the only punishment the professor can give for cheating is to assign a zero on that particular assignment. This is, in my opinion, almost no punishment at all; typically the reason people cheat is because they know they're going to fail, so assigning an F isn't a punishment, it's more like assigning the grade that the student actually earned. The school's administration tells us that this policy is the way it is because of a recent legal decision in California. Before this rule was imposed on us, my policy had been to give the student an F in the course if it was a serious case of cheating. In any case, my school, like most community colleges, has an extremely late drop deadline (the 14th week of the semester), so, e.g., if I give a student an F on an exam for cheating on the exam, the student will typically just drop the course, resulting in no penalty on his transcript other than a W, which will not affect his GPA.

My school does provide a process where the professor can file a form to report academic misconduct. The form is then supposed to be followed up on by the dean, filed somewhere, and referred to later if the student shows a repeating pattern of cheating. Theoretically the student can be expelled, but never on the first offense. My experience is that this process doesn't actually seem to work, because the administrators involved aren't interested in spending the time and meeting with angry students. The threat hanging over the heads of the profs and deans is always that the parents will sue. Avoiding lawsuits is always the administration's top priority, far higher than education.

The long and the short of it is that when a student makes a calculated decision to risk cheating, he's usually doing it based on a realistic assessment that the consequences of getting caught are extremely mild.

However, since the punishment for plagiarism should be severe, there should be great care to investigate it properly. If you can show a preponderance of evidence that not only is a paper plagiarized, but you can accurately identify the source(s) from which each plagiarized section of it was copied, then the student should be expelled after the first offense. If you can't come up with that evidence, though, you should not be punishing the student.

There is absolutely no way, at least at my school, that a student would ever be expelled for plagiarism. To get expelled, you would have to physically attack someone. You seem to be imagining a situation in which the professor and/or the school punishes the student just because a particular piece of software flashes a message on the screen saying "plagiarized." I can't believe that anyone would ever do that. Of course you're going to look at the text that matched, and see whether you really believe that it looks like it was plagiarized.

I thought professors had legions of grad students to ferret this sort of thing out, why do they need these programs?

No, most professors do not have grad students to do this. I work at a community college. No grad students. My wife teaches at Cal State LA. They have grad students, but the grad students don't work as TAs or graders; the professors have to grade 100% of the written work.

Trusting a decision that could permanently impact a student's entire life to a computer program seems careless and dangerous.

I don't think anyone does trust such a decision to a program. They use the program as a first step.

Another Insightful fact... (1)

Serenissima (1210562) | more than 5 years ago | (#27746295)

Award

Yahoo! Research will award a cash prize of 500 Euros to the winner of the competition.

Wow, 500 Euros for solving a problem that every single college in the world would pay good money to have? Sounds like a gyp for the guy who wins.

"Yeah, thanks for spending time and effort to solve this complex problem. Here's your 500 Euros. Now we're going to go sell the pants off of this and make millions. Have a nice day!"

Re:Another Insightful fact... (0)

Anonymous Coward | more than 5 years ago | (#27746349)

But that would be plagiarism! D:

Just sell a copy to someone else (1)

wsanders (114993) | more than 5 years ago | (#27746443)

This is a contest to find an expert on plagiarism. If you're a so-called expert and win, sell your software to somebody else and make another 500 Euro.

Re:Just sell a copy to someone else (0)

Anonymous Coward | more than 5 years ago | (#27746591)

This is a contest to find an expert on plagiarism. If you're a so-called expert and win, sell your software to somebody else and make another 500 Euro.

I'm plagiarizing your business plan!

Re:Just sell a copy to someone else (0, Redundant)

Rob the Bold (788862) | more than 5 years ago | (#27746963)

This is a contest to find an expert on plagiarism. If you're a so-called expert and win, sell your software to somebody else and make another 500 Euro.

Euro 500 Euro 0 (0)

Anonymous Coward | more than 5 years ago | (#27746765)

If this was a contest for open source software, the winner would get zero Euros.

Here is my personal take on the article... (4, Funny)

svendsen (1029716) | more than 5 years ago | (#27746131)

Does your school/university check your homeworks/theses for plagiarism? Nowadays, probably Yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we conduct a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism which contains plagiarism with varying degrees of obfuscation, and translation plagiarism from Spanish or German source documents. A random plagiarist was employed who attempts to obfuscate his plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation

Re:Here is my personal take on the article... (-1, Redundant)

svendsen (1029716) | more than 5 years ago | (#27746263)

Dear mods get a sense of humor. The article is about plagiarism and here I am copying the summary of the article from someone else. See? Funny!

Re:Here is my personal take on the article... (1)

rainmayun (842754) | more than 5 years ago | (#27746305)

You forgot to translate it into Swedish.... or ROT-13.

Re:Here is my personal take on the article... (2, Funny)

MadKeithV (102058) | more than 5 years ago | (#27746465)

It's encrypted with double ROT-13.
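For anyone who missed the joke: ROT-13 shifts each letter 13 places, so applying it twice returns the original text unchanged. A minimal sketch using Python's standard `codecs` module (the helper name `rot13` and the sample sentence are just for illustration):

```python
import codecs

def rot13(text):
    """Rotate each ASCII letter by 13 places; other characters are untouched."""
    return codecs.encode(text, "rot_13")

message = "Little is known about plagiarism detection accuracy"
once = rot13(message)   # unreadable at a glance
twice = rot13(once)     # "double ROT-13"

print(once)
print(twice == message)
```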

Re:Here is my personal take on the article... (0)

Anonymous Coward | more than 5 years ago | (#27746569)

The original mods were correct.

Your comment was the same joke as the one made above you, therefore it is a redundant joke.

Re:Here is my personal take on the article... (1)

snarfies (115214) | more than 5 years ago | (#27746689)

Does your college check your theses/homeworks for plagiarism? Nowadays, probably so, but are they doing it correctly? Not much is known about the accuracy of plagiarism detection, which is why we conduct a competition on plagiarism detection, which was sponsored by Yahoo! We have set up a body of fake plagiarism which consists of plagiarism with varying degrees of translation plagiarism from Spanish or German source documents and obfuscation. A randomly selected plagiarist was used who tries to cover her plagiarism with random sequences of text changes, e.g., deleting, inserting, replacing, or shuffling some words. Translated plagiarism is made using computerized translation.

Re:Here is my personal take on the article... (1)

El_Muerte_TDS (592157) | more than 5 years ago | (#27746745)

Does your school verify his thesis / homeworks of plagiarism? Today, probably, but they are done properly? Not much is known about the accuracy of detection of plagiarism, which is why we carry out a competition in the detection of plagiarism, which was sponsored by Yahoo! We created a body that is false plagiarism plagiarism plagiarism with varying degrees of translation from Spanish or German source documents and obfuscation. A randomly selected plagiarist who was trying to cover his theft with random sequences of changes of text, for example, delete, insert, replace, or shuffling a few words. Plagiarism is translated by computer translation.

Re:Here is my personal take on the article... (0)

Anonymous Coward | more than 5 years ago | (#27746751)

What is your school / university to verify your homework / essay plagiarism? Now, perhaps, but their rights? Little is known about the accuracy of detection of plagiarism, which is why we are testing a plagiarism contest, sponsored by Yahoo! We implemented a series of plagiarism plagiarism containing more or less artificial wind, plagiarism and translations from the Spanish or German origin. Arab employees who are trying to cover plagiarism plagiarized his business with a random sequence of text, such as bats, delete, insert or change a word. Creating pagsasalin plagiarism using automatic translation.

My solution is the best. (0, Redundant)

Roskolnikov (68772) | more than 5 years ago | (#27746135)

Not only will my solution find those rascally cheaters in record time, it will also determine that all others in the competition have copied my work.

Mod erudition test (1)

pjt33 (739471) | more than 5 years ago | (#27746239)

Accuse me of plagiarism and I'll publish a book about that little matter involving you and a pawnbroker.

My solution is better (0)

Anonymous Coward | more than 5 years ago | (#27746341)

Just go to the University of Delaware. The penalty for plagiarism is the vice-presidency.

Defeat Plagiarism (1)

zoomshorts (137587) | more than 5 years ago | (#27746153)

Simply disallow the use of words.

Vice President Biden, are you listening???

Re:Defeat Plagiarism (5, Funny)

gnick (1211984) | more than 5 years ago | (#27746283)

Simply using words would not constitute plagiarism. You just can't allow students to use words that somebody else has used before.

For more information on this technique, please read my recent paper, Clickous Verandim Redundo Berata Quizzomandus.

Plagiarism.. (1, Funny)

Anonymous Coward | more than 5 years ago | (#27746159)

I think the hardest plagiarism to spot is one where you copy the main idea but you put everything into your own sentence. The main reason is that semantics is still an open problem in AI.

Re:Plagiarism.. (1)

wstrucke (876891) | more than 5 years ago | (#27746309)

There are very few, if any, original ideas in the scheme of things. AFAIK the point of writing most "papers" in high school/college is to describe an idea or support a thesis -- either way, you are probably not the first person (or the only person) who thinks X.

If you are not plagiarizing then you should be the only one who supports or describes the idea the way you do.

Re:Plagiarism.. (1)

Brewmeister_Z (1246424) | more than 5 years ago | (#27746851)

I have the same thought about writing papers... sorry, I must be plagiarizing you.

Many of the papers I had to write required citing someone credible to back up your position. The school I went to used APA format so the plagiarism check basically looked at that referenced source and showed where you cited it so it was more of an APA format check. I hate the idea of paraphrasing and then still giving credit to someone else so I would quote the source most of the time.

So what do you do when you actually have an original idea but still need to include cited material? Most likely you end up using sources out of context to support your idea and then give someone else credit.

Then there is the whole issue of whether something is common knowledge...

Re:Plagiarism.. (0)

Anonymous Coward | more than 5 years ago | (#27746879)

citation needed^^

Stating made up facts, supplemented with weak anecdotes, makes for a mildly amusing read at best.

Re:Plagiarism.. (1)

furby076 (1461805) | more than 5 years ago | (#27746547)

Taking someone's own work and then rephrasing it into your own work is plagiarism? I remember teachers telling me that ingesting someone's data and spitting out the data in your own words is not plagiarism. That's how research is done - you read about stuff and you spit it out. Sometimes you cite specific passages, other times you cite sources you referenced.

Re:Plagiarism.. (1)

Erwos (553607) | more than 5 years ago | (#27746665)

Depends on the degree it's being done. Search the Internets for "plagiarism paraphrasing" - it should be enlightening.

Re:Plagiarism.. (1)

samcan (1349105) | more than 5 years ago | (#27746981)

Technically, yes.

Whenever you use an author's thought on the idea, you need to cite.

There are exceptions for common knowledge...for example, you don't need to cite that the sky is blue.

Easy solution (1)

gnick (1211984) | more than 5 years ago | (#27746161)

Oesday ouryay oolschay/universitysay eckchay ouryay omeworkshay/esesthay orfay agiarismplay?

As long as your prof accepts foreign language papers, you're golden. Or, find a paper that you want to rip off written in German/French/Spanish/whatever and dump it through babelfish:

Your school/university controls your homeworks/teses plagiat?

Hmmm.... (1, Insightful)

Anonymous Coward | more than 5 years ago | (#27746177)

Given that many, many teachers give out broadly similar assignments all over the country, how many years will it be until most possible ways of talking about, say, what Dante meant in a certain canto of the Inferno are in the database, making it impossible to write a paper without being suspected of plagiarizing? Especially if the system runs with a very low threshold (say, 3-4 words in a row that are the same = plagiarizing).

It would really be interesting if all the published books on one particular subject (again, say, the Divine Comedy) were submitted to this service and a check was run about just how much 'plagiarizing' and 'original thinking' there is going around...
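The "3-4 words in a row" threshold above can be made concrete with a word n-gram overlap check. The Python below is only a sketch under assumptions: the names `ngrams` and `flagged_fraction` are hypothetical, and commercial detectors do not publish their matching rules, so this is not a description of how any real service works.

```python
import re

def ngrams(text, n):
    """Return the set of n-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def flagged_fraction(submission, database_docs, n=4):
    """Fraction of the submission's n-grams that also occur in any database document."""
    sub = ngrams(submission, n)
    if not sub:
        return 0.0
    known = set()
    for doc in database_docs:
        known |= ngrams(doc, n)
    return len(sub & known) / len(sub)

paper = "the quick brown fox jumps over the lazy dog"
database = ["a quick brown fox jumps high"]
print(flagged_fraction(paper, database, n=4))
print(flagged_fraction(paper, database, n=3))
```

In this toy example the shorter window (n=3) flags a larger share of the paper than n=4, which is the worry raised above: the lower the threshold and the larger the database, the more ordinary phrasing gets counted as a match.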

Re:Hmmm.... (1)

wstrucke (876891) | more than 5 years ago | (#27746501)

True -- I had the same thought myself in fact. Eventually the database will reach a threshold where the majority of ways of describing something will be "used" already.

I do not agree, however, that teachers will blindly accept the results of the computer. Some may, and I hope that parents and administrators raise hell when they do. Most teachers and users of the system should be intelligent enough to know that student A in their class did not copy three lines verbatim from a paper written ten years before on the other side of the country.

I do not necessarily agree with the use of this system and the potential for abuse, but I do believe the idea is sound if applied appropriately.

Re:Hmmm.... (1)

leonardluen (211265) | more than 5 years ago | (#27746613)

the distinction of being on "the other side of the country" doesn't mean a whole lot with the internet.

Re:Hmmm.... (1)

wstrucke (876891) | more than 5 years ago | (#27746919)

the distinction of being on "the other side of the country" doesn't mean a whole lot with the internet.

My point is that the results of any computer evaluation must be looked at critically and intelligently. It would be an abuse of the system to blindly accept results and punish students.

It's a natural part of the system that some material or ideas will appear to be plagiarized -- it's up to the instructor based on the content, structure, and syntax of the paper to determine if it was indeed copied or just an unfortunate choice of words by the student.

Re:Hmmm.... (1)

shawn(at)fsu (447153) | more than 5 years ago | (#27746645)

So basically everything that can be written has been written?

It depends, like you said, on the threshold, but there are other forces at work, especially when you are catching the most stupid of plagiarism. For example, in my courts class in college we had to do case briefs of famous Supreme Court rulings. The kid who sat next to me asked for an extension for one and was given a few more days to work on it. For him, working on it meant copying whole sections verbatim from one of our textbooks. Yeah, he got caught, and no, it wasn't hard to prove. Some students might be more crafty than others, but for the most part I think you could catch the large majority of them. If there is doubt, the professor could always ask them to explain something, for example: "What did you mean by this part?", or compare the results to essays on tests, where you don't have an opportunity to get a paper off the internet.

/yes I know He didn't really say everything that can be invented has been invented.

Re:Hmmm.... (1)

adonoman (624929) | more than 5 years ago | (#27746875)

Just wait until I publish my paper: An Enumeration of All Possible English Phrase Permutations of Length 5-10 Words. It's quite an epic read.

a aa aaa aal aaas
a aa aaa aal aachen
a aa aaa aal aafp
a aa aaa aal aah
a aa aaa aal aahed
a aa aaa aal aahing
a aa aaa aal aahs ...

zym zymase zymases zymo zymogen zymogens zymogram zymograms zymosan zymosans

* Note that the above sample is copyrighted and any attempts at plagiarism will be dealt with.
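For a sense of how epic that read would be, the number of phrases can be counted directly. This back-of-the-envelope Python sketch assumes words may repeat within a phrase and uses a hypothetical vocabulary of 100,000 words; both are assumptions for illustration, not figures from the comment above.

```python
def phrase_count(vocabulary_size, min_len=5, max_len=10):
    """Number of word sequences with lengths from min_len to max_len, repeats allowed."""
    return sum(vocabulary_size ** k for k in range(min_len, max_len + 1))

# Hypothetical vocabulary size, chosen only to show the scale.
total = phrase_count(100_000)
print(f"{total:.3e} phrases")
```

Under those assumptions the total is dominated by the ten-word phrases, on the order of 10^50 entries.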

Plausible test? (4, Insightful)

fuzzyfuzzyfungus (1223518) | more than 5 years ago | (#27746193)

Now, I understand that plagiarism is common among the weakest of undergrad writers; but "machine translation from Spanish or German source documents" and "random text operations" seem like unrealistic experimental stimuli.

In order to be a success, a plagiarized paper has to survive scrutiny by automated systems, if any are deployed, and human graders, if any are paying attention. Machine translation and text mangling should trivially defeat automated systems, at least any that aren't cranked well into World o' false positives territory; but would they pass human scrutiny? Even if they did, handing in something produced by machine translation and text mangling would probably earn you a referral to "Remedial English 101 For Life".

monkeys on a typewriter (0)

Anonymous Coward | more than 5 years ago | (#27746233)

If you submit enough essays to a plagiarism database, wouldn't you eventually run into every submitted paper turning up as plagiarized? That's what I never understood about English departments buying into this. Let it go for a few decades, and if they decide to copyright their databases and make them publicly available, that could be a funny business model, though.

It's not that hard... (1)

XPeter (1429763) | more than 5 years ago | (#27746251)

Copy and paste some of the text of the suspected document into Google. If something with the same or similar wording comes up, it's plagiarized. Simple.

Re:It's not that hard... (0, Redundant)

Anonymous Coward | more than 5 years ago | (#27746323)

Copy and paste some of the text of the suspected document into Google. If something with the same or similar wording comes up, it's plagiarized. Simple!

Re:It's not that hard... (0, Redundant)

maxwell demon (590494) | more than 5 years ago | (#27746401)

Copy and paste some of the document's text into a search machine. If something with about the same wording comes up, it's plagiarized. Simple!

Re:It's not that hard... (1)

wstrucke (876891) | more than 5 years ago | (#27746527)

...assuming that every document ever written was online and indexed by Google. Kinda seems like shooting in the dark. Maybe you hit something, maybe you don't -- missing doesn't mean there's nothing there, and it only provides a false sense of security.

Irony (4, Funny)

Shadow Wrought (586631) | more than 5 years ago | (#27746303)

Just imagine everyone's surprise when all the entrants turn in the exact same process.

Detecting Design (1)

geoffrobinson (109879) | more than 5 years ago | (#27746317)

I thought detecting design wasn't science. I guess that only applies if we don't like the implications of a possible "yes." Otherwise, it can be science.

Re:Detecting Design (1)

DragonWriter (970822) | more than 5 years ago | (#27746567)

I thought detecting design wasn't science. I guess that only applies if we don't like the implications of a possible "yes." Otherwise, it can be science.

Like answering any other question of fact, answering the question "is this outcome the result of design?" can be science, if it is done using the scientific method.

Deliberately ignoring empirical evidence and making "ooh, that seems hard, it must be design" arguments, as is done in the most popular "detecting design" effort that is dismissed as not being science is, indeed, not science.

Make plagiarism harder than writing original work. (1)

gyroidben (1223170) | more than 5 years ago | (#27746321)

It's always going to be possible to plagiarise, but as long as it's more difficult than actually writing original work it's not so much of a problem. Translating from a foreign language (even with the help of an automatic translator) is probably more work than just writing the work yourself. Swapping a whole bunch of words probably also requires comparable effort if you don't want it to sound too silly.

Avoiding plagiarism? (1)

tepples (727027) | more than 5 years ago | (#27746343)

When George Harrison wrote the song "My Sweet Lord" for his solo debut album, he accidentally plagiarized a Ronald Mack song. He ended up losing a million-dollar lawsuit over it [wikipedia.org]. What should he have done to avoid plagiarizing any of the millions of songs that had been written before then?

Re:Avoiding plagiarism? (1)

Stratocastr (1234756) | more than 5 years ago | (#27746373)

avoid the G-C-D chord progression.

Re:Avoiding plagiarism? (1)

gnick (1211984) | more than 5 years ago | (#27746583)

Just a side note - George Harrison/Ronald Mack is a much better example of musical plagiarism than what sprang to my mind.

Damn you, xkcd [xkcd.com]. You've ruined me.

Duplicated code (1)

tcopeland (32225) | more than 5 years ago | (#27746345)

A while back I worked on a program to find duplicated code - CPD (copy/paste detector) [sourceforge.net]. It discards comments and whitespace and (optionally) normalizes variable names... but probably wouldn't deal well with tokens being moved around. There's a chapter on it in my PMD book [pmdapplied.com], too.

What was interesting were some of the performance optimizations that folks came up with. My first version used JavaSpaces to distribute the computation - but subsequent versions (thanks to Brian Ewins and Steve Hawkins) were fast enough to run on just one machine. Good times.
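For readers who want to see the general idea, here is a minimal sketch of comment stripping, identifier normalization, and window matching. It is not CPD's actual implementation: the keyword set, the tokenizer regex, the window size, and the function names `normalize` and `shared_windows` are all assumptions made for illustration.

```python
import re

# Hypothetical keyword set for a Java-like language; not CPD's actual list.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "class"}

# Words, numbers, or any single non-space character (so "++" becomes "+", "+").
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source):
    """Drop comments and whitespace, and replace identifiers with 'ID'."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)               # line comments
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tokens.append("ID")  # every non-keyword name looks the same
        else:
            tokens.append(tok)
    return tokens


def shared_windows(a, b, size=6):
    """Return the token windows of the given size that occur in both inputs."""
    def windows(tokens):
        return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}
    return windows(normalize(a)) & windows(normalize(b))
```

Because the comparison works on ordered token windows, renaming variables or reformatting does not hide a copy, but reordering statements breaks the windows that span the moved code, which matches the limitation mentioned above.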

Plagiarism detection is easy (4, Insightful)

DingerX (847589) | more than 5 years ago | (#27746363)

A plagiarised paper just smells bad, and is characterized by shifts in voice and writing style and sudden ignorance of the critical points raised earlier. The same author who can't write a grammatically correct sentence one moment is throwing down complex constructions the next. The harder part is identifying the source of the plagiarism. For undergraduate papers, even the harder part is trivial. After all, the point of plagiarism is that the author is too lazy to write anything original.

For academics (professors), the situation isn't all that different. Plagiarism is usually a mix of stupidity, laziness and pressure to get stuff done. It usually happens where big, popularizing authors try to rip off the obscure ones (go back twenty years a la Mr. Ambrose, or pick something in a different language, preferably Italian), or when someone needs a book in an obscure field, and tries to pirate something really obscure.

Even so, if a plagiarist has enemies who give a damn, they can find the source fairly fast. So why construct a test for the most obfuscated cases, when a plagiarist clever enough to obfuscate could simply write something original and sufficiently clever?

Re:Plagiarism detection is easy (1)

Erwos (553607) | more than 5 years ago | (#27746417)

Very true. My wife reviews proposals at her work from time to time, and she has gotten surprisingly good at detecting which ones are doing wholesale plagiarizing. I suspect she'd probably miss it if it was a sentence or two, but some of these idiots are doing whole pages of it.

Re:Plagiarism detection is easy (1)

fuzzyfuzzyfungus (1223518) | more than 5 years ago | (#27746455)

I get the distinct impression that most of the interest in automated plagiarism detection has little to do with circumstances where writers are writing to actually be read, and more to do with ensuring Compliance among high school students and undergrads in big lecture courses.

As you say, if somebody actually reads it, it won't be too hard to detect. If the somebody reading has an ongoing familiarity with the writer's style, it'll be even easier. What they want, though, is something that can skip that step and just make sure that a whole bunch of students obeyed instructions at the lowest possible cost.

Re:Plagiarism detection is easy (1)

zolltron (863074) | more than 5 years ago | (#27746491)

It's not as easy as you think. People can often disguise the apparent plagiarism by changing words around, substituting synonyms, or adding in extra words. Then you have to search for different parts of the sentence, considering different wordings. Sometimes people purchase papers from their friends or from a service that never posted the paper online.

When you have a class of 100 students with 2-3 potential plagiarism cases, it can take significant time to track them down. All this takes away from time the instructor could spend giving helpful feedback to students who are interested in learning.

The services that exist now are already very good at saving time by focusing one's attention on particular cases that can be proven. If those tools get better, it can reduce that time even more. Overall this will significantly improve the quality of education, both by freeing up time and by removing the incentive to cheat yourself out of an education by plagiarizing.

Re:Plagiarism detection is easy (1)

Rageon (522706) | more than 5 years ago | (#27746655)

The harder part is identifying the source of the plagiarism. For undergraduate papers, even the harder part is trivial.

Wouldn't simply requiring authors to cite their sources solve this problem? Yes, it's a pain to cite -- but any form of serious writing should, and usually does, require it. I'm not talking about strictly following BlueBook legal citation rules, but something more than a "list of authorities used".

Re:Plagiarism detection is easy (1)

Thaelon (250687) | more than 5 years ago | (#27746921)

Don't blame laziness!

Progress is made by lazy men looking for easier ways to do things.

patent office (0)

Anonymous Coward | more than 5 years ago | (#27746371)

The patent office should use something like this! Even a simple algorithm should be able to weed out many invalid applications.

What about coincidence and quotation? (1)

Andy_R (114137) | more than 5 years ago | (#27746385)

This isn't a particularly good test of plagiarism detection at all, since the data corpus is computer generated. Real-world plagiarism detection needs to take account of subject matter (correct answers to a physics paper will be less diverse than ones on wide ranging literary topics) and allowable duplication, such as quotations, restatement of the question, citations of sources, etc.

We could ... (4, Funny)

PPH (736903) | more than 5 years ago | (#27746389)

... use the same system the US Patent Office uses for finding prior art.

On second thought, scratch that idea.

detecting it is easy (2, Funny)

Anonymous Coward | more than 5 years ago | (#27746397)

Calculate an md5 hash of the paper, if it matches the md5 of another, it's plagiarized.

Re:detecting it is easy (4, Funny)

fuzzyfuzzyfungus (1223518) | more than 5 years ago | (#27746499)

And that is why I always change the font and margins on papers that I plagiarize...
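The exchange above can be made concrete with a short sketch. It uses a whitespace change as a stand-in for the font and margin changes (which alter the file bytes in the same way), and `normalized_digest` is a hypothetical helper, not part of any real detection service.

```python
import hashlib


def digest(text):
    """MD5 hex digest of the exact text, byte for byte."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def normalized_digest(text):
    """MD5 of the text after lowercasing and collapsing whitespace (assumed canonical form)."""
    canonical = " ".join(text.lower().split())
    return digest(canonical)


original = "Plagiarism detection is easy."
reformatted = "Plagiarism  detection\nis easy."   # same words, different layout
reworded = "Detecting plagiarism is easy."        # trivially paraphrased

print(digest(original) == digest(reformatted))                        # layout change breaks the match
print(normalized_digest(original) == normalized_digest(reformatted))  # normalizing restores it
print(normalized_digest(original) == normalized_digest(reworded))     # rewording still escapes
```

Any exact-hash scheme can only flag verbatim copies of whatever canonical form it hashes, so even a one-word change defeats it.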

i don't get why you would do this (2, Interesting)

llamapater (1542875) | more than 5 years ago | (#27746481)

It's a monkeys-on-a-typewriter thing. These companies add papers to their database as they compare them. If you feed enough papers into a database, eventually they will all come back plagiarized. There are not an infinite number of possible term papers; there are only so many things that could be written for a topic that make sense, and most English teachers recycle topics. Why English departments buy into this I don't understand. Let it go for long enough (it would only take another decade or two at most) and you will start getting people who didn't even know they were plagiarizing getting kicked out of college. I'm not talking about improper citations; I'm talking about a guy in Washington having the same idea as a guy in New York 20 years later. I'm not a lawyer, so I don't know if this is possible, but couldn't they copyright these databases in some form or render them proprietary? If they did that, their business model could change to just collecting royalties.

Re:i don't get why you would do this (1)

smith6174 (986645) | more than 5 years ago | (#27746599)

I think you need to have a serious look at the probabilities involved with your monkeys on a typewriter idea. It shows a complete ignorance of modern statistics and datamining to even suggest that "someday" it will be hard to write original work because it will be compared with many others.

Cheating detection is easy (1)

smith6174 (986645) | more than 5 years ago | (#27746507)

Cheating is pretty easy to detect. I have written cheating detection programs and used them successfully. It is actually surprising how well any sort of longest common subsequence comparison will do in spite of any changes students make. It is always up to a human instructor to verify anything if an accusation is to be made. That being said, cheaters usually produce crappy work anyway. I would have to say that at least in computer science courses you need to be quite talented to get past any of the methods I have designed. Usually more talent is required than simply doing the assignment well. I think cheating is just something humans need to give up on. Like chess, checkers, and properly enforced financial fraud, computers have us beat.
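A minimal sketch of a longest common subsequence comparison, assuming word-level tokens and a similarity score normalized by the shorter text. The function names and the scoring rule are illustrative assumptions, not the poster's actual programs.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)           # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one element
        prev = curr
    return prev[-1]


def similarity(text_a, text_b):
    """Fraction of the shorter text's words that appear, in order, in the other."""
    a, b = text_a.lower().split(), text_b.lower().split()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


print(similarity("the quick brown fox jumps",
                 "the quick red fox really jumps"))
```

Inserted or substituted words only lower the score slightly, because the surviving words still line up in order. A high score is a reason for a human to look, not proof of copying.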

Re:Cheating detection is easy (1)

maxume (22995) | more than 5 years ago | (#27746779)

I suppose. I got a warning from a grader for working in the same computer lab as some other people in the same (small) class (collaboration was tolerated for the class, but it needed to be disclosed). The thing was, we weren't comparing work or discussing anything. So there needs to be some sort of balance.

Re:Cheating detection is easy (1)

smith6174 (986645) | more than 5 years ago | (#27746925)

Your experience is quite common. The last thing an instructor should do when an automated system finds similarity is to accuse a student of misconduct. It is far more informative to have someone do similar work in a more controlled environment. The best way I've found to weed out wrongdoers is to give a far worse grade for one of the similar submissions. The student with the bad grade (if they cheated) almost always slips up and complains that the person they copied from got a better score.

Detecting it? (1)

owlnation (858981) | more than 5 years ago | (#27746531)

Hard to detect in an academic paper, but easy to find on the web. Go to almost any Wikipedia article and you'll find it right there in front of you. Especially any article on a movie -- almost all are ripped directly from imdb.

double translated (0)

Anonymous Coward | more than 5 years ago | (#27746551)

"Does your school / university check your homework / theses for plagiarism? Today, probably yes, but they do it right? If little is known about plagiarism detection accuracy, which is why we have a competition on plagiarism detection, sponsored Yahoo! We have an artificial body of plagiarism plagiarism, with varying degrees of concealment and plagiarism translation from Spanish or German source documents. plagiarist was random, trying to disguise his plagiarism with random sequences of text operations such as mixing, deleting, inserting or replacing a word. Translated plagiarism is with machine translation. "

Too many false positives (3, Interesting)

russotto (537200) | more than 5 years ago | (#27746557)

I once was on a Fido forum with someone who would often write responses nearly word-for-word identical to mine. It was uncanny; I'd see his post and recognize my own writing, only to realize it wasn't mine. Timestamps would sometimes show my post was written first, sometimes his. I imagine some others on the forum thought at least one of us was a sock puppet, but neither of us was.

(If he's on slashdot, he's probably composing a post just like this one)

That probably happens rarely. But build a big enough database, and it will happen often. Particularly given the restricted problem domains in undergraduate papers. It's not just a computer problem; even humans will think "plagiarism" when they see two papers with similar ideas and similar turns of phrase. Which I think demonstrates that plagiarism cannot be established satisfactorily merely by showing similarity between papers.

Mike Flores. (0)

Anonymous Coward | more than 5 years ago | (#27746581)

Just ask Mike Flores, he is the world's foremost plagiarism detective.

Require submission of drafts; meet with students (5, Interesting)

cpu_fusion (705735) | more than 5 years ago | (#27746619)

Plagiarism is a symptom of professors only being involved in the last step: reviewing the final product.

Require the students to submit multiple drafts. Meet with them for 15 minutes each and discuss their thought processes on the ongoing paper. You'll get better final products, teach people not to procrastinate, and smoke-out people who have no involvement in their "own work."

What, can't do that because you have 60 students in a class? Well, there's part of the problem too.

We're trying to find a technology solution to a problem with less student-teacher interaction. Typical!

Re:Require submission of drafts; meet with student (2, Insightful)

Colonel Korn (1258968) | more than 5 years ago | (#27746791)

Plagiarism is a symptom of professors only being involved in the last step: reviewing the final product.

Require the students to submit multiple drafts. Meet with them for 15 minutes each and discuss their thought processes on the ongoing paper. You'll get better final products, teach people not to procrastinate, and smoke-out people who have no involvement in their "own work."

What, can't do that because you have 60 students in a class? Well, there's part of the problem too.

We're trying to find a technology solution to a problem with less student-teacher interaction. Typical!

I never taught a class involving humanities paper writing (in the science classes I taught, I could detect borrowed work by asking our kids to explain the calculations in their presentations and reports), but my wife meets with students at least once after they turn in a required outline and bibliography to her. The bibliography, meeting, and my wife's extensive knowledge of scholarship in her field have made plagiarism rare and very obvious. Also, they make the students write vastly better papers and learn a lot more. Even having students meet with a TA to discuss paper ideas and progress is a huge help, and required outlines, drafts, and (especially) bibliographies should be part of the writing process in every lower level undergrad class. In upper level classes, the meeting is sufficient.

Re:Require submission of drafts; meet with student (0)

Anonymous Coward | more than 5 years ago | (#27746977)

Or do it the old-fashioned way. I did a three year degree at Oxford. Lectures were strictly optional. Tutorials were initially compulsory, but if you could convince your tutor you didn't need them they could be dropped.

But none of this matters, because the degree I received at the end was based on 24 hours of exams. Course work, plagiarized or not, was irrelevant.

The fingerprint model. (2, Insightful)

MarkvW (1037596) | more than 5 years ago | (#27746649)

Law enforcement uses automated fingerprint detection to identify possible matches. It never claims a match based on the computer.

Using a program as the sole plagiarism judge and jury is profoundly unfair. If a university wants to discipline a student for a plagiarism hit, then it needs to obtain the source document--and pay the source document's creator if necessary to obtain it.

Confronting the student with the alleged source gives the student a fair chance to defend himself/herself.

The humanities are in trouble. (5, Interesting)

Areyoukiddingme (1289470) | more than 5 years ago | (#27746667)

Seriously, the humanities are in trouble. With over 6 billion people on the planet, it's extremely difficult to have an original thought. This sets the stage for endless repetition. Add to that the fact that the very process of teaching the humanities usually means imparting a teacher's single interpretation of the source material to the students who then do the natural thing when it comes to writing a paper and parrot back to the teacher what they've heard, knowing that's the only way to get a good grade, and the resulting combination is deadly.

The papers are all going to be similar from the beginning, because it's a rare instructor who actually encourages dissenting opinions (and that fault in teaching is a whole other discussion of its own). Then the papers are going to be similar because there really are only so many ways to interpret the source material that are defensible. And finally, the papers are heavily likely to be similar to at least one other paper written about the subject, when every paper ever written on the subject is considered (exactly what the plagiarism sites attempt to do).

I think the problem this competition is trying to solve is intractable in the face of the current educational system. It's gotten to the point where, if the software considers a large enough number of sources, even the instructor's own papers are going to look like plagiarism.

Hell, look at the Slashdot comment system. A million people read the front page, but only a few thousand post comments. Thousands more are content to simply moderate the comments, and face it, comments they agree with are more likely to be modded up, one way or another. Then compare the modded comments. We get a lot of duplicate or near duplicate thought, and hence near duplicate comments on every article. Why? Because when you get enough people together in one place, discussing the same subject in writing, there are only so many viewpoints and only so many comments that won't get modded down for being of the "cubic what?" variety.

Time to go back to grading on spelling and grammar. We've reached the end of the grading on ideas road. Coherency of presentation is all we have left. (One could argue it's all we ever had.)

Re:The humanities are in trouble. (2, Funny)

Areyoukiddingme (1289470) | more than 5 years ago | (#27746777)

Shit, by the time I came back to the keyboard after writing this post and not hitting submit, there were 30 other posts that said the same thing. I must be a plagiarist.... Damnit.

Uh, Use Google? (2, Interesting)

chainLynx (939076) | more than 5 years ago | (#27746671)

Here's a good article explaining how Google makes plagiarism detection easy: http://questioncopyright.org/node/4 [questioncopyright.org] There was a story a couple years ago about one of these plagiarism detection services, Turnitin, getting sued for copyright infringement... does anyone know if that went anywhere? http://education.zdnet.com/?p=953 [zdnet.com]

Consequences? (1)

ieatcookies (1490517) | more than 5 years ago | (#27746705)

What generally happens once the plagiarism is detected; are these students failed or disciplined?

Is plagiarism enough of a misdemeanour to warrant expulsion? There are many facets to the educational system, but I believe the main priority is to educate. Would a student who could prove proficiency in his studies but is incredibly lazy (I know _many_ people like this for some reason) be eligible for failure or expulsion for turning in a paper that ranked too high in plagiarism testing?

I say, Let them eat plagiarism. (1)

jimbudncl (1263912) | more than 5 years ago | (#27746789)

Seriously, if the teachers don't have the time to identify it and the students are hell bent on doing it... let it happen. Perhaps that's the only way these people will learn anything about the subject matter anyway.

And when they graduate, get a job, and completely fail... that'll be a nice wake up call. Sure, some will succeed (PHBs, anyone?)... but I doubt catching them in school would change the end result much.

Face it, some problems aren't worth the time it takes to solve them, especially when you're approaching them the wrong way from the start.

Cue: whining about how it'll make schools/universities look bad when their students fall on their faces in the real world. (I think I'm gonna cry)

This could be useful for blog unduplication (2, Insightful)

Animats (122034) | more than 5 years ago | (#27746823)

This is a useful mechanism for search engines, which need to distinguish original content from hundreds or thousands of blogs echoing it. Imagine the Web with all the duplicate, repetitive material ignored. No wonder Yahoo is supporting this. Someone over there is thinking.
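One common way to spot echoed content is to compare overlapping word n-grams ("shingles") between pages. The sketch below assumes 3-word shingles and Jaccard similarity; it is an illustration of the general technique, not a description of how Yahoo or any search engine actually does it.

```python
def shingles(text, k=3):
    """Set of all k-word windows in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Shared shingles divided by total distinct shingles across both texts."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # both texts too short to form a shingle
    return len(sa & sb) / len(sa | sb)


post = "yahoo sponsors a competition on plagiarism detection accuracy"
echo = "yahoo sponsors a competition on plagiarism detection this year"
print(jaccard(post, echo))
```

A blog that echoes a post with a changed intro or sign-off still shares most of its shingles with the original, so it scores high; at web scale the same idea is usually paired with hashing so that full pairwise comparison is not needed.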

next contest (1)

SABME (524360) | more than 5 years ago | (#27746859)

The next contest will be to see who can write an automated paper generator that fools the plagiarism detector.

Plagiarism vs. Ghostwriting (3, Interesting)

jcohen (131471) | more than 5 years ago | (#27746967)

I realize that plagiarism detection represents an interesting problem in computer science, and that it goes some distance toward solving a serious problem. However, I read an article [chronicle.com] in the Chronicle of Higher Education, behind a paywall, alas, which leads me to believe that it is only a partial solution to academic dishonesty. The article suggested that, thanks to the Internet, the costs of human capital are now so low that hiring a ghostwriter to compose one's papers, sidestepping the problem of plagiarism to begin with, is far more expedient than plagiarism itself. It described a Russian-"businessman"-headed network of Filipino paper-writers, most paid between $1 and $3 a page, who are able to market their services to the West through a web site [bestessays.com] and remote call centers. At $20/page to the end-user, with no possibility of plagiarism detection, I think that most desperate students would find this a good deal. In my opinion, ghostwriting will supplant plagiarism as time goes on.

What is a teacher to do? In-class writing samples would seem to be the only hope of detecting ghostwriting. Students could, of course, argue that at home, they can "polish" their papers, and that therefore they will not resemble the in-class samples. Moreover, checking samples against papers is a thankless and time-consuming task which is only a preliminary to actually evaluating the work. Perhaps there is a computer-based solution to this, but, in the meantime, perhaps potential ghostwriting customers could take their desires to their logical conclusion, and simply buy their degrees on the Internet directly.
