How to make a Wikipedian angry

Rates of confirmed plagiarism among articles written by different groups of users, including both blatant plagiarism and subtler close paraphrasing

Adding plagiarism to an article is one of the quickest ways to make a Wikipedian angry. It undermines the integrity of Wikipedia — contributors only have the right to release their own work under our free license — and it takes a lot of work to clean up. And as a community of writers, we take original authorship very seriously.

The Wikipedia Education Program helps professors run Wikipedia assignments, where students improve Wikipedia as part of their class. (Want to get involved as an instructor, or as a volunteer to help classes get started? Get in touch.) And when student editors plagiarize in their Wikipedia contributions, no one is happy.

To better understand the problem of plagiarism, across Wikipedia and among student editors in particular, my team at the Wikimedia Foundation recently ran a small research project. We identified English Wikipedia articles by student editors in the U.S. and Canada editions of the Wikipedia Education Program, as well as articles by a set of other editors who were statistically similar to the students, new editors from different years, and veteran Wikipedians. Then we worked with a company called TaskUs to put each of the articles through a commercial plagiarism checker. The first results showed shockingly high rates of plagiarism for every group. But the majority of these were cases of other sites copying Wikipedia, so we went through the results manually to confirm which ones were actually plagiarism. (It’s amazing where you’ll find the work of Wikipedia editors, across the web and even in print sources.)

We found in the end that for new articles by new users during the years 2006, 2009, and 2012, the rate of confirmed plagiarism was 10–12 percent. For new articles by student editors, it was 5 percent, while for the control group of non-student editors who had similar editing patterns, the confirmed plagiarism rate in their new articles topped 13 percent. We found higher rates of plagiarism in articles expanded by student editors: around 8.5 percent. There is no control group to compare for the expanded articles, but we would expect higher rates of plagiarism in expanded articles in general, since there are fewer barriers to expanding an article on English Wikipedia than to creating a new one. We also looked at plagiarism rates, for both new and expanded articles, among the early contributions by admins, as well as by the most prolific editors who are not admins. For both of those groups, we found rates around 3 percent, and some of that plagiarism was actually added originally by others and then built upon by the now-experienced editors.

These numbers aren’t perfect, and there’s still much we don’t know about plagiarism on Wikipedia. (On the research page, you can also check out the details of the project, see the caveats of the methodology, and download the raw data.) For this study, we’re not sure just how much plagiarism slipped through without being detected, nor whether the types of sources plagiarized by student editors were more likely to slip through. But it gives us a basic idea of the prevalence of plagiarism among new editors.

For student editors in particular, because we get the chance to provide more structured guidance and training than with the typical newcomer, we think we can do better. Based on what we found in this plagiarism research, we’ve created a new video for the student training modules that explains what plagiarism is, why it’s bad for Wikipedia, and what happens when editors get caught plagiarizing.

File:Plagiarism tutorial.ogv

The new plagiarism tutorial video

Sage Ross
Online Communications, Wikipedia Education Program


5 Comments on How to make a Wikipedian angry

Roger 9 months

Yeah, agreed: it doesn’t matter what the license is, Gego. From an academic standpoint, it’s still plagiarism to copy a text without citing where it came from, even if the text is free for public use. One also shouldn’t quote too much of a text (i.e., beyond fair use percentages), even if it is cited.

In the non-academic world, if a source wants to reproduce a text that’s free for public use, in its entirety…fine, but they can’t claim it as their own.

Kat 9 months

Gego: at least one appellate court in the US believes the use of these documents by the plagiarism checkers is a fair use, and I think this is a decision people who are concerned about copyright overreach should agree with.

I’m not sure which vendors you’re talking about who claim to reserve the rights on submitted documents; I would be interested in seeing examples, but the major one I can think of does not.

There are plenty of arguments against the way many of these services operate and are used (for example, schools’ overreliance on them, or their failure to correctly classify the Wikipedia mirrors), but I think the copyright argument against them is a weak one.

LiAnna Davis 9 months

Chris, I fixed the typo. Thanks for catching it!

Gego 9 months

Plagiarism software itself is the best example of why it’s a complex problem:

These commercial vendors offer paid services in disregard of the actual document licences of the content they compare, e.g. when a commercial product uses a GPL document.

These vendors often reserve the rights on documents submitted, which makes them incompatible with GNU or some CC licences.

So did you check this legal framework beforehand?! And why not ask plagiarism software vendors to either donate or block Wikipedia content if they base their commercial model on the free work of others?!

Plagiarism is only a problem when combined with copyright…

Chris McKenna 9 months

The sentence “These numbers aren’t perfect, and there’s still much we don’t about plagiarism on Wikipedia.” is missing a word after “don’t”, possibly “know” or “understand”.
