Every year, the Wikimedia Foundation's Community Tech team invites the most active Wikimedia contributors to participate in the community wishlist survey: proposing, discussing and voting on the features and improvements that they'd most like to see. When the votes are counted, Community Tech is responsible for addressing the top ten requests on the list.
The #9 wish this year was to improve the plagiarism detection bot, which was created by volunteer developer Eran, and has been running on English Wikipedia since January 2015. The bot is a clever solution to a tricky problem: identifying text which is copy-pasted from other websites. EranBot looks at every edit which adds a significant amount of text to a Wikipedia article, and compares it against a search database for potential matches. When the database finds a possible match, EranBot flags the edit for human review.
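To make the idea concrete, here is a minimal sketch of that check-and-flag flow. It is not EranBot's actual code: the real bot queries an external search database, while this sketch swaps in a simple local comparison based on shared five-word sequences. The names `Edit`, `flag_for_review`, `MIN_ADDED_CHARS` and `MIN_OVERLAP`, along with the threshold values, are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds; the real bot's values are not given in this post.
MIN_ADDED_CHARS = 500   # what counts as "a significant amount of text"
MIN_OVERLAP = 0.5       # share of the edit's 5-word sequences found in a source


@dataclass
class Edit:
    revision_id: int
    added_text: str


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(added: str, source: str, n: int = 5) -> float:
    """Fraction of the added text's n-word sequences that also occur in source."""
    added_shingles = shingles(added, n)
    if not added_shingles:
        return 0.0
    return len(added_shingles & shingles(source, n)) / len(added_shingles)


def flag_for_review(edit: Edit, sources: dict[str, str]) -> list[str]:
    """Return the URLs of candidate sources that the edit may have copied from.

    `sources` maps a URL to that page's text. It stands in for the external
    search database, which this sketch does not model.
    """
    if len(edit.added_text) < MIN_ADDED_CHARS:
        return []  # small edits are skipped entirely
    return [
        url
        for url, text in sources.items()
        if overlap(edit.added_text, text) >= MIN_OVERLAP
    ]
```

An empty result means the edit is left alone; any returned URL would put the edit in the queue for a human to review. The tool only suggests matches, and the decision stays with the patroller.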
The original interface for EranBot's reports was difficult to use. The reports were published on a wiki page using a series of complicated templates, and users had to click through to other sites to see the text comparisons. Worst of all, to start using the tool, a user had to add some code to their personal common.js page, an annoying hassle that prevented people from trying it out.
When the Community Tech team started to work on this wishlist item in April 2016, there was only one dedicated volunteer using the tool: Diannaa, a longtime admin and copy editor, who reviewed and resolved hundreds of copy-patrol cases every week. There was a growing backlog of several thousand unreviewed cases, waiting to be checked.
Working with Eran and Diannaa, the Community Tech team built CopyPatrol, a new interface that makes reviewing cases easier and faster. On CopyPatrol, a user can compare the Wikipedia article's text with the suspected source text directly on the page by clicking the Compare button, which opens a side-by-side comparison. There are links for all of the information that the patroller needs to resolve the case: the editor's name, talk page and contribution history, the suspected edit and the article's history.
When there's confirmation that the text was copy-pasted from another source without attribution, the patroller needs to revert the edit and leave a talk page message for the editor, explaining the wiki's guidelines about plagiarism and copyright violation. Once that's done, the patroller marks the case as "Page fixed".
Sometimes, the bot finds a false positive, flagging text that was properly cited in the article, or matching text from a Wikipedia mirror site. In that case, the patroller marks it as "No action needed".
The Community Tech team's goal for this project was to build an interface that would attract and retain more patrollers, so that Diannaa didn't have to work on this alone. She's still the most active patroller, but now she's backed up by a team of six regular patrollers.
As Diannaa says, this work "is something we really need to do in order to be taken seriously as a world-class website and resource. The job is actually two-fold: clearing out copyright violations, and educating people as to our copyright rules. Many people, accustomed to the ways of Facebook and LinkedIn, don't even realise that we don't accept copyright content. It's great that the bot picks up on so many copy vios and we get the opportunity to do this teaching right away, before the user has made hundreds or thousands of copyright violations."
In the four months since CopyPatrol was launched in July, more than 9,000 articles have been reviewed. Nearly 5,000 of them were found to be copyright violations and were fixed by patrollers. Thanks to all the new patrollers, there isn't a growing backlog anymore, and new cases are reviewed within twenty-four hours. The Community Tech team is now working on expanding the tool to other languages, so that volunteers can review copy-paste cases on French Wikipedia and others.
CopyPatrol is a great example of contributors, volunteer developers and Foundation staff working together to improve the quality of the Wikimedia projects. The next Community Wishlist Survey opens today. Help us choose more projects to work on in 2017!
Danny Horn, Senior Product Manager, Community Tech
Wikimedia Foundation