Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘quality’

Kids these days: the quality of new Wikipedia editors over time

The proportion of quality newcomers over time. (2006-2011)

As part of the 2011 Wikimedia Summer of Research, we uncovered a possible correlation between the decline in new active editors that began in 2007 and the rise of warnings issued to new users by bots and automated tools, which started in 2006.

For those of us studying editor trends, the following question has continued to puzzle us: did the change in communications to new users lead to the decline, or can the rise in warnings be explained by a decrease in quality contributions from new users? Perhaps, as some Wikipedians have argued, the new users of today are being reverted and warned more aggressively than those who entered the project in 2001-2006 because their edits are qualitatively worse (e.g., more self-promotional or spammy, less serious and encyclopedic) than those of previous generations of editors.

While the complexity involved in determining what constitutes a “good” contributor to Wikipedia may never allow us to definitively answer this question, our research argues against the theory that today’s newbies just plain suck.

The proportion of rejection for quality newcomers over time.

To test the hypothesis that new contributors who entered the project in recent years have been more harmful and less interested in positively contributing to the encyclopedia, we randomly sampled the first edits of newcomers to the English Wikipedia from the earliest days of the project to the present. With the help of some experienced Wikipedians, we hand-categorized the edits of 2,100 new users according to a four-point quality scale: blatant vandal (obscene language, obvious vandalism), bad faith (jokes and nonsense), good faith poor-quality edit (bad formatting, unreferenced, but trying to add value), and golden (good faith good edits that should not be reverted).

What we found was encouraging: the quality of new editors has not substantially changed since 2006. Moreover, both in the early days of Wikipedia and now, the majority of new editors are not out to obviously harm the encyclopedia (~80 percent), and many of them are leaving valuable contributions to the project in their first editing session (~40 percent). However, the rate of rejection of all good-faith new editors’ first contributions has been rising steadily, and, accordingly, retention rates have fallen. What this means is that while just as many productive contributors enter the project today as in 2006, they are entering an environment that is increasingly challenging, critical, and/or hostile to their work. These latter findings have also been confirmed through previous research.

Survival rate of newcomers over time.

This study has many important implications for community and Wikimedia Foundation efforts to engage and retain new editors. To begin, it reasserts the centrality of one fundamental policy on the project, “Assume good faith.” This research strongly supports efforts in the community and at the Foundation to do a better job of integrating new editors into Wikipedia and its sister projects, not simply for the sake of gaining new editors, but for the quality of these new editors’ contributions overall.

At the Foundation level, this includes major software changes like the creation of a visual editor to lower the technical barrier to entry, as well as more experimental pilot projects like template A/B testing, an attempt to make the template messages received by new users more personalized and clear, and the Teahouse, which gives new users a friendly, low-pressure space to seek help from experienced Wikipedians. With better software and an inviting and supportive atmosphere, the encyclopedia can continue to grow both in quality of material and quantity of dedicated contributors.

  • Find out more about this study at Research:Newcomer quality
  • This work is part of a journal article in submission to a special issue of American Behavioral Scientist on Wiki Research
  • A special thanks to R. Stuart Geiger from UC Berkeley, as well as Maryana Pinchuk, Steven Walling, and Oliver Keyes from the Wikimedia Foundation, for their assistance with this study.

Aaron Halfaker,
Wikimedia Foundation Research Analyst and University of Minnesota PhD candidate

Readers in US, Russia, Germany and India are the most pleased with Wikipedia Article Quality

In the recently conducted Wikipedia readers study, we asked respondents to rate the quality of Wikipedia articles on several aspects: trustworthiness, comprehensiveness, neutrality, variety, and ease of understanding. Although we already employ the Article Feedback Tool to assess quality at the article level, we wanted to understand readers’ perception of quality on Wikipedia as a whole.

I. Individual Measures

II. Quality Perception Index


New comparative study to re-examine the quality and accuracy of Wikipedia

Much of Wikipedians’ efforts is devoted to ensuring the quality of the encyclopedia they are producing collaboratively – the community is constantly working to improve it. The effectiveness of this work has been recognized many times, perhaps most notably in a study published in 2005 by the scientific journal Nature which compared entries in the English Wikipedia with those in the online edition of Encyclopaedia Britannica. Nature reported four errors per Wikipedia entry and three per Encyclopaedia Britannica entry, a result that is still widely cited today even though Wikipedia is now more than twice as old, having matured in many ways.

The Wikimedia Foundation has commissioned a new small-scale study to examine the quality and accuracy of Wikipedia articles. This study, currently being undertaken by Epic, a UK-based e-learning company, and Oxford University, employs greater rigor than the Nature study, involves academics and scholars, and will examine entries in languages other than English and on subjects beyond science. Our hope is that the study’s findings will inspire and inform more extensive, independently funded research related to the quality of information found in Wikipedia and other free knowledge projects.

This project will explore methods to define a baseline for the quality of Wikipedia entries and to help the community identify shortcomings, as well as strategies to address them. Wikipedia has several advantages over commercially available online encyclopedias – it is freely accessible to hundreds of millions of users worldwide, it is available in over 270 languages, and it is updated at remarkable speed, relying on the ability of a vast number of non-paid contributors rather than the academic credentials of a few paid experts. However, errors do exist and concerns have been raised that articles may be colored by contributors’ personal opinions or misunderstandings. A comparative analysis of the quality of Wikipedia’s articles and other popular alternatives is crucial to identifying avenues for improvement.

Dario Taraborelli, Senior Research Analyst, Strategy

Tilman Bayer, Movement Communications

Encyclopedia of Life curates Wikipedia’s species articles

There are more than 1.9 million known species of animals, plants, and other forms of life on Earth. In May 2007, some of the world’s leading scientists announced the development of the Encyclopedia of Life (EOL) to document them all. Inspired by biologist E. O. Wilson’s TED Wish and supported by more than $25 million in funding, the project aggregates information about species from sources ranging from 19th-century journals to modern online databases, and makes it accessible.

See the page about Solanum lycopersicum, the garden tomato, as an example. Much of the information comes from Solanaceae Source, a specialized source of name lists, species descriptions, specimen collections and publication lists for the genus Solanum. The Biodiversity Heritage Library provides historical public domain texts about the species from various published journals. Many other specialized and general resources contribute to the overall species page.

A Wikipedia article included in an Encyclopedia of Life species page. The yellow background indicates that no curator has reviewed the content yet.

You’ll also find a “Wikipedia” entry in the table of contents. It reveals a copy of the Wikipedia article about tomatoes. As of this writing, the article text has a yellow background.

This means that an Encyclopedia of Life curator has not yet reviewed the content for inclusion in EOL. An EOL species page can have one or more curators who select and validate information added to EOL pages. Wikipedia articles, where they exist, are included by default.

Once the article has been validated by a curator, the yellow background is removed. The information for curators and curation standards pages on EOL give some additional background on the curation process, which applies to all content objects in EOL. Specific guidelines have been written for curation of content from Wikipedia and Wikimedia Commons. We’re particularly pleased that EOL encourages its curators to improve Wikipedia directly if errors or omissions are found.

So far, more than 200 Wikipedia articles have been reviewed through this process. Reviewers classify the information as follows:

  • ‘trusted’ – reviewed by curator and not deemed to contain substantially incorrect information
  • ‘untrusted’ – reviewed by curator and deemed to include incorrect or unverifiable information
  • ‘inappropriate’ – reviewed by curator and deemed to not be eligible for inclusion in EOL for other reasons (e.g. too short to add value)

EOL makes the entirety of all review information (who reviewed what when, with what outcome) available through an Atom feed. This means that Wikipedians, and others, can use this information easily in the development of new applications.
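As a rough illustration of how a developer might consume such a feed, here is a minimal Python sketch using only the standard library. The sample feed is invented for this example: the post does not describe the feed's actual layout, so the use of the Atom `category` element to carry the review outcome, the `example.org` URLs, and the function names `parse_reviews` and `trusted_urls` are all assumptions. Only the Atom namespace and the basic `entry` elements come from the Atom standard itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom standard (RFC 4287).
ATOM = "{http://www.w3.org/2005/Atom}"

# Invented sample data. The real EOL feed may be structured differently.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example curation feed (hypothetical)</title>
  <entry>
    <title>Example species A</title>
    <link href="https://example.org/wiki/Example_species_A?oldid=111"/>
    <author><name>Curator One</name></author>
    <updated>2010-01-01T12:00:00Z</updated>
    <category term="trusted"/>
  </entry>
  <entry>
    <title>Example species B</title>
    <link href="https://example.org/wiki/Example_species_B?oldid=222"/>
    <author><name>Curator Two</name></author>
    <updated>2010-01-02T12:00:00Z</updated>
    <category term="untrusted"/>
  </entry>
  <entry>
    <title>Example species C</title>
    <link href="https://example.org/wiki/Example_species_C?oldid=333"/>
    <author><name>Curator One</name></author>
    <updated>2010-01-03T12:00:00Z</updated>
    <category term="inappropriate"/>
  </entry>
</feed>"""


def parse_reviews(feed_xml):
    """Return one dict per feed entry: who reviewed what, when, with what outcome."""
    root = ET.fromstring(feed_xml)
    reviews = []
    for entry in root.findall(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        category = entry.find(ATOM + "category")
        reviews.append({
            "title": entry.findtext(ATOM + "title", default=""),
            "url": link.get("href") if link is not None else None,
            "reviewer": entry.findtext(f"{ATOM}author/{ATOM}name", default=""),
            "updated": entry.findtext(ATOM + "updated", default=""),
            # Assumption: the outcome is stored as the category term.
            "outcome": category.get("term") if category is not None else None,
        })
    return reviews


def trusted_urls(reviews):
    """Keep only the article versions a curator marked as 'trusted'."""
    return [r["url"] for r in reviews if r["outcome"] == "trusted"]


if __name__ == "__main__":
    for url in trusted_urls(parse_reviews(SAMPLE_FEED)):
        print(url)
```

A list of trusted article versions like the one returned by `trusted_urls` is the kind of intermediate result a new application could build on.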

The book creator tool makes it possible to order a printed and bound book from any Wikipedia article selection. A custom cover can be chosen. Nautilus photograph by Lee Berger, Creative Commons Attribution/Share-Alike License.

A proof-of-concept for expert reviews

Magnus Manske is a biochemist and programmer at the Sanger Institute in the United Kingdom. He is also a long-time Wikimedia volunteer, and wrote the first version of the PHP software used by Wikipedia, which later became MediaWiki. As a scientist, Magnus has advocated for the scientific community to use and improve Wikipedia, most recently as co-author of the paper Ten Simple Rules for Editing Wikipedia.

I informed Magnus about the new EOL review information, and suggested that we might want to explore using this information to generate printed books or PDF collections of reviewed articles. The software for exporting Wikipedia articles into books already exists, so it was just a matter of putting two and two together.

So, Magnus used the available data feed to create an automated tool that creates a list of all EOL-reviewed article versions in a form that can be used by Wikipedia’s book tool.

This makes it possible to download a PDF file or order a printed book that only contains EOL-reviewed versions of Wikipedia species articles.

To try it out, visit the page for Magnus’ example book. Click “Download PDF” to generate the (very large) PDF file that contains all the species articles, or “order printed book” to preview or order a printed book from PediaPress (which, as of this month, also offers books in color and hardcover format). If you want to remix or play with the book further, you can click “Open book creator”.

We’re very pleased with this first proof-of-concept, and are grateful to the Encyclopedia of Life team for engaging its community in the curation of Wikipedia articles. Both parties benefit: The Encyclopedia of Life enriches its species pages using the often well-developed Wikipedia content. Wikipedia benefits because EOL’s trusted reviewers add their stamp of approval to Wikipedia articles, which helps Wikipedia readers and editors alike. Where EOL reviewers do not approve, they are encouraged to edit the Wikipedia article.

I asked Bob Corrigan, EOL Product Manager and Acting Deputy Director, to give his take on this project. He writes: “This is definitely a win-win partnership. EOL is focused on providing very deep, structured access to trusted biodiversity information from our network of content partners and curators, and vetted Wikipedia articles can be a terrific gateway to this information. We see a closer relationship with Wikimedia as an important way to expand access to global knowledge about life on Earth.”

Hardcover book made from curated Wikipedia articles. Photo credit: Guillaume Paumier; Nautilus photograph by Lee Berger. Creative Commons Attribution/Share-Alike License 3.0

Example page from the book. Photo credit: Guillaume Paumier; Nautilus photograph by Lee Berger. Creative Commons Attribution/Share-Alike License 3.0

A replicable model

Magnus’ implementation was already created with an eye to future extensibility. If you’re inclined to take a closer technical look, check out Magnus’ “Sifter-Books” script which generates the book data, and can potentially support multiple partner institutions/organizations providing article reviews. As of the time of this writing, Magnus has already added two additional groups who review Wikipedia articles, Rfam and Pfam, databases of RNA and protein families.

Moreover, Magnus has written a small proof-of-concept script which makes the existence of reviews visible on Wikipedia itself. You need to create a user account on the English Wikipedia and follow the installation instructions to use the script. Once installed, a “Reviews” tab will indicate available article reviews.

We look forward to exploring similar partnerships with subject-matter experts in institutions (like universities and libraries), scientific associations, and specialized knowledge communities. If you’re interested in this model, drop me a note (erik at wikimedia dot org).

Erik Moeller
Deputy Director, Wikimedia Foundation
Representative of Wikimedia in the Encyclopedia of Life Institutional Council

Article feedback pilot goes live

As recently announced on the tech blog and in the Signpost, we’re launching an experimental new tool today to capture article feedback from readers as part of the Public Policy Initiative. We’re also inviting the user community to help determine its future by joining a workgroup tasked with evaluating it.

The “Article Feedback Tool” allows any reader to quickly and easily assess the sourcing, completeness, neutrality, and readability of a Wikipedia article on a five-point scale. It will be one of several tools used by the Public Policy Initiative to assess the quality of articles. We also hope it will be a way to increase reader engagement by seeking feedback from them on how they view the article, and where it needs improvement.

The tool is currently enabled on about 400 articles related to US public policy. You can see it in action at the bottom of articles such as United States Constitution, Don’t ask, don’t tell or Brown v. Board of Education.

Another goal of this pilot is to try and find a way to collaborate with the community to build tools and features. As main users of the software, Wikimedians are in a unique position to evaluate how a feature performs, and what its strengths and limitations are. The Article Feedback Tool is still very much in a prototype state; we’re hoping the user community can help us determine whether resources should be allocated to improve it (and if so, how), or if it doesn’t meet the users’ needs and should be shelved or completely rethought.

More information about the tool is available on our Questions & Answers page.

If you want to try the tool to assess an article, pick a subject you’re familiar with from the full list and rate it! If you’d like to participate in the evaluation of the tool itself and what becomes of it, please join the workgroup. If you’re interested in article assessment in general, please also join the Public Policy Initiative’s Assessment Team.

Thank you,

Guillaume Paumier,
on behalf of the Features Engineering team

A quick update on Flagged Revisions

One of the wonderful characteristics of Wikimedia’s wikis, including Wikipedia, is that every change ever made to a page is recorded, back to the very first version (compare, for example, the first version of the article about chess with the most recent version of the same article). This characteristic also makes it possible to assign quality assessments to specific versions, thereby giving our readers greater transparency about the perceived current or past quality of an article.

A very powerful software feature called Flagged Revisions makes it possible to systematize such quality assessments. It’s been in production use in many of our wikis for more than a year now, including the second-largest Wikipedia, the German language edition. Fundamentally it’s a very flexible feature, and different project communities (the German Wikipedia, the English Wikibooks, etc.) can come up with configurations that suit their needs. By means of our public issue tracker, they can then request from the Wikimedia Foundation that such configurations be turned on.

Even though we’ve made no official announcements about this, you may have seen media reports that Flagged Revisions will soon be enabled in the English Wikipedia. Indeed, there is a specific proposal that was developed by the English Wikipedia community, entitled Flagged protection and patrolled revisions. It’s a very thoughtful proposal that attempts to balance the desire for higher quality, and more systematic assessment thereof, with the immediacy of Wikipedia as it exists today, and was supported by a large majority of interested Wikipedia editors. The idea behind this proposal is to allow regular contributors to systematize a first, basic assessment of all edits by new contributors. However, this assessment will be purely for informational purposes to the reader: a reader will see whether or not the version of an article they look at has been patrolled, and if not, whether a prior patrolled version is available.

Only in a small percentage of cases would we require changes to be patrolled before becoming the default view for readers. The proposal is to do so initially in the case of articles at high risk of vandalism, including high-risk biographies of living people, where false information can do the most serious harm to an individual.

A popular media narrative of this proposal (in the cases where it has been reported roughly correctly to begin with) is that it represents a “clamping down” on Wikipedia’s open editing process. That is nonsense. It is presently the case that many high-risk articles are completely uneditable by new contributors, which is referred to as page protection. For example, as a completely new user, you are not able to alter the article about Barack Obama. These kinds of protections of high-risk articles have been common for many years now. If the proposed model works as intended, it will actually allow us to open up many articles for editing which are currently protected from being edited. Edits will have to be patrolled, which is clearly a step up from edits not being possible at all.

It is true that some implementations of Flagged Revisions are more conservative than that. Any edit in the German Wikipedia by a new or unregistered user has to be patrolled before becoming visible to readers. This is definitely not the case in the proposed English Wikipedia configuration. We believe in letting our communities experiment with different approaches in an attempt to find the right balance.

A test wiki for the English Wikipedia configuration has just been set up in the Wikimedia Labs, and we’ll soon be importing articles from Wikipedia and making a broad call for testing. It’s important for us to get this right – we want to make sure that we don’t make Wikipedia harder to use, for our readers or our editors, in the process of deploying this functionality. That said, we hope to be able to deploy Flagged Revisions in production use on the English Wikipedia within 2-3 months.

From Wikimania in lovely Buenos Aires,
Erik Moeller
Deputy Director, Wikimedia Foundation

[UPDATE 8/26] This post originally said that all biographies of living people would be “flagged protected”. This is not correct. The current proposal is for articles that are currently under normal mechanisms of protection (where new and unregistered users cannot edit) to be eligible for the new protection model, which allows for more open editing. I apologize for the confusion; thanks to Sage Ross for the quick correction.

Quality Assurance in an Open Project

Wikipedia was founded on radically open collaboration. Pick any article you know something about, and the “edit this page” link at the top allows you to make an instant change.

Edit this page link image

By editing a Wikipedia article, you get instant access to the “guts” of the page. Whether you’re just changing some text, adding a reference, or inserting an image: Wikipedia is open to new contributions at any time.

Instead of moderating edits before they are published, the wiki model has always been to systematically review changes after they come in:

  • by storing every version of every article ever created;
  • by allowing anyone to restore prior versions;
  • by providing numerous tools for experienced editors to review and patrol changes.

This gives writers the instant gratification to see their changes published, while – hopefully – leading to high quality articles over time as more and more people review and improve a page.

In addition to the constant mutual peer review, there are countless Wikipedia processes used to identify articles of the highest quality, articles with various problems, or articles that should be deleted. (The Wikipedia Signpost, a community newsletter, has just published an interesting history of the featured article candidacy process.)

New processes and technologies for quality assurance are developed and tested all the time. But few are as long-awaited and potentially game changing as FlaggedRevs.

The FlaggedRevs Extension

The German Wikipedia is currently trialing a new extension (what’s an extension?) to our software, called “FlaggedRevs”. The extension, which has been under development for more than a year, is a very powerful set of tools for reviewing, labeling and selecting changes made in a wiki. We believe that FlaggedRevs represents a milestone in the development of wiki technology. To our knowledge, there is no other tool available today that provides comparable functionality.

So what, exactly, does it do?

In a nutshell, FlaggedRevs (short for “flagged revisions”) can be used to give a defined group of authors the ability to attach quality labels (flags) to individual versions (revisions) of articles. It can also be used to determine which version of an article should be shown to a reader visiting the wiki: the most recent one, or the highest quality version available?

These two features are not necessarily linked. In the most basic use scenario imaginable, FlaggedRevs can simply be used to patrol a wiki for malicious changes (“vandalism”). When a change has been found not to be malicious, a trusted user can label it as such. This has two key advantages compared to the current patrolling model:

  • It reduces duplicate effort in basic change patrolling, allowing users to focus on un-reviewed changes and thereby directing their attention more effectively.
  • It ensures higher coverage of changes. In particular, when malicious changes are followed by good faith edits, malicious changes are sometimes overlooked. In the FlaggedRevs model, reviewers can systematically examine every change.

In addition, both human and non-human readers can select “known good” versions of Wikipedia articles which do not include malicious changes. Whether you’re a teacher printing Wikipedia articles for the classroom, a student using them for research, or a publisher creating a DVD copy, you can pick the articles which have been checked for basic vandalism by trusted editors, instead of simply choosing the most recent version.
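To make the two capabilities concrete, here is a minimal Python sketch: one function for attaching a flag to a revision, and a separate one for choosing which revision to show. It is a toy model under stated assumptions, not the extension's actual code or data model. The `Revision` class, the flag value `"checked"`, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    flag: Optional[str] = None  # hypothetical label; None means not yet reviewed


def flag_revision(rev: Revision, label: str = "checked") -> None:
    """A trusted user attaches a quality label to one specific revision."""
    rev.flag = label


def latest_flagged(history: List[Revision]) -> Optional[Revision]:
    """Return the newest revision that carries a flag, if any."""
    for rev in reversed(history):
        if rev.flag is not None:
            return rev
    return None


def revision_to_show(history: List[Revision], prefer_flagged: bool) -> Revision:
    """Pick the version a reader sees.

    With prefer_flagged=False the newest revision always wins, so flags are
    purely informational. With prefer_flagged=True the newest flagged revision
    wins, falling back to the newest revision when nothing has been flagged.
    """
    if not history:
        raise ValueError("history must contain at least one revision")
    if prefer_flagged:
        flagged = latest_flagged(history)
        if flagged is not None:
            return flagged
    return history[-1]


if __name__ == "__main__":
    history = [
        Revision(1, "First draft"),
        Revision(2, "Improved draft"),
        Revision(3, "Unreviewed change"),
    ]
    flag_revision(history[1])
    print(revision_to_show(history, prefer_flagged=False).rev_id)
    print(revision_to_show(history, prefer_flagged=True).rev_id)
```

Keeping the flagging step and the selection step in separate functions mirrors the point that the two features are not necessarily linked: a wiki can flag revisions without ever changing what readers see by default.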

As a user of the German Wikipedia, you will notice that some articles have the following icon in the top right corner:

FlaggedRevs Icon 1

This icon indicates that the version you are looking at hasn’t been checked for vandalism yet. (If an older version that has been checked is available, this is indicated below the icon.)

The End of Immediacy?

While this configuration is simple enough, it should be noted that until a couple of weeks ago, the German Wikipedia was using a different setup in which any change by a user without the permission to review changes for vandalism (which includes all unregistered users and relatively new ones) had to be reviewed before becoming the default version shown to readers. In other words, if you were not in the group with permission to review edits, your own changes did not become the “live version” until someone else looked at them.

This was a controversial change, as some users felt it significantly reduced the incentive for new contributors to start editing Wikipedia. So far, there has been limited analysis of the data collected during this experiment, which lasted from May until July 2008, and we hope to analyze the effects in greater detail over the coming weeks. (Some real-time statistics are available, thanks to André Karwath.)

Should changes to Wikipedia by new and unregistered users be reviewed before becoming the default shown to readers? There might be a middle ground solution: On most articles, changes would continue to be applied immediately, under the assumption that the benefit of radically open collaboration is greater than the risk. But, on a subset of pages, changes by unregistered and new users would have to be reviewed before becoming visible. This subset could consist of articles which are frequently the target of vandalism, such as the biography of the US President, but it could also include those pages which have reached a very high standard of quality as determined by the Wikipedia community. In other words, when the risks of radical openness outweigh its benefits, editing would be throttled.

This would, in fact, represent an opening up of Wikipedia rather than a closing down, as many of the affected pages are currently “semi-protected”, meaning that they cannot be edited at all by new and unregistered users due to the perceived risks of malicious edits. Being able to make changes that do not immediately become visible is surely preferable to not being able to make changes at all.
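The comparison can be sketched as a small decision function in Python. This is only an illustration of the argument, with hypothetical names throughout (`PageMode`, `Outcome`, `handle_edit`). It does not reflect how MediaWiki or FlaggedRevs implement permissions.

```python
from enum import Enum


class PageMode(Enum):
    OPEN = "open"                        # most articles: edits apply immediately
    SEMI_PROTECTED = "semi-protected"    # new/unregistered users cannot edit
    REVIEW_REQUIRED = "review-required"  # edits are held until reviewed


class Outcome(Enum):
    LIVE = "visible to readers immediately"
    PENDING = "saved, visible after review"
    BLOCKED = "edit not possible"


def handle_edit(mode: PageMode, editor_is_established: bool) -> Outcome:
    """Decide what happens to an edit under each page mode."""
    # Established editors are unaffected by any of the three modes in this sketch.
    if editor_is_established or mode is PageMode.OPEN:
        return Outcome.LIVE
    if mode is PageMode.REVIEW_REQUIRED:
        return Outcome.PENDING
    return Outcome.BLOCKED


if __name__ == "__main__":
    for mode in PageMode:
        print(mode.value, "->", handle_edit(mode, editor_is_established=False).value)
```

For a new or unregistered user, moving a page from `SEMI_PROTECTED` to `REVIEW_REQUIRED` changes the outcome from `BLOCKED` to `PENDING`, which is the sense in which the model opens Wikipedia up.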

What’s next?

The Wikimedia Foundation has authorized all Wikimedia project communities to conduct experiments with FlaggedRevs through a process of self-organization. The process by which a Wikimedia community (e.g. the French Wikipedia, the Russian Wikibooks, etc.) can request the FlaggedRevs extension to be enabled is open and transparent. As the process unfolds, we will try to support the communities by collecting data about the use of the extension. Depending on our findings, we may eventually make a simple configuration of FlaggedRevs the default for all wikis.

There are other potential future uses of FlaggedRevs:

  • Use for identification of article versions which meet standards of accuracy and quality as determined by experts. Potentially, FlaggedRevs could interface with external expert communities (such as universities or expert-driven encyclopedia projects like the Encyclopedia of Life) to identify versions of Wikipedia articles which meet scholarly standards of quality.
  • Use for identification of article versions which meet internally defined standards of quality beyond the simple check for vandalism. The original German Wikipedia proposal for FlaggedRevs includes a more in-depth community quality review stage, which is still being discussed. A simple way to tie into community review mechanisms would be to use FlaggedRevs to “tag” versions which have passed through processes like “Featured Article Candidates”.
  • Use to collect basic reader feedback on articles. Asking our readers whether information in Wikipedia articles is useful to them, and whether it meets their quality standards, could be a good way to track reader satisfaction over time. The lead developer of the FlaggedRevs extension, Aaron Schulz, is currently implementing such reader feedback tools.

The development of this technology represents the commitment of the international Wikimedia community to achieving the highest possible standards of quality in all our projects. In particular, the German Wikipedia community and the German chapter have been leaders and pioneers in this process. Philipp Birken from the German chapter gave a compelling presentation at the recent Wikimania on this very topic.

We welcome your feedback in making this technology more useful. An English demo version is set up in the Wikimedia Labs.

Erik Möller, Deputy Director