Seven years after Nature, pilot study compares Wikipedia favorably to other encyclopedias in three languages

This post is available in 3 languages: English, العربية, Español


Improving the quality of articles has long been one of the primary aims of contributors to Wikipedia, and is one of the Wikimedia movement’s 2010-15 strategic priorities, but measuring it objectively has remained a challenge. In 2005, Nature famously reported that Wikipedia articles on scientific topics contained just four errors per article on average, compared to three errors per article in the online edition of Encyclopaedia Britannica. Britannica objected to the report, but Nature stood by it, and the report remains widely cited today.

Since that time, however, there have been relatively few independent analyses of Wikipedia article quality, despite the enormous growth of the project. Wikipedia today has more than 23 million articles across all languages (more than 4 million in the English Wikipedia alone), compared to 3.7 million in total in 2005. It now ranks 6th in overall traffic according to Alexa, up from 37th in 2005.

With this increase in size and reach, how has quality evolved? How does Wikipedia compare to other online encyclopedias today in terms of quality? And what are good methods for measuring the quality of encyclopedic articles?

The Wikimedia Foundation is announcing the release of a pilot study conducted by Epic, an e-learning consultancy, in partnership with Oxford University – “Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Alternative Encyclopaedias: A Preliminary Comparative Study Across Disciplines in English, Spanish and Arabic.”

The study compared a sample of English Wikipedia articles to equivalent articles in Encyclopaedia Britannica, Spanish Wikipedia to Enciclonet, and Arabic Wikipedia to Mawsoah and Arab Encyclopaedia. Each of the 22 articles in the sample was blind-assessed, both quantitatively and qualitatively, by two to three academic experts who were native speakers of the article's language.

The small size of the sample does not allow us to generalize the results to Wikipedia as a whole. However, as a pilot primarily focused on methodology, the study offers new insights into the design of a protocol for expert assessment of encyclopedic content. For our editor community and for the Foundation, which commissioned the study in 2011, it also offers evidence to inform the design of quality assessment mechanisms and quality metrics that may be used on Wikipedia itself.

The results suggest that Wikipedia articles in this sample scored higher overall in each of the three languages, and fared particularly well in the categories of accuracy and references. As the report notes, the English Wikipedia fared well in this sample against Encyclopaedia Britannica in terms of accuracy, references and overall judgment, with little difference between the two on style and overall quality score. Similar results were found when comparing Wikipedia articles in Spanish to Enciclonet. In Arabic, Mawsoah and Arab Encyclopaedia articles scored higher on style than Wikipedia, but no significant differences were found on accuracy, references, overall judgment and overall quality score. None of the encyclopedias considered in this study were rated highly by the academics in terms of suitability for citation in academic publications.

We hope that the results of this study will encourage further independent research on the quality of Wikipedia articles. To this end, Epic and Oxford University are releasing the full version of the report of this study under a Creative Commons Attribution-Share Alike license. They have announced the report here and have released an anonymized dataset under a Creative Commons Zero dedication. The team welcomes comments and feedback on the talk page of the project.

We are very encouraged by the results for this small sample of Wikipedia articles in three languages. While pointing the way forward for further research, these results affirm the quality of the collaborative work of our editor community.

Dario Taraborelli, Senior Research Analyst

Categories: Research

9 Comments on Seven years after Nature, pilot study compares Wikipedia favorably to other encyclopedias in three languages

M-E Duban 8 months

This report, and in general all reporting on these comparisons, suffer in my view from having ignored the obvious discrepancies in validity to be expected between a study based on sampling and one designed around the actual manner in which Wikipedia is used. Any sample-based approach, random or otherwise, ignores the fact that users do not come to an encyclopedia and evaluate a random selection of content. Rather, they come for information on one or a few subjects, often embedded within larger articles, and so require uniform source quality and uniform “intra-source” navigation; they come for expertise on a very specific subject of interest, as located in relevant articles accessed by search and via relationships communicated by internal links. Hence it matters little how Wikipedia performs in a sampling study, and much more how it would perform in a systematic, thorough comparison of “critical content” (however this might be defined in a research context), and then how flawless the interconnections are that lead the reader between sections of related content on the subject of interest. Were Wikipedia and EB content to be assessed taking into account how the encyclopedias are actually approached, WP would fare far less well, I argue, than the foregoing reporting suggests. A simple example will illustrate the point. Readers with access should compare the section(s) on the “total and partial synthesis of steroids” in EB and in Wikipedia. This subject is clearly within the scope of each encyclopedia, but clearly not among the small sample used in the earlier evaluations and comparisons of the sites. Had it been, the differences in content quality would not have come down to single-digit statistical differences, for the EB site’s material is present and well developed, while at the WP site it is essentially nonexistent.
With regard to this, and many, many further real examples, the EB vs. WP comparisons literally miss the forest for the trees, because the sample size of the earlier studies is far too small to give them the statistical power needed to draw far-reaching, general conclusions. Now, while there would be some need for independent assessment of what constitutes such “critical content” for comparison, there is no gainsaying that (i) it can be defined; (ii) however it would be defined, in unbiased fashion, the set size would be far larger than the samples in the foregoing comparison studies; and (iii), though possibly a more controversial conclusion, vis-à-vis uniformity and consistency of high quality, the EB product would far, far surpass that of Wikipedia (for the large sample). (This can be argued from the example provided by considering the upper limit of sample size: if we were to define a parent set as the union of all articles contained by both encyclopedias, and focus on the subset of articles where both encyclopedias provide coverage, the largest “sample” possible is all articles in this intersecting subset. The gross quality difference between the two articles in the foregoing example would not have been missed in this extremely large “sample”. It is a matter for a future research study design to determine how large one would need to make the sample (how much smaller than the set of all articles common to the two encyclopedias) in order not to miss inaccuracies like those represented by the steroid subsection example.) Finally, I would note my opinion that acknowledging and broadcasting such shortsighted and self-serving approbations of WP as the comparative studies to date represent, rather than soliciting accurate, improvement-inducing and broadly valid analyses of our content, will not move us forward in content quality, and is no feather in our collective intellectual cap. (So says a semi-retired professor.)

Kragen Javier Sitaker 2 years

Bob, I can see how you might think that about Wikipedia, but I don’t think what you’re seeing is actually coverage *gaps*, but rather the *much deeper coverage* that Wikipedia gives certain topics than Britannica does. The biggest difference between the Wikipedias and encyclopedias like Britannica (which didn’t show up in this study because it was restricted to areas where “articles from different online encyclopaedias were of comparable substance and focus”) is that the Wikipedias are much, much, much larger than things like Britannica.

Wikipedias are much, much larger, and much more comprehensive. As your article evocatively points out, Britannica in print form occupies about two bookshelves, in about 32 volumes. A hypothetical printed version of the English Wikipedia was estimated at about 1700 volumes in 2010: http://en.wikipedia.org/wiki/Wikipedia:Size_in_volumes

So if we’re talking about gaps in coverage, Britannica unavoidably suffers enormous gaps relative to English Wikipedia, simply because English Wikipedia contains some 50 pages of material for every page in Britannica. To take the first example that came to mind, compare http://www.britannica.com/EBchecked/topic/588941/tetrachloroethylene (126 words, of which 100 are available to anyone) with https://en.wikipedia.org/wiki/Tetrachloroethylene (900 words plus 13 references, two diagrams, and a table of chemical property data). They’re not even in the same league. The fact that the Britannica article appears not to have been updated in several decades, and so omits the extremely important information that tetrachloroethylene is a probable carcinogen, is comparatively insignificant!

Now of course we can’t expect English Wikipedia to cover chemical compounds to the same depth that it covers, say, Pokémon: https://en.wikipedia.org/wiki/Zoroark — I think that’s what you’re saying about “the typically parochial interests of its contributors”. But it still covers chemical compounds, and everything else I’ve compared, in dramatically greater depth than Britannica does.

NaBUru38 2 years

I’ve read some fifty pages of the study. One of the alleged weaknesses of Wikipedia, lack of definition of terms, has an easy explanation: Wikipedia relies on users clicking links to get more information. For example, the article Ecology doesn’t really need to define organism, environment, biomass, ecosystem, species, community and biodiversity, because their definitions are a click away. The study treats articles as standalone works, but Wikipedia doesn’t work that way.

NaBUru38 2 years

“Enciclonet was selected because of its high popularity, its high Alexa traffic rank of 322,628” – That number is high indeed, but in a bad sense.

Pete Forsyth 2 years

And I’m finding errors: a nice case in point for why using a wiki for things like this can be beneficial. (The first entry in the Table of Contents has the wrong page number, and the first citation in the paper appears to be absent.)

Pete Forsyth 2 years

Thanks for this excellent news.

I have begun transcribing this freely licensed publication on Wikisource:

http://en.wikisource.org/wiki/Index:EPIC_Oxford_report.pdf

…and would welcome any help! The goal is to make a more accessible HTML version of the study, to supplement the PDF. If anybody would like some pointers on how to get started with this, contact me at http://en.wikisource.org/wiki/User_talk:Peteforsyth

Filceolaire 2 years

Bob

“First, coverage of topics within a broad subject area is determined by self-selecting submitters, so there are often coverage gaps” Citation needed Bob. My impression is that coverage gaps on English Wikipedia have been filled over time.

“Second, there are often inconsistencies from one article to the next within a broad subject area. Although individual articles may be accurate with respect to their topic, one cannot rely on the collection of articles for a broad subject area as consistent or comprehensive.” Citation needed again Bob. Do you have evidence for this or is it just your impression? My impression is that when the editors finish on one topic in a broad subject area they move on to work on related topics and by now most of those articles in English Wikipedia have been improved.

“An actively edited reference like EB is much less likely to be deficient this way.” I think you will find that Wikipedia is pretty actively edited, Bob. Maybe you meant to call EB “actively managed”. I agree that EB, with its more limited resources, needs to choose carefully where to apply those resources.

I think your comments above do still apply on some of the other language wikipedias – Welsh wikipedia still needs work though it is already the largest Welsh language site on the web.

Tilman Bayer 2 years

Thank you for the link. Opinions of the form “Encyclopedia X is produced by method Y, therefore its content must exhibit property Z” are very frequent in discussions comparing Wikipedia with other encyclopedias. I think one of the values of the present study is that it adheres to the established principle of blind review, and developed methods (described in section 3.4.2) to ensure that reviewers’ judgments are not biased by such preconceived opinions about each encyclopedia’s production process.

I agree that besides factual accuracy, comprehensiveness/evenness of coverage (“not what was there, but what was not”, as you wrote in your article) is important, and I’d love to see objective methodology developed to compare it across an encyclopedia’s entire coverage of a particular subject area. But it’s worth noting that in their judgment of single articles, reviewers in this study already took this into account, see e.g. this example on p.43:

‘all three reviewers for the article on Memory felt that, despite “minor flaws” (Reviewer 2), the Wikipedia article was superior especially with respect to its coverage of the topic: “The first article [Wikipedia] is decent. It is reasonably concise, and covers most things that I would include – certainly it is not perfect, and there are things missing, but it is concise and well-written. By contrast the second article [Britannica] is very vague and makes minimal links to the actual original science behind the points […] I actually think that it would be a little misleading to a novice, because the literature has developed so much in the last 10-15 years.” (Reviewer 3 – doctoral student)’

Bob Binder 2 years

This study ignores two significant deficiencies of Wikipedia as it compares with references like Encyclopedia Britannica. First, coverage of topics within a broad subject area is determined by self-selecting submitters, so there are often coverage gaps. Second, there are often inconsistencies from one article to the next within a broad subject area. Although individual articles may be accurate with respect to their topic, one cannot rely on the collection of articles for a broad subject area as consistent or comprehensive. An actively edited reference like EB is much less likely to be deficient this way.

More about this at
http://www.robertvbinder.com/britannica-brat/
