Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Seven years after Nature, pilot study compares Wikipedia favorably to other encyclopedias in three languages

This post is available in 3 languages: العربية 100% • Español 7% • English 100%

Improving the quality of articles has long been one of the primary aims of contributors to Wikipedia, and is one of the Wikimedia movement’s 2010-15 strategic priorities, but measuring it objectively has remained a challenge. In 2005, Nature famously reported that Wikipedia articles on scientific topics contained just four errors per article on average, compared to three errors per article in the online edition of Encyclopaedia Britannica. Britannica objected to the report, but Nature stood by it, and the report remains widely cited today.

Since that time, however, there have been relatively few independent analyses of Wikipedia article quality, despite the enormous growth of the project. Wikipedia today counts more than 23 million articles across languages (more than 4 million articles in the English Wikipedia alone) compared to 3.7 million total articles in 2005; today it ranks 6th by overall traffic according to Alexa, while it ranked 37th in 2005.

With this increase in size and reach, how has quality evolved? How does Wikipedia compare today to other online encyclopedias, quality-wise? And what are good methods to measure the quality of encyclopedic articles?

The Wikimedia Foundation is announcing the release of a pilot study conducted by Epic, an e-learning consultancy, in partnership with Oxford University – “Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Alternative Encyclopaedias: A Preliminary Comparative Study Across Disciplines in English, Spanish and Arabic.”

The study compared a sample of English Wikipedia articles to equivalent articles in Encyclopaedia Britannica, Spanish Wikipedia to Enciclonet, and Arabic Wikipedia to Mawsoah and Arab Encyclopaedia. Twenty-two articles in the sample were each blind-assessed by two to three native-speaking academic experts, both quantitatively and qualitatively.

The small size of the sample does not allow us to generalize the results to Wikipedia as a whole. However, as a pilot primarily focused on methodology, the study offers new insights into the design of a protocol for expert assessment of encyclopedic contents. For our editor community and for the Foundation, which commissioned the study in 2011, it also offers evidence to inform the design of quality assessment mechanisms and quality metrics that may be used on Wikipedia itself.

The results suggest that Wikipedia articles in this sample scored higher altogether in each of the three languages, and fared particularly well in categories of accuracy and references. As the report notes, the English Wikipedia fared well in this sample against Encyclopaedia Britannica in terms of accuracy, references and overall judgment, with little difference between the two on style and overall quality score. Similar results were found when comparing Wikipedia articles in Spanish to Enciclonet. In Arabic, Mawsoah and Arab Encyclopaedia articles scored higher on style than Wikipedia, but no significant differences were found on accuracy, references, overall judgment and overall quality score. None of the encyclopedias considered in this study were rated highly by the academics in terms of suitability for citation in academic publications.

We hope that the results of this study will encourage further independent research on the quality of Wikipedia articles. To this end, Epic and Oxford University are releasing the full version of the report of this study under a Creative Commons Attribution-Share Alike license. They have announced the report here and have released an anonymized dataset under a Creative Commons Zero dedication. The team welcomes comments and feedback on the talk page of the project.

We are very encouraged by the results for this small sample of Wikipedia articles in three languages. While pointing the way forward for further research, these results affirm the quality of the collaborative work of our editor community.

Dario Taraborelli, Senior Research Analyst

 

Siete años tras “Nature”, estudio piloto compara favorablemente a Wikipedia frente a otras enciclopedias en tres diferentes lenguas.

Hace tiempo que mejorar la calidad de los artículos es uno de los principales objetivos de los editores de Wikipedia. Es además una de las prioridades estratégicas del movimiento Wikimedia para 2010-2015, pero la capacidad de medir objetivamente este aspecto continúa siendo un desafío. En 2005, una famosa publicación de la revista “Nature” encontró que Wikipedia contenía un promedio de sólo cuatro errores por artículo sobre temas científicos contra los tres por artículo de la edición en línea de la Enciclopedia Británica. Enciclopedia Británica cuestionó el trabajo pero Nature lo reivindicó y continúa siendo citado con frecuencia hasta el día de hoy.

Desde entonces, sin embargo, hubo relativamente pocos análisis independientes de la calidad de los artículos de Wikipedia, esto a pesar del enorme crecimiento del proyecto. Wikipedia cuenta hoy con más de 23 millones de artículos en todos los idiomas (más de cuatro millones sólo en inglés) frente a los 3,7 millones de artículos en total que tenía en 2005. Hoy es el sexto sitio con mayor tráfico general según las estadísticas de Alexa, cuando en 2005 ocupaba el puesto 37. ¿Cómo evolucionó la calidad con este incremento de alcance y tamaño? ¿Cómo se compara hoy la calidad de los artículos de Wikipedia con otras enciclopedias en línea? ¿Qué métodos son apropiados para medir la calidad de un artículo enciclopédico?

La Fundación Wikimedia anuncia el lanzamiento de un estudio piloto realizado por Epic, una consultora de enseñanza en línea, en colaboración con la Universidad de Oxford: “Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Alternative Encyclopaedias: A Preliminary Comparative Study Across Disciplines in English, Spanish and Arabic” (“Evaluación de la exactitud y calidad de las entradas de Wikipedia en comparación con otras conocidas enciclopedias alternativas en línea: un estudio preliminar comparativo interdisciplinario en inglés, español y árabe”).

El estudio compara una muestra de artículos de Wikipedia en inglés con sus equivalentes en la Enciclopedia Británica, Wikipedia en español con Enciclonet, y Wikipedia en árabe con Mawsoah y la Enciclopedia Árabe. 22 artículos de cada una de estas obras fueron presentados a dos o tres expertos académicos hablantes nativos de estas lenguas, quienes las evaluaron en términos cuantitativos y cualitativos.

Lo pequeño de la muestra nos impide generalizar los resultados a toda Wikipedia. Sin embargo, desde lo metodológico, el estudio ofrece nuevas líneas para el diseño de un protocolo que permita la revisión por expertos de contenido enciclopédico. También brinda a nuestra comunidad de editores y a la Fundación, que encargó el estudio en 2011, información para respaldar el diseño de mecanismos de control y medición de calidad que pueden ser usados en la propia Wikipedia.

Los resultados sugieren que los artículos de Wikipedia muestreados tienen en general un puntaje superior a sus contrapartes en los tres idiomas evaluados, con un desempeño especialmente bueno en cuanto a exactitud y provisión de referencias. Según destaca el informe, Wikipedia en inglés se compara positivamente frente a la Enciclopedia Británica en términos de exactitud, referencias y juicio general, con poca diferencia entre ambas en estilo y puntaje de calidad general. Los resultados de la comparación entre Wikipedia en español y Enciclonet fueron similares. En árabe, los artículos de Mawsoah y la Enciclopedia Árabe superaron a Wikipedia en cuanto a estilo, pero no se encontraron diferencias significativas en exactitud, referencias, juicio general ni puntaje de calidad general. Ninguna de las enciclopedias consideradas en este estudio recibió una valoración alta de los académicos en cuanto a su idoneidad para ser citada en publicaciones académicas.

Esperamos que los resultados del estudio incentiven posteriores investigaciones independientes sobre la calidad de los artículos de Wikipedia. Para contribuir a ese fin Epic y la Universidad de Oxford publican la versión completa del informe con licencia Creative Commons Atribución-CompartirIgual. Con licencia Creative Commons Zero se publica también una colección de información anónima generada por el estudio. El equipo de trabajo espera comentarios y retroalimentación en la página de discusión del proyecto.

Estamos muy motivados por los resultados de esta pequeña muestra de artículos de Wikipedia en tres idiomas. Aun cuando abren un camino para la investigación futura, estos resultados confirman la calidad del trabajo colaborativo de nuestra comunidad de editores.

Dario Taraborelli, analista de investigación senior

 

بعد سبع سنوات من دراسة مجلة نيتشر، دراسة جديدة تقارن محتويات ويكيبيديا بموسوعات أخرى بثلاث لغات

إن تطوير جودة المحتويات هو أحد الأهداف الرئيسية للمساهمين في ويكيبيديا، وأحد أهداف الخطة الخمسية الاستراتيجية لحركة ويكيميديا بين الأعوام ٢٠١٠-٢٠١٥، إلا أن قياس تلك الأهداف بشكل موضوعي كان ولا يزال أحد التحديات القائمة. في عام ٢٠٠٥ قامت مجلة نيتشر بنشر مقالة عرضت بأن مقالات ويكيبيديا احتوت ٤ أخطاء بالمعدل في مقابل ٣ أخطاء في مقالات موسوعة بريتانيكا على الإنترنت. اعترضت بريتانيكا على التقرير إلا أن مجلة نيتشر أصرت عليه، ولا زال التقرير واسع الانتشار حتى اليوم.

منذ ذلك الحين ظهر فقط بعض الدراسات التحليلية عن جودة محتويات ويكيبيديا، على الرغم من التوسع الكبير للمشروع. يبلغ تعداد مقالات ويكيبيديا اليوم ما يزيد على ٢٣ مليون مقالة عبر اللغات المتعددة (أكثر من ٤ ملايين مقالة منها باللغة الإنكليزية وحدها) بالمقارنة مع ٣.٧ مليون مقالة بالمجموع في عام ٢٠٠٥، تحتل ويكيبيديا اليوم وفقا لترتيب موقع ألكسا المركز السادس من حيث عدد الزيارات، بينما كان ترتيبها ٣٧ في عام ٢٠٠٥.

ومع الزيادة في الحجم والانتشار، فيكون التساؤل المطروح عن تغير جودة المحتويات؟ كيف من الممكن مقارنة ويكيبيديا اليوم بالموسوعات الأخرى المنتشرة على الإنترنت من حيث الجودة؟ وما هي الطرق المثلى لقياس جودة المقالات الموسوعية؟

وهنا تعلن مؤسسة ويكيميديا عن إطلاق دراسة أولية تم القيام بها من قبل مجموعة إيبك الاستشارية بالمشاركة مع جامعة أوكسفورد تحت عنوان “تقييم دقة وجودة مقالات ويكيبيديا بالمقارنة مع موسوعات الإنترنت المنتشرة الأخرى : دراسة مقارنة أولية باللغات الإنكليزية والإسبانية والعربية”

قامت الدراسة بمقارنة نماذج من ويكيبيديا الإنكليزية مع مقالات مقابلة من موسوعة بريتانيكا، وويكيبيديا الإسبانية بمقالات مقابلة من موسوعة إينسيسلونيت، وويكيبيديا العربية مع الموسوعة العربية العالمية، والموسوعة العربية، حيث تم تقييم عينة من ٢٢ مقالة من قبل ٢ – ٣ متحدثين أصليين باللغات الثلاث من المجتمع الأكاديمي وذلك من حيث الكم والجودة.

إن حجم العينة الصغير لا يسمح بتعميم النتائج على ويكيبيديا عموما. إلا أن الدراسة الأولية ركزت بشكل رئيسي على الطريقة، كما أن الدراسة طرحت تصميم جديد لتقييم الخبراء للمحتويات الموسوعية. كما أن الدراسة تقدم لمجتمع ويكيبيديا ولمؤسسة ويكيميديا التي مولت الدراسة في عام ٢٠١١ دليلا يساعد على تصميم آليات لتقييم الجودة ووضع معايير لها لاستخدامها على ويكيبيديا نفسها.

تخلص الدراسة إلى أن مقالات ويكيبيديا سجلت علامات أعلى بشكل عام في كل من اللغات الثلاث، وتميزت بشكل خاص في فئتي الدقة والمراجع المستخدمة. وكما يشير التقرير فإن ويكيبيديا الإنكليزية حققت علامات جيدة مقابل موسوعة بريتانيكا من ناحية الدقة واستخدام المراجع والتقييم العام مع فروقات صغيرة بينهما من حيث التنسيق وعلامة الجودة الكلية. كما أن نتائج مماثلة تم الوصول إليها عند مقارنة ويكيبيديا الإسبانية مع موسوعة إينسيسلونيت. وفي اللغة العربية فقد حققت الموسوعة العربية العالمية والموسوعة العربية علامات أعلى من ويكيبيديا من حيث التنسيق، لكن لم يكن هناك أي فروقات من حيث الدقة، استخدام المراجع، أو التقييم الكلي للجودة. ولم تحصل أي من الموسوعات في هذه الدراسة على علامة عالية من حيث قابليتها للاستخدام كمرجع في الأبحاث الأكاديمية.

نحن نأمل بأن نتائج هذه الدراسة ستشجع أبحاث مستقلة أخرى حول مواضيع تقييم جودة مقالات ويكيبيديا. إن إيبك وجامعة أوكسفورد تنشران النسخة الكاملة من التقرير تحت رخصة المشاع الإبداعي. كما تم نشر نسخة ببيانات مجهولة الأسماء تم توليدها من قبل هذه القائمة تحت رخصة المشاع الإبداعي صفر. إن فريق الدراسة يرحب بالملاحظات والتقييم على صفحة نقاش المشروع.

لقد شجعتنا هذه النتائج عن مقالات ويكيبيديا بلغات ثلاث بشكل كبير. وبالإضافة إلى أنها تسهل الطريق إلى أبحاث مستقبلية أخرى، فإن هذه النتائج تؤكد على جودة العمل المشترك لمحرري مجتمع ويكيبيديا.

Dario Taraborelli, Senior Research Analyst

16 Responses to “Seven years after Nature, pilot study compares Wikipedia favorably to other encyclopedias in three languages”

  1. M-E Duban says:

    This report, and in general all reporting on these comparisons, suffer in my view from having ignored the obvious discrepancies in validity expected from a study based on sampling, versus ones designed based on the actual manner in which Wikipedia is used. Any sample-based approach, random or otherwise, ignores the fact that users do not come to an Encyclopedia and evaluate a random selection of content, rather, they come for information on one or a few subjects, often times embedded within larger articles, and so require uniform source quality and uniform “intra-source” navigation; they come for expertise on a very specific subject of interest, as located in relevant articles accessed by search and via relationships communicated by internal links. Hence it matters little how Wikipedia performs in a sampling study, much more how it would perform in a systematic, thorough comparison of “critical content” (however this might be defined in a research context), and then on how flawless the interconnections are that lead the reader between sections of related content on the subject of interest. Were Wikipedia and EB content to be assessed, taking into account how the encyclopedias are actually approached, WP would fare far less well, I argue, than the foregoing reporting suggests. A simple example will illustrate the point. Readers with access should compare the section(s) on the “total and partial synthesis of steroids”, in EB and in Wikipedia. This subject is clearly within the scope of each encyclopedia, but clearly not among the small sample used in the earlier evaluations/comparisons of the sites. Had it been, the differences in content quality would not have come down to single digit statistical differences, for the EB site’s material is present and well developed, while at the WP site it is essentially nonexistent.
    With regard to this, and many, many further real examples, the EB vs WP comparisons literally miss the forest for the trees, because the sample size of the earlier studies is far too small to give the study the power it needs to draw far-reaching, general conclusions. Now, while there would be some need for independent assessment on what constitutes such “critical content” for comparison, there is no gainsaying that (i) it can be defined, (ii) however it would be defined, in unbiased fashion, the set size would be far larger than the numbers in the samples of the foregoing comparison studies, and (iii) as well, though possibly a more controversial conclusion, that vis-a-vis uniformity and consistency of high quality, the EB product would far, far surpass that of Wikipedia (for the large sample). (This can be argued based on the example provided, considering a high limit of sample size: if we were to define a parent set as being the union of all articles contained by both encyclopedias, and focus on the subset of articles where both encyclopedias provide coverage, the largest “sample” possible is all articles in this intersecting subset. The gross quality difference between the two articles in the foregoing example would not have been missed in this extremely large “sample”. It is a matter for a future research study design, to determine how large one would need to make the sample (how much smaller than all articles in common to the two encyclopedias), in order to not miss the inaccuracies represented by the steroid subsection example.) Finally, I would note my opinion that our acknowledging and broadcasting such shortsighted and self-serving approbations of the WP as are represented by the comparative studies to date, rather than soliciting accurate, improvement-inducing and broadly valid analyses of our content, will not move us forward in content quality, and is no feather in our collective intellectual cap. (so says a semi-retired professor)

  2. Bob, I can see how you might think that about Wikipedia, but I don’t think what you’re seeing are actually coverage *gaps* but rather the *much deeper coverage* that Wikipedia gives to certain topics than Britannica. The biggest difference between the Wikipedias and encyclopedias like Britannica (which didn’t show up in this study because it was restricted to areas where “articles from different online encyclopaedias were of comparable substance and focus”) is that they are much, much, much larger than things like Britannica.

    Wikipedias are much, much larger, and much more comprehensive. As your article evocatively points out, Britannica in print form occupies about two bookshelves, in about 32 volumes. A hypothetical printed version of the English Wikipedia was estimated at about 1700 volumes in 2010: http://en.wikipedia.org/wiki/Wikipedia:Size_in_volumes

    So if we’re talking about gaps in coverage, Britannica unavoidably suffers enormous gaps in coverage relative to English Wikipedia, simply because English Wikipedia contains some 50 pages of material for every page in Britannica. As an example, the first example that came to my mind, compare http://www.britannica.com/EBchecked/topic/588941/tetrachloroethylene (126 words, of which 100 are available to anyone) with https://en.wikipedia.org/wiki/Tetrachloroethylene (900 words plus 13 references, two diagrams, and a table of chemical property data). They’re not even in the same league. The fact that the Britannica article appears to have not been updated in several decades, and so omits the extremely important information that tetrachloroethylene is a probable carcinogen, is comparatively insignificant!

    Now of course we can’t expect English Wikipedia to cover chemical compounds to the same depth that it covers, say, Pokémon: https://en.wikipedia.org/wiki/Zoroark — I think that’s what you’re saying about “the typically parochial interests of its contributors”. But it still covers chemical compounds, and everything else I’ve compared, in dramatically greater depth than Britannica does.

  3. NaBUru38 says:

    I’ve read some fifty pages of the study. One of the alleged weaknesses of Wikipedia, lack of definition of terms, has an easy explanation: Wikipedia relies on users clicking links to get more information. For example, the article Ecology doesn’t really need to define organism, environment, biomass, ecosystem, species, community and biodiversity, because their definitions are a click away. The study treats articles as standalone works, but Wikipedia doesn’t work that way.

  4. NaBUru38 says:

    “Enciclonet was selected because of its high popularity, its high Alexa traffic rank of 322,628” – That number is high indeed, but in a bad sense.

  5. Pete Forsyth says:

    And, I’m finding errors — nice case in point why using a wiki for things like this can be beneficial. (The first entry in the Table of Contents has the wrong page number; and the first citation in the paper appears to be absent.)

  6. Pete Forsyth says:

    Thanks for this excellent news.

    I have begun transcribing this freely licensed publication on Wikisource:

    http://en.wikisource.org/wiki/Index:EPIC_Oxford_report.pdf

    …and would welcome any help! The goal is to make a more accessible HTML version of the study, to supplement the PDF. If anybody would like some pointers on how to get started with this, contact me at http://en.wikisource.org/wiki/User_talk:Peteforsyth

  7. Filceolaire says:

    Bob

    “First, coverage of topics within a broad subject area is determined by self-selecting submitters, so there are often coverage gaps” Citation needed Bob. My impression is that coverage gaps on English Wikipedia have been filled over time.

    “Second, there are often inconsistencies from one article to the next within a broad subject area. Although individual articles may be accurate with respect to their topic, one cannot rely on the collection of articles for a broad subject area as consistent or comprehensive.” Citation needed again Bob. Do you have evidence for this or is it just your impression? My impression is that when the editors finish on one topic in a broad subject area they move on to work on related topics and by now most of those articles in English Wikipedia have been improved.

    “An actively edited reference like EB is much less likely to be deficient this way.” I think you will find that Wikipedia is pretty actively edited, Bob. Maybe you meant to call EB “actively managed”. I agree that EB, with its more limited resources, needs to choose carefully where to apply those resources.

    I think your comments above do still apply to some of the other language Wikipedias. Welsh Wikipedia still needs work, though it is already the largest Welsh-language site on the web.

  8. Bob Binder says:

    This study ignores two significant deficiencies of Wikipedia as it compares with references like Encyclopedia Britannica. First, coverage of topics within a broad subject area is determined by self-selecting submitters, so there are often coverage gaps. Second, there are often inconsistencies from one article to the next within a broad subject area. Although individual articles may be accurate with respect to their topic, one cannot rely on the collection of articles for a broad subject area as consistent or comprehensive. An actively edited reference like EB is much less likely to be deficient this way.

    More about this at
    http://www.robertvbinder.com/britannica-brat/

    • Tilman Bayer says:

      Thank you for the link. Opinions of the form “Encyclopedia X is produced by method Y, therefore its content must exhibit property Z” are very frequent in discussions comparing Wikipedia with other encyclopedias. I think one of the values of the present study is that it adheres to the established principle of blind review, and developed methods (described in section 3.4.2) to ensure that reviewers’ judgments are not biased by such preconceived opinions about each encyclopedia’s production process.

      I agree that besides factual accuracy, comprehensiveness/evenness of coverage (“not what was there, but what was not”, as you wrote in your article) is important, and I’d love to see objective methodology developed to compare it across an encyclopedia’s entire coverage of a particular subject area. But it’s worth noting that in their judgment of single articles, reviewers in this study already took this into account, see e.g. this example on p.43:

      ‘all three reviewers for the article on Memory felt that, despite “minor flaws” (Reviewer 2), the Wikipedia article was superior especially with respect to its coverage of the topic: “The first article [Wikipedia] is decent. It is reasonably concise, and covers most things that I would include – certainly it is not perfect, and there are things missing, but it is concise and well-written. By contrast the second article [Britannica] is very vague and makes minimal links to the actual original science behind the points […] I actually think that it would be a little misleading to a novice, because the literature has developed so much in the last 10-15 years.” (Reviewer 3 – doctoral student)’