Wikimedia Research Newsletter

Vol: 5 • Issue: 5 • May 2015

Drug articles accurate and largely complete; women “slightly overrepresented”; talking like an admin

With contributions by: William Skaggs, Max Klein, Piotr Konieczny, Gamaliel, Jonathan Morgan and Tilman Bayer

German study finds Wikipedia’s pharma articles accurate and largely complete

Review by William Skaggs

Recently when my 83-year-old father was undergoing medical treatment, the doctor wanted to change one of his blood pressure drugs, and in order to let us know what the effects would be, she printed out the Wikipedia article on the drug and handed it to us. This accords with the overall impression I have developed: Wikipedia’s articles on drugs are pretty good – good enough to impress even doctors. A new research study[1] adds some substance to that impression.

A team of German pharmacologists picked a set of 100 drugs described in pharmacology textbooks, and compared the textbook descriptions with Wikipedia articles about the drugs, for accuracy (meaning that the Wikipedia article matched the information in the textbook) and comprehensiveness. They found that 99.7% of the facts in the Wikipedia articles were accurate, and 83.8% of the facts from the textbooks made it into the Wikipedia articles. These numbers were derived from the German Wikipedia, but the authors state that similar results were obtained for the English language version. They conclude that “our results suggest that Wikipedia is an accurate and informative source of drug information for undergraduate medical students.” They also revisited the drug articles examined in 2010 by an earlier study which came to less positive conclusions (see coverage in this newsletter: “Quality of drug information in Wikipedia”), and “found the quality of pharmacological information significantly improved”. Upon reviewing several other empirical studies which evaluated the quality of medical information on Wikipedia, the authors observe that “despite different methodologies, the main conclusion of these studies was that Wikipedia articles on health topics contain few errors and are well referenced, while the information provided often lacks depth.”
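The two headline figures are simple proportions, but their denominators differ: accuracy is measured against the facts that appear in Wikipedia, completeness against the facts that appear in the textbooks. A minimal sketch, assuming each article has already been hand-coded into fact counts; the class and field names are hypothetical, the numbers are invented, and the review does not say whether the study pooled facts across drugs or averaged per drug (this sketch pools):

```python
from dataclasses import dataclass


@dataclass
class DrugComparison:
    """Hypothetical hand-coded tallies for one drug (not the study's actual coding scheme)."""
    wiki_facts_checked: int      # facts in the Wikipedia article checked against the textbook
    wiki_facts_correct: int      # of those, facts that agreed with the textbook
    textbook_facts: int          # facts listed in the textbook entry
    textbook_facts_in_wiki: int  # of those, facts that also appear in the Wikipedia article


def accuracy(drugs: list[DrugComparison]) -> float:
    """Share of checked Wikipedia facts that were correct, pooled over all drugs."""
    checked = sum(d.wiki_facts_checked for d in drugs)
    correct = sum(d.wiki_facts_correct for d in drugs)
    return correct / checked


def completeness(drugs: list[DrugComparison]) -> float:
    """Share of textbook facts that made it into Wikipedia, pooled over all drugs."""
    total = sum(d.textbook_facts for d in drugs)
    covered = sum(d.textbook_facts_in_wiki for d in drugs)
    return covered / total


# Invented numbers for two drugs, purely to show the arithmetic.
toy_drugs = [DrugComparison(40, 39, 20, 17), DrugComparison(60, 60, 30, 24)]
print(f"accuracy={accuracy(toy_drugs):.2f} completeness={completeness(toy_drugs):.2f}")
```

With these toy tallies the pooled accuracy is 0.99 and completeness 0.82; averaging per drug instead of pooling would give small articles more weight.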

Obviously this is something we should be proud of, but let me note a caveat. Articles about specific drugs are a prime example of the sort of thing Wikipedia is best at: articles about topics that can be handled in a systematic way, without requiring mastery of a large body of literature. As a rule, the broader a topic, the lower the quality of the Wikipedia article. Thus our article on the drug chlordiazepoxide (commonly known as Librium) is better than our benzodiazepine article, which covers the class of drugs to which Librium belongs. The latter article contains a lot of good information but is poorly organized. Our pharmaceutical drug article shows this flaw to an even greater degree. The general take-home message, supported by the German study, is that our medical articles can be very useful to people who are looking for specific facts, but tend to be less useful to people who are trying to understand broad principles.

Notable women “slightly overrepresented” (not underrepresented) on Wikipedia, but the Smurfette principle still holds

Review by Maximilianklein

“It’s a man’s Wikipedia? Assessing gender inequality in an online encyclopedia”,[2] presented at the Ninth International AAAI Conference on Web and Social Media (ICWSM) this week, investigates gender in the biography articles of six different Wikipedias. The four biases investigated are coverage bias (who makes it into the encyclopedia), structural bias (which articles link to which), lexical bias (the type of words used in the articles), and visibility bias (who is featured on the Main Page).

Coverage bias is analysed by checking which of the people listed in three reference databases of notable humans (Freebase, MIT’s Pantheon, and Human Accomplishment) have articles in Wikipedia. A surprising result here is that women are not underrepresented relative to these databases, as hypothesised, but are even “slightly overrepresented”. (The researchers acknowledge that the first two of these three datasets are at least partly based on Wikipedia itself, but try to address this issue by “seeking patterns that exist across all three datasets”.)
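The coverage comparison boils down to a per-gender coverage rate. As a sketch, assuming people can be matched to Wikipedia articles by an exact identifier (real matching across Freebase, Pantheon, and Human Accomplishment is messier, and the identifiers and data below are invented), women count as “overrepresented” if a larger share of the women in a reference list have articles than of the men:

```python
from collections import Counter


def coverage_by_gender(reference_people, in_wikipedia):
    """Fraction of each gender's reference-list entries that have a Wikipedia article.

    reference_people: iterable of (person_id, gender) pairs from an external list.
    in_wikipedia: set of person_ids that have an article in one language edition.
    """
    totals, covered = Counter(), Counter()
    for person_id, gender in reference_people:
        totals[gender] += 1
        if person_id in in_wikipedia:
            covered[gender] += 1
    return {gender: covered[gender] / n for gender, n in totals.items()}


# Hypothetical data: 2 women and 4 men in the reference list.
people = [("p1", "female"), ("p2", "female"),
          ("p3", "male"), ("p4", "male"), ("p5", "male"), ("p6", "male")]
wiki = {"p1", "p2", "p3", "p4", "p5"}
print(coverage_by_gender(people, wiki))  # {'female': 1.0, 'male': 0.75}
```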

Structural bias is a graph-theoretical measure of how articles about men and women link to each other. Here it is shown that across all six languages, articles about women tend to link more to articles about men than vice versa. The Smurfette Principle, that women are less central in the link graph, is also tested. The in-degrees of articles in the two gender categories are compared, and articles about men are found to be significantly more central in all language editions except the Spanish Wikipedia, where men and women are equally central.
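In-degree is simply the number of links pointing at an article. A minimal sketch on an invented four-article graph, comparing average in-degree by gender and, as one simple way to express “links more to articles about men than vice versa”, the share of each gender’s outgoing links that cross to the other gender. The function names are hypothetical, and the paper’s significance tests and any normalization it applies are omitted:

```python
from collections import Counter
from statistics import mean


def mean_in_degree_by_gender(edges, gender):
    """Average number of incoming links per biography, grouped by gender.

    edges: iterable of (source, target) links between biography articles.
    gender: dict mapping article title -> gender label.
    """
    in_degree = Counter(target for _source, target in edges)
    groups = {}
    for article, label in gender.items():
        # Articles that nobody links to still count, with in-degree 0.
        groups.setdefault(label, []).append(in_degree[article])
    return {label: mean(degrees) for label, degrees in groups.items()}


def cross_gender_share(edges, gender, source_label):
    """Fraction of links from `source_label` biographies that point to the other gender."""
    targets = [t for s, t in edges if gender.get(s) == source_label and t in gender]
    if not targets:
        return 0.0
    return sum(gender[t] != source_label for t in targets) / len(targets)


# Toy link graph between four hypothetical biographies.
gender = {"A": "male", "B": "male", "C": "female", "D": "female"}
edges = [("C", "A"), ("D", "A"), ("C", "B"), ("A", "B"), ("B", "A"), ("A", "C")]
print(mean_in_degree_by_gender(edges, gender))      # {'male': 2.5, 'female': 0.5}
print(cross_gender_share(edges, gender, "female"))  # 1.0
```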

The lexical bias notion stems from the idea behind the Finkbeiner test: that a female scientist will often be described as a woman as much as a scientist. It is indeed found that articles about women place linguistic emphasis on relationships, gender, and family, whereas the top terms in men’s articles focus on their professions. For instance, the word “divorced” is 4.4 times more frequent in articles about women than in articles about men on the English Wikipedia; for the German and Russian Wikipedias, that multiplier rises to 4.7 and 4.8, respectively. The authors note that this ties into the concept of male as the null gender.
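A figure like “4.4 times more frequent” is a ratio of relative frequencies. The sketch below assumes the simplest normalization (occurrences per token across each corpus); the paper may normalize differently, for example per article, and its tokenization is surely more careful than this lowercase-letters regex. The snippets are invented:

```python
import re
from collections import Counter


def relative_frequency(texts, word):
    """Occurrences of `word` per token, pooled over a list of article texts."""
    tokens = [tok for text in texts for tok in re.findall(r"[a-z]+", text.lower())]
    return Counter(tokens)[word] / len(tokens) if tokens else 0.0


def frequency_ratio(women_texts, men_texts, word):
    """How many times more frequent `word` is in women's articles than in men's."""
    men_freq = relative_frequency(men_texts, word)
    if men_freq == 0:
        return float("inf")  # word never appears in the men's corpus
    return relative_frequency(women_texts, word) / men_freq


# Invented snippets standing in for two corpora of biography articles.
women = ["She married and divorced twice", "She divorced in 1990"]
men = ["He was a physicist who divorced once", "He was an engineer and inventor"]
print(round(frequency_ratio(women, men, "divorced"), 2))  # 3.25
```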

Lastly, visibility bias, the propensity of articles about men and women to appear on the English Wikipedia Main Page, is tested. No significant difference is found between the two genders in this respect.

Unfortunately, this paper suffers from its Euro-focus: the six languages in question are English, German, French, Italian, Spanish and Russian. Still, the breadth of the methods used reveals wide-scale issues. The authors conclude that Wikipedia does show some signs of addressing systemic bias, such as equal visibility on the Main Page and the absence of a coverage gap, but that there are still stark differences in how men and women are portrayed. Whether this is due to biases in the real world or to the way Wikipedians write about the real world, they say, remains unclear.

Editors who use user talk pages are more involved in high-quality articles

Review by Piotr Konieczny

An article[3] in the Journal of the Association for Information Science and Technology (JASIST) examines Wikipedia editors’ public communication using social network analysis theory. This research suggests that Wikipedia editors who engage in communication with others using user talk pages “are more experienced in editing high quality articles and are more integrated in the community”. The author distinguishes quantitative and qualitative contributions, noting that the use of communication tools is more directly related to contributing not just to many articles, but to high-quality articles, as well as to a larger number of namespaces. The use of such tools is centered on “coordinating and mentoring editors who edit lower quality articles”; in other words, the author observes that editors who edit high-quality articles and make heavy use of communication tools seem more likely to reach out to less experienced editors than the other way around. The author concludes that online collaboration systems are improved by features that allow the creation of what the author calls “personal” communication networks. Though the study excluded bots, it does not seem to have investigated the details of communication (e.g. templates, warnings, awards), so its conclusions on the nature of communications (rather than who engages in them) are more tentative.

“Wikipedia, collective memory, and the Vietnam war”

Should the article Vietnam War open with this lead image (because “it’s one of only two photos of [a member of the US military] winning the Medal of Honor”), or instead with a depiction of the My Lai massacre? One of the many debates from the article’s talk page (the current version uses a collage of several images)

Review by Piotr Konieczny

This paper,[4] likewise published in JASIST, looks at the Talk:Vietnam War page (and its archives) and analyses it in the context of theories dealing with the concept of collective memory (cultural memory, memory space, and the “floating gap” concept introduced by Pentzold (2009) in his paper on Wikipedia[supp 1]). As such, this paper is one of several works arguing that Wikipedia is a place where the modern world’s memories are being recorded and, to some extent, shaped for posterity. The paper finds that the Wikipedia article is affected by two major debates: “(a) whether the US actually lost the war and (b) whether the voice of the American Vietnam veteran should be privileged.” It reviews major, recurring arguments presented by the talk page participants, and concludes that Wikipedia allows us to study how collective memory is shaped. The author also argues that the very visibility of such debates on Wikipedia may alienate some educators, primarily librarians, who are used to works that conceal their knowledge production processes. The author ends with a call for librarians to edit Wikipedia, and to help their patrons do the same, in order to participate in the 21st-century curation of collective memories.

In a separate paper, published earlier in the Journal of Documentation,[5] the author examined the debate about reliable sources on the same talk page and concluded (according to the abstract) that while much of it “is conducted without acrimony, the level of analysis one finds in the talk pages is rather shallow while the attention of individual contributors is not overly concentrated.”

Survey of secondary school use of Wikipedia

Review by Gamaliel

Three researchers have conducted a survey[6] of the use and perceptions of Wikipedia among secondary school teachers and librarians in the United States. Twenty-two teachers and librarians responded to the survey. The vast majority (91%) reported that “Wikipedia had some effect on student research”. Responses were mixed, however, about how positive or negative that effect was. Positive comments included responses that Wikipedia is “easily understood…thorough, up-to-date, and easily edited” and “students use it to get the basic ideas for their research, then go to other websites to verify it.” Negative comments largely centered on the fact that many students did not go beyond Wikipedia in their research, such as the responses that “students rely on it too heavily and do not expand their research to prove or disprove their findings” and “Students don’t want to check sources when they can just get their work done in one stop.” Most (91%) reported that their schools had no policy regarding the use of Wikipedia, but responses were roughly split regarding the need for one. Teachers, and those responding that Wikipedia had a negative effect, were more likely to say there was a need for such a policy than librarians and those responding that it had a positive effect. Based on the results, the authors concluded that any policy should not restrict Wikipedia use. They write “instead of banning and fighting against the usage, students need to be taught the skills to utilize it [in] an effective way, such as how to use Wikipedia as a jumping off point to other potentially more trustworthy resources and how to evaluate the reliability of articles.” Given the very small sample size of the survey, this article is more useful for its excellent literature review than for its survey findings.


“User engagement on Wikipedia, a review of studies of readers and editors”

Another ICWSM conference paper[7] frames itself as a literature review of topics that are of key interest to the Wikipedia community: editor motivations, engagement, and retention. Unfortunately, it lacks a proper methodology (how did the author select papers to review?), which makes it difficult to assess its comprehensiveness. It nonetheless provides a good summary of many other key works in this field, and creates an interesting framework for recognizing some patterns in this subfield of Wikipedia studies. Unsurprisingly, the author concludes that the Wikipedia community needs to improve its communication with newbies in order to increase their retention (fewer templates and stark warnings; more friendly personal outreach). (Review by Piotr Konieczny)

An image of sculptures in Berlin, published under the freedom of panorama provisions in German copyright law

Freedom of panorama in Europe

This paper[8] advocates adopting freedom of panorama provisions as part of the harmonization of European Union law. It is enriched with case studies from the Wikipedia community’s history, and has been supported by the Wikimedia Foundation (though the paper does not make it clear how, nor is it itself released under a free license). It suffers from a few minor issues (such as not clearly recognizing that Wikimedia Commons does not accept images restricted to non-commercial use, so that a law granting freedom of panorama only for non-commercial uses would be of little value to Wikipedia) and is heavily geared towards the European legislative framework, but it is a valuable addition to the discussion of the freedom of panorama concept. (Review by Piotr Konieczny)

Talking like an admin: linguistic mimicry and network centrality on Wikipedia

A new conference paper[9] in the field of sociolinguistics examines whether Wikipedia editors are more likely to linguistically coordinate with (use the same words as) their interlocutors when those interlocutors are more centrally located within the social network of Wikipedia, or when they are admins. The study draws on an annotated corpus of talk page discussions[supp 2] in which the admin status of each participant is known. It builds a network from the number of times editors have directly replied to one another in talk page threads, and computes several measures of network centrality on it (including betweenness and eigenvector centrality). The authors find that while editors align their vocabularies more when speaking to admins than to non-admins, they also align with highly central editors (those who have engaged in many discussions with many different editors) regardless of whether those editors are admins. Their results suggest that admin status follows high centrality, not the other way around.
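For readers unfamiliar with how “linguistic coordination” is quantified: the corpus cited above comes from Danescu-Niculescu-Mizil et al., who define B’s coordination toward A on a class of function-word markers (articles, pronouns, and so on) as the probability that B’s reply uses the marker given that A’s preceding utterance did, minus the baseline probability that B’s reply uses it at all. The sketch below implements that definition for a single marker class on toy reply pairs. Whether Noble and Fernández use exactly this formulation is an assumption, the marker set and exchanges are hypothetical, and the per-speaker-pair aggregation of the original work is left out:

```python
def exhibits(utterance, markers):
    """True if the utterance contains at least one word from the marker class."""
    return bool(set(utterance.lower().split()) & markers)


def coordination(exchanges, markers):
    """Coordination of repliers toward the speakers they answer, for one marker class.

    exchanges: list of (target_utterance, reply_utterance) pairs, where each reply
    was written by speaker B directly in response to speaker A.
    Returns P(reply exhibits marker | target exhibits marker) - P(reply exhibits marker).
    """
    if not exchanges:
        raise ValueError("need at least one exchange")
    reply_has = [exhibits(reply, markers) for _target, reply in exchanges]
    given_target = [r for (target, _reply), r in zip(exchanges, reply_has)
                    if exhibits(target, markers)]
    if not given_target:
        raise ValueError("targets never use the marker, so coordination is undefined")
    return sum(given_target) / len(given_target) - sum(reply_has) / len(reply_has)


# Hypothetical marker class (English articles) and toy talk page exchanges.
ARTICLES = {"a", "an", "the"}
toy_exchanges = [
    ("the policy is clear", "the source is weak"),
    ("see the guideline", "I agree"),
    ("fine by me", "ok"),
    ("I disagree", "no"),
]
print(coordination(toy_exchanges, ARTICLES))  # 0.25
```

A positive value means the replier uses the marker more often right after the other speaker has used it than they do in general; comparing such values for replies to admins versus non-admins, or to central versus peripheral editors, is the kind of contrast the paper draws.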

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

  • From March’s CSCW conference (see also Research:CSCW 2015):
    • “Functional roles and career paths in Wikipedia”[10]
    • “‘Is’ to ‘was’: coordination and commemoration in posthumous activity on Wikipedia biographies”[11]
    • “The virtuous circle of Wikipedia: recursive measures of collaboration structures”[12]
    • “Effects of a Wikipedia orientation game on new user edits”[13] (about The Wikipedia Adventure)
  • “Wikipedia and the politics of openness”[14] (book, see also 2011 Signpost interview with the author)
  • “Wikipédia, objet scientifique non identifié”[15] (“Wikipedia, unidentified scientific object”, book in French)
  • “Improving disease surveillance: sentinel surveillance network design and novel uses of Wikipedia”[16]
  • “Disaster monitoring with Wikipedia and online social networking sites: structured data and linked data fragments to the rescue?”[17]
  • “Barriers to the localness of volunteered geographic information”[18]
  • “Amateur encyclopedia editors as nonprofessional journalists: Wikipedia as a gateway for breaking news”[19] (German, with extended abstract in English)
  • “How to extract seasonal features of sightseeing spots from Twitter and Wikipedia”[20]
  • “Analysing the use and perception of Wikipedia in the professional context of translation”[21]
  • “Cross-language Wikipedia editing of Okinawa, Japan”[22]
  • “Property type distribution in Wordnet, corpora and Wikipedia”[23]
  • “Quality assessment of Wikipedia articles using h-index”[24] From the abstract: “In this paper, we propose a method for assessing quality values of Wikipedia articles from edit history using h-index. One of the major methods for assessing Wikipedia article quality is a peer-review based method. In this method, we assume that if an editor’s texts are left by the other editors, the texts are approved by the editors, then the editor is decided as a good editor [ see Research:Content persistence ]. However, if an editor edits multiple articles, and the editor is approved at a small number of articles, the quality value of the editor deeply depends on the quality of the texts. In this paper, we apply h-index [… to improve this method. …] the accuracy of article quality assessment in our method outperforms the existing peer-review based method.”
  • “Social Interactions vs Revisions, What is important for Promotion in Wikipedia?”[25] From the abstract: “[We look] at the process of election for administrator in the English Wikipedia community. We modeled the candidates according to their revisions and/or social attributes. […] Our model combining knowledge contribution variables and social networking variables successfully explain 78% of the results which is better than the former models. It also helps to refine the criterion for election. If the number of knowledge contributions is the most important element, social interactions come close second to explain the election. But being connected with the future peers (the admins) can make the difference between success and failure, making this epistemic community a very social community too.”
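The h-index at the heart of Suzuki’s proposal is a standard measure: a set of numbers has index h when h of them are at least h. The abstract only summarizes how edit histories are turned into such numbers, so the sketch below assumes a hypothetical per-article “approval” score for each editor (how many later editors left that editor’s text in place) and applies the textbook h-index to it; it is not the paper’s full article-quality model:

```python
def h_index(scores):
    """Largest h such that at least h of the scores are >= h."""
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score < rank:
            break
        h = rank
    return h


# Hypothetical per-article approval counts for two editors.
steady_editor = [10, 8, 5, 4, 3]
one_hit_editor = [50, 1, 1]
print(h_index(steady_editor), h_index(one_hit_editor))  # 4 1
```

The toy output shows why the h-index helps with the problem the abstract describes: an editor whose text survives moderately well across many articles (h = 4) outranks one whose text survives very well in a single article (h = 1), even though the latter’s raw total of approvals is higher.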


  1. Kräenbring, Jona (2014). “Accuracy and completeness of drug information in Wikipedia: a comparison with standard textbooks of pharmacology”. PLoS One 9 (9): e106930. doi:10.1371/journal.pone.0106930. PMID 25250889.  Open access
  2. Wagner, Claudia; Garcia, David; Jadidi, Mohsen; Strohmaier, Markus (2015-04-21). “It’s a Man’s Wikipedia? Assessing Gender Inequality in an Online Encyclopedia”. Ninth International AAAI Conference on Web and Social Media.
  3. Tsikerdekis, Michail (2015-06-01). “Personal communication networks and their positive effects on online collaboration and outcome quality on Wikipedia”. Journal of the Association for Information Science and Technology. doi:10.1002/asi.23429. ISSN 2330-1643.  Closed access
  4. Luyt, Brendan (2015-06-01). “Wikipedia, collective memory, and the Vietnam war”. Journal of the Association for Information Science and Technology. doi:10.1002/asi.23518. ISSN 2330-1643.  Closed access
  5. Luyt, Brendan (2015-03-25). “Debating reliable sources: writing the history of the Vietnam War on Wikipedia”. Journal of Documentation. doi:10.1108/JD-11-2013-0147. ISSN 0022-0418.  Closed access
  6. (2015-04-23). “Wikipedia Use in Research: Perceptions in Secondary Schools”. TechTrends 59 (3): 92–102. doi:10.1007/s11528-015-0858-6. ISSN 8756-3894.
  7. Miquel-Ribé, Marc (2015-04-22). “User Engagement on Wikipedia, A Review of Studies of Readers and Editors”. Ninth International AAAI Conference on Web and Social Media.
  8. Lobert, Joshua; Isaias, Bianca; Bernardi, Karel; Mazziotti, Giuseppe; Alemanno, Alberto; Khadar, Lamin (2015-04-25). “The EU Public Interest Clinic and Wikimedia Present: Extending Freedom of Panorama in Europe”. Rochester, NY: Social Science Research Network. 
  9. Noble, Bill; Fernandez, Raquel (2015-06-04). “Centre Stage : How Social Network Position Shapes Linguistic Coordination”. 2015 Workshop on Cognitive Modeling and Computational Linguistics: Social Science Research Network. 
  10. Arazy, Ofer; Ortega, Felipe; Nov, Oded; Yeo, Lisa; Balila, Adam (2015). “Functional Roles and Career Paths in Wikipedia”. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW ’15. New York, NY, USA: ACM. pp. 1092–1105. doi:10.1145/2675133.2675257. ISBN 978-1-4503-2922-4.  Closed access / author copy 1, author copy 2
  11. Keegan, Brian C.; Brubaker, Jed R. (2015). “‘Is’ to ‘Was’: Coordination and Commemoration in Posthumous Activity on Wikipedia Biographies”. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW ’15. New York, NY, USA: ACM. pp. 533–546. doi:10.1145/2675133.2675238. ISBN 978-1-4503-2922-4.  Closed access / author copy
  12. Klein, Maximilian; Maillart, Thomas; Chuang, John (2015). “The Virtuous Circle of Wikipedia: Recursive Measures of Collaboration Structures”. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW ’15. New York, NY, USA: ACM. pp. 1106–1115. doi:10.1145/2675133.2675286. ISBN 978-1-4503-2922-4.  Closed access
  13. Narayan, Sneha; Orlowitz, Jake; Morgan, Jonathan T.; Shaw, Aaron (2015). “Effects of a Wikipedia Orientation Game on New User Edits”. Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing. CSCW’15 Companion. New York, NY, USA: ACM. pp. 263–266. doi:10.1145/2685553.2699022. ISBN 978-1-4503-2946-0.  Closed access
  14. Tkacz, Nathaniel (2014-12-19). Wikipedia and the Politics of Openness. Chicago ; London: University Of Chicago Press. ISBN 9780226192277. 
  15. Barbe, Lionel; Merzeau, Louise; Schafer, Valérie (2015-04-13). Wikipédia, objet scientifique non identifié. Presses Universit. Paris 10. ISBN 9782840169208. 
  16. Fairchild, Geoffrey Colin (December 2014). Improving disease surveillance: sentinel surveillance network design and novel uses of Wikipedia. PhD thesis, Computer Science, University of Iowa.
  17. Steiner, Thomas; Verborgh, Ruben (2015-01-26). “Disaster monitoring with Wikipedia and online social networking sites: structured data and linked data fragments to the rescue?”. arXiv:1501.06329 [cs].
  18. Sen, S. W.; Ford, H.; Musicant, D. R.; Graham, M.; Keyes, O. S. B.; Hecht, B. (2015). “Barriers to the Localness of Volunteered Geographic Information”. CHI 2015.
  19. Roessing, Thomas (2014). “Enzyklopädie-Amateure als Amateur-Journalisten: Wikipedia als Gateway für aktuelle Ereignisse” [Amateur encyclopedia editors as nonprofessional journalists: Wikipedia as a gateway for breaking news]. Studies in Communication | Media, No. 2 of 2014. Extended abstract in English.
  20. Fang, Guanshen; Kamei, Sayaka; Fujita, Satoshi (2015-01-31). “How to extract seasonal features of sightseeing spots from Twitter and Wikipedia (Preliminary Version)”. Bulletin of Networking, Computing, Systems, and Software 4 (1): 21–26. ISSN 2186-5140.
  21. Alonso, Elisa. “Analysing the use and perception of Wikipedia in the professional context of translation”. JoSTrans, Issue 23.
  22. Hale, Scott A. (2015-01-04). “Cross-language Wikipedia editing of Okinawa, Japan”. arXiv:1501.00657 [cs].
  23. Barbu, Eduard. “Property type distribution in Wordnet, corpora and Wikipedia”. Expert Systems with Applications. doi:10.1016/j.eswa.2014.11.070. ISSN 0957-4174.  Closed access
  24. Suzuki, Yu (2015). “Quality assessment of Wikipedia articles using h-index”. Journal of Information Processing 23 (1): 22–30. doi:10.2197/ipsjjip.23.22.
  25. Picot-Clémente, Romain; Bothorel, Cécile; Jullien, Nicolas (2015-01-07). “Social interactions vs revisions, what is important for promotion in Wikipedia?”. arXiv:1501.01526 [cs].
Supplementary references and notes:
  1. Pentzold, Christian (2009). “Fixing the floating gap: The online encyclopedia Wikipedia as a global memory place”. Memory Studies 2 (2): 255–272. doi:10.1177/1750698008102055. ISSN 1750-6980.
  2. Danescu-Niculescu-Mizil, Cristian; Lee, Lillian; Pang, Bo; Kleinberg, Jon (2012). “Echoes of power: Language effects and power differences in social interaction”. Proceedings of WWW.

This newsletter is brought to you by the Wikimedia Research Committee and The Signpost.