Most important people; respiratory reliability; academic attitudes

With contributions by: Piotr Konieczny, Anwesh Chatterjee and Tilman Bayer.

Most important people of all times, according to four Wikipedias

Most prominent person on the English, Chinese, Japanese, and German Wikipedia, according to the paper’s PageRank method

This social network analysis[1] looks at the entire corpus of Wikipedia biographies (with data from English, Chinese, Japanese and German Wikipedias). The authors created several thousand networks (unfortunately, this short conference paper does not discuss precisely how) and used the PageRank algorithm to identify key individuals.
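
As a rough illustration of the PageRank step: the paper does not say how its networks were built, so the tiny link graph, the node names and the damping factor of 0.85 below are assumptions made up for this sketch, not the authors' data or settings. A plain power-iteration version might look like this:

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's damped rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "A links to B and C", and so on.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
most_prominent = max(ranks, key=ranks.get)
```

In this toy graph, the node with the most incoming links ends up with the highest score; in a biography network the nodes would be people and the edges links between their articles.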

The authors attempt to answer the question “Who are the most important people of all times?” Their findings clearly show that different Wikipedias give different prominence to different individuals (the most prominent people, for the four Wikipedias, appear to be George W. Bush, Mao Zedong, Ikuhiko Hata and Adolf Hitler, respectively). The Eastern cultures seem to prioritize warriors and politicians; Western ones include more cultural (including religious) figures. Interesting findings concern globalization: “While the English Wikipedia includes 80% non-English leaders among the top 50, just two non-Chinese made it into the top 50 of the Chinese Wikipedia … Japanese Wikipedia is slightly more balanced, with almost 40 percent non-Japanese leaders”. Findings for the German Wikipedia are not presented. Though the authors don’t make that point, it seems that no women appear in the Top 10 lists presented. Overall, this seems like an interesting paper (it also received a writeup in Technology Review), though the brief form (two pages) means that many questions about methodology remain unanswered, and the presentation and analysis of the findings are very curt. On a side note, one can wonder whether this paper is truly related to anthropology, given that the only time this field is referred to in this work is when the authors mention that they are “replacing anthropological fieldwork with statistical analysis of the treatment given by native speakers of a culture to different subjects in Wikipedia.”

See also our earlier coverage of similar studies:

“Wikipedia a reliable learning resource for medical students? Evaluating respiratory topics”

A paper in Advances in Physiology Education[2] claims to assess the suitability of Wikipedia’s respiratory articles for medical student learning. Forty Wikipedia articles on respiratory topics were sampled on 27 April 2014. These articles were assessed by three researchers with a modified version of the DISCERN tool. Article references were checked for accuracy and typography. Readability was assessed with the Flesch–Kincaid and Coleman–Liau tools.
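
For readers unfamiliar with the two readability measures, the following sketch shows the standard published formulas for the Flesch–Kincaid grade level and the Coleman–Liau index. The text-counting helper, and in particular its vowel-group syllable heuristic, is an assumption of this sketch; the paper does not say which implementation it used.

```python
import re


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters, words, sentences):
    """Coleman-Liau index from raw counts."""
    letters_per_100_words = 100.0 * letters / words
    sentences_per_100_words = 100.0 * sentences / words
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


def count_text(text):
    """Crude counts for English text; syllables are approximated by vowel groups."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", word.lower()))) for word in words
    )
    return {
        "words": len(words),
        "sentences": len(sentences),
        "letters": sum(len(word) for word in words),
        "syllables": syllables,
    }


counts = count_text("The cat sat. The dog ran.")
grade = flesch_kincaid_grade(
    counts["words"], counts["sentences"], counts["syllables"]
)
```

Both scores are meant to approximate a US school grade level, which is why a result in the college range signals fairly dense prose. Neither measure says anything about accuracy.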

The paper found a wide range of accuracy scores using the modified DISCERN tool, from 14.67 for “[Nail] clubbing” to 38.33 for “Tuberculosis”. Incorrect, incomplete or inconsistent formatting of references was commonly found, although this was not quantified in the paper. Readability of the articles was typically at a college level. On the basis of these findings, the paper declares Wikipedia’s respiratory articles unsuitable for medical students.

The researcher apparently uses an arbitrary unvalidated modification of the DISCERN tool to assess the accuracy of articles. The nature of this modification is not specified; nor is it available at the journal’s website as claimed in the paper.

The DISCERN tool does not assess accuracy; rather, it is designed to assess “information about treatment choices specifically for health consumers”. As such, the use of this tool is inappropriate to assess the suitability for medical students.

There is no acknowledgement that Wikipedia is an encyclopedia. Several of the DISCERN tool’s questions are unsuitable for an encyclopedia. DISCERN questions such as “Does it describe how each treatment works?” and “Does it describe the risks of each treatment?” would be answered on other Wikipedia pages, not on the disease article’s page. The author makes an a priori assumption that the medical textbooks used for comparison are perfect sources. The author does not assess those textbooks with the DISCERN tool.

The paper states: “[t]he number of citations from peer-reviewed journals published in the last 5 yr was only 312 (19%).” However, this is far superior to the number of citations in the textbooks listed. The chapter on “Neoplasms of the lung” in Harrison’s Principles of Internal Medicine (18th ed.) contains no citations at all. Seven sources are listed in its “Further readings” section, of which only one is from the last five years.

The claim that the article on “clubbing … had no references or external links” is incorrect. On 27 April 2014, Wikipedia’s article on “Nail clubbing” had ten references.

Several of the articles are at a rudimentary stage, containing limited information and lacking appropriate references. However, two articles, “Lung cancer” and “Diffuse panbronchiolitis”, were assessed by Wikipedia’s editors at the highest standard and awarded “Featured article” status. Five more articles, “Asthma”, “Chronic obstructive pulmonary disease”, “Pneumonia”, “Pneumothorax” and “Tuberculosis”, reached “Good article” standard. These articles are exceptionally detailed, accurate, and well-referenced. Azer’s paper makes no mention of the high quality of these articles.

The research uses an unvalidated tool for an inappropriate purpose without applying a suitable comparator, and inevitably draws incorrect conclusions.

Wikipedia is an encyclopedia. It is not a medical textbook; nor is it intended to replace medical textbooks. Rather, it should be used as a starting point by medical students. The quality of an individual article should be quickly assessed by the reader, and information can be confirmed in the references provided. Missing information should be sought from other sources, such as textbooks. Students should be encouraged to use Wikipedia alongside medical textbooks to assist their learning.

Disclosure: I (Axl) am a Wikipedia editor, a pulmonologist, the main author of Wikipedia’s “Lung cancer” article, and a major contributor to other respiratory articles.

Most academics are not concerned about Wikipedia’s quality – but many think their colleagues are

This recent study[3] is a valuable contribution to the small body of work on academics’ attitudes towards Wikipedia, and is the largest-scale survey in that field so far, with nearly 1,000 valid responses from the faculty at two Spanish universities. The authors find that Wikipedia is generally held in positive regard (nearly half of the respondents think it is useful for teaching, while less than 20% disagree; similar numbers use it for general information gathering, though the numbers are split at about 35% on whether they use it for research in their own discipline). Almost 10% of the respondents say they use it frequently for teaching purposes. The numbers of those who discourage students from using it and those who encourage students to consult the site are nearly equal, at about a quarter each. Almost half have no strong feelings on this, and fewer than 15% strongly disagree with students’ use of Wikipedia – suggesting that the past few years have witnessed a major shift in universities (less than a decade ago, the stories of professors banning Wikipedia were quite common). Unsurprisingly, the faculty is much less likely to cite Wikipedia, with only about 10% admitting they do so.

Almost 90% of the academics think Wikipedia is easy to use, but only about 15% think editing is easy – with more than 40% disagreeing with that statement. Some 2% of respondents describe themselves as very frequent contributors to the site, and 6% as frequent. More than 40% have no thoughts on Wikipedia’s editing and reviewing system, which leads the authors to suggest that “most faculty do not actually know Wikipedia’s specific editing system very well nor the way the [site’s] peer-review process works”. Asked about Wikipedia’s quality, those who think its articles are reliable outnumber those who disagree by two to one (40% to 20%), with an even higher ratio (more than three to one) agreeing that Wikipedia articles are up to date. The respondents are equally divided, however, on whether the articles are comprehensive or not. The authors thus conclude that the impression that most academics are concerned about Wikipedia’s quality is not supported by their data. Nonetheless, the artifacts of Wikipedia’s early poor reception within academia linger: more than half of the respondents think the use of Wikipedia is frowned on by most academics, even though only 14% say they frown on it themselves.

The study goes beyond presenting simple descriptive statistics, giving us a number of interesting findings based on correlations: the strongest correlation for teaching use is with making edits (r=0.59), followed by opinions that it improves student learning (r=0.47), perception of and use by colleagues (r=0.41), Wikipedia’s perceived quality (r=0.4), and its passive use (r=0.3). The researchers find that the use of Wikipedia is higher, and views of the site more favourable, among the STEM fields than in the “soft” social sciences. This also explains Wikipedia’s higher popularity among male instructors (which disappears when controlled for discipline and the correspondingly much lower number of women teaching in the STEM fields). Interestingly, the influence of age was not found to be significant: “faculty’s decision to use Wikipedia in learning processes does not follow the usual pattern of other Web 2.0 tools where young people tend to be more frequent users.”

Of immediate practical value to the Wikipedia community are the findings on what would help the respondents design educational activities using Wikipedia: 64% would like to see a “catalog presenting best practices”, with similar numbers (~50%) pointing to “getting greater institutional recognition”, “having colleagues explaining their own experiences”, and “receiving specific training”.

Wikipedia assignments at Finnish secondary schools

A conference paper titled “Guiding Students in Collaborative Writing of Wikipedia Articles – How to Get Beyond the Black Box Practice in Information Literacy Instruction”[4] (already briefly mentioned in our October issue) reports on the use of Wikipedia student assignments in a somewhat different environment than the usual American undergraduates: this one instead deals with Finnish secondary school students. The authors use the guided inquiry framework, postulating that “information literacies are best learned by training appropriate information practices in a genuine collaborative process of inquiry”, and asking how collaborative Wikipedia writing assignments fit into this approach. The findings tie in with the previous research on this subject: students are more motivated than in traditional writing assignments, develop skills in and understanding of wikis and Wikipedia (including its reliability) and more broadly encyclopedic writing. However, students are less likely to develop skills such as identifying reliable sources without specific additional instructions. The researchers note that “the limitation of encyclopaedic writing is that it is not intended to generate new knowledge but to synthesize knowledge from existing sources (i.e., a type of literature review)”; hence teachers who aim to develop skills in generating new knowledge might consider alternative assignments. The paper stresses the need to tailor the Wikipedia assignment (or any other) to the specific class.


Detecting the location of an editing controversy within a page

Researchers at Google, AT&T, Purdue University and the University of Trento have developed[5] an algorithm that “in contrast to previous works in controversy detection in Wikipedia that studied the problem at the page level […] considers the individual edits and can accurately identify not only the exact controversial content within a page, but also what the controversy is about and where it is located.” As an example, the paper names the article about Chopin where “our method detected not only the known controversy about his origin but also the controversies about his date of birth and his photograph by Louis-Auguste Bisson.”

7.8% of Germans use Wikipedia on any given day

In a survey[6] by the German state media authorities, 26.8% of all Germans who had been seeking information on the Internet on the preceding day had used Wikipedia for that purpose. In absolute terms, this means that 7.8% of Germans use Wikipedia on any given day to obtain information, compared to 11.2% for Facebook, 8.1% for YouTube, and 6.3% for Twitter.
A separate study[7] found that 40% of German teenagers use Wikipedia daily or several times per week (compared to 38% in 2013[supp 1]).
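
The two survey percentages for Wikipedia (26.8% and 7.8%) can be related with a line of arithmetic. The sketch below assumes that both figures refer to the same base population and the same day; the implied share of Germans who looked for information online is derived here and is not a number quoted from the survey.

```python
# Figures quoted from the survey summary above.
share_among_information_seekers = 0.268  # used Wikipedia, among online information seekers
share_of_all_germans = 0.078             # used Wikipedia, among all Germans

# Derived (not quoted): share of all Germans who sought information online that day.
implied_information_seekers = share_of_all_germans / share_among_information_seekers

print(f"{implied_information_seekers:.1%}")
```

Under these assumptions, a little under a third of Germans would have searched for information online on the preceding day.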

Vandals’ lack of spelling discipline hampers automatic detection of vulgar words

A student project[8] at the University of Maryland, Baltimore County trained a vandalism detector on the well-known PAN 2010 vandalism corpus. The author concludes that compared to features based on the metadata of the revision (e.g. the size change, or whether the edit was made by an IP editor), or on quantitative features of the inserted text (e.g. the frequency of upper case characters), “Language Features provide the least information gain. It is expected that language features would provide the maximum information gain. But the problem is if anyone wants to vandalize a page, he or she would not care to spell the words correctly and so in most cases vulgar/slang dictionaries fall short identifying the bad words.”
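
Information gain, the measure used to compare the feature groups, is the reduction in label entropy obtained by splitting the data on a feature. The sketch below uses eight invented edits and two hypothetical binary features chosen to show the two extremes (a perfectly informative feature and a useless one); it does not reproduce the project's data, features or results.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total) for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for feat, lab in zip(feature_values, labels) if feat == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Invented toy data: 1 = vandalism, 0 = regular edit.
is_vandalism = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical metadata feature that happens to separate the classes perfectly.
made_by_ip_editor = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical dictionary feature that matches equally often in both classes.
matches_vulgar_dictionary = [1, 0, 1, 0, 1, 0, 1, 0]

gain_metadata = information_gain(made_by_ip_editor, is_vandalism)
gain_language = information_gain(matches_vulgar_dictionary, is_vandalism)
```

A misspelled vulgar word that no dictionary entry matches pushes a language feature towards the second case: the feature fires about as rarely on vandalism as on regular edits, so splitting on it barely reduces the entropy.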

New Wikimedia open access policy

At the recent CSCW conference (see also an overview of Wikimedia-related events and presentations there), the Wikimedia Foundation announced its new Open Access Policy to ensure that all research work produced with support from the Foundation will be openly available to the public and reusable on Wikipedia and other Wikimedia sites. See also coverage in this week’s Signpost.

Other recent publications

