Wikimedia blog

News from inside the Wikimedia Foundation

Wikimedia Research Newsletter

Wikimedia Research Newsletter, January 2012

Vol: 2 • Issue: 1 • January 2012

Language analyses examine power structure and political slant; Wikipedia compared to commercial databases

With contributions by: Tbayer and Piotrus

Admins influence the language of non-admins

An arXiv preprint titled “Echoes of power: Language effects and power differences in social interaction”[1] examines the language used by Wikipedia editors to understand how conversational language reflects power relationships. The research measures how much participants in a discussion adapt their language to that of the others involved (a process called language coordination). The findings indicate that the more someone adapts, the more deferential they are. The authors find that Wikipedia editors tend to coordinate their language more with administrators than with non-administrators. Furthermore, the study suggests that coordination affects one’s chances of becoming an administrator: admin candidates who coordinate more have a higher chance of being elected than those who do not change their language. Once elected, administrators tend to coordinate less.

A blog post on the Technology Review website summarized the results under the headline “Algorithm Measures Human Pecking Order” and highlighted the fact that one of the authors is Jon Kleinberg, known as the inventor of the HITS algorithm (also known as “hubs and authorities”).
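
As a rough illustration of what “coordination” can mean operationally, the Python sketch below assumes the style-accommodation measure common in this line of research: B’s coordination toward A on a marker (such as the use of articles) is the probability that B’s reply contains the marker given that A’s preceding message did, minus the overall probability that B’s reply contains it. The marker list, function names and sample exchanges are hypothetical, and this is not the authors’ implementation.

```python
from __future__ import annotations

# Minimal sketch of a language-coordination score under an ASSUMED definition:
# P(reply uses marker | initiator used marker) - P(reply uses marker).
# ARTICLES, has_marker and coordination are hypothetical names for illustration.

ARTICLES = {"a", "an", "the"}  # one example "style marker": English articles


def has_marker(text: str, marker_words: set[str]) -> bool:
    """Return True if any word in the text belongs to the marker class."""
    return any(word.strip(".,!?;:").lower() in marker_words for word in text.split())


def coordination(exchanges: list[tuple[str, str]], marker_words: set[str]) -> float:
    """Coordination of repliers toward initiators on one marker.

    Each exchange is an (initiator_message, reply) pair. Positive values
    mean the replier uses the marker more often right after the other
    speaker used it than it does overall.
    """
    pairs = [(has_marker(a, marker_words), has_marker(b, marker_words)) for a, b in exchanges]
    if not pairs:
        raise ValueError("need at least one exchange")
    baseline = sum(reply for _, reply in pairs) / len(pairs)
    triggered = [reply for initiated, reply in pairs if initiated]
    if not triggered:
        raise ValueError("the marker never appears in an initiator message")
    return sum(triggered) / len(triggered) - baseline


# Invented sample exchanges, for illustration only.
exchanges = [
    ("I reverted the edit.", "The edit was fine."),
    ("Please check the source.", "The source looks good."),
    ("Thanks for helping!", "No problem."),
    ("Why?", "It seems biased."),
]
print(coordination(exchanges, ARTICLES))  # → 0.5
```

A score near zero means the replier’s use of the marker is unaffected by the other speaker’s; in the study’s framing, a consistently higher score toward one group would suggest more deference toward that group.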

Can Wikipedia replace commercial biography databases?

California State University, East Bay: Could it rely on biographical information from Wikipedia and the web alone?

An article[2] by a librarian and professor at California State University compares “biographical content for literary authors writing in English” across Wikipedia, “the web” (i.e. top Google search results) and two commercial databases: the Biography Reference Bank (BRB, now part of EBSCO Industries) and Contemporary Authors Online (CAO). The study was motivated by the decision of the author’s institution to cancel its subscription to CAO during a budget crisis in 2008-2009, a decision that had been accompanied, among other reasons, by “a comment that this information is ‘on the web’”.

The paper starts out with a literature review on the reliability of Wikipedia and then describes how the author compiled a list of 500 authors (mostly from the US and UK) by “examining curricula and textbooks from English literature courses across the USA” and soliciting additional suggestions from peers. These names were then searched on BRB, CAO (as part of the Literature Resource Center), Wikipedia and Google.

Wikimedia Research Newsletter, December 2011

Vol: 1 • Issue: 6 • December 2011

Psychiatrists: Wikipedia better than Britannica; spell-checking Wikipedia; Wikipedians smart but fun; structured biological data

With contributions by: Tbayer, DarTar and Jodi.a.schneider

Mental health information on Wikipedia more accurate than Britannica and Kaplan & Sadock psychiatry textbook

Wikipedia articles on schizophrenia and other mental health topics were assessed for accuracy, richness of references and readability.

In an article for Psychological Medicine,[1] ten researchers from the University of Melbourne conclude that “the quality of information on depression and schizophrenia on Wikipedia is generally as good as, or better than, that provided by centrally controlled websites, Encyclopaedia Britannica and a psychiatry textbook.”

The study focused on ten mental health topics (e.g. “antidepressants and suicide in young people” or “side-effects of antipsychotics”), five each in the areas of depression and schizophrenia. “Using the topic terms (or synonyms) as key words for the searches or through manual browsing, content relating to these topics was extracted from [Wikipedia and 13 other websites selected for prominent Google results for depression and schizophrenia] and from the most recent edition of Kaplan & Sadock’s Comprehensive Textbook of Psychiatry … and the online version of Encyclopaedia Britannica” by two reviewers. For both depression and schizophrenia, three psychologists with clinical and research expertise in that area rated these extracts for accuracy, up-to-dateness, breadth of coverage, referencing and readability on a scale from 1 to 5 (“e.g. Accuracy: 1 = many errors of fact or unsubstantiated opinions, 3 = some errors of fact or unsubstantiated opinions, 5 = all information factually accurate”). As in an earlier study of the quality of health information on Wikipedia (Signpost coverage: “Wikipedia’s cancer coverage is reliable and thorough, but not very readable”), readability was also measured using a Flesch–Kincaid readability test, which is calculated from average sentence length (words per sentence) and average word length (syllables per word).
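
The Flesch–Kincaid Grade Level combines those two averages as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal Python sketch of that formula follows; the vowel-group syllable counter is a crude assumption of this example (it is not the tool the researchers used), and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: one syllable per group of adjacent vowels.

    This heuristic is an assumption for illustration; readability tools
    typically use pronunciation dictionaries or more elaborate rules.
    """
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # A final silent "e" (as in "make") usually adds no syllable.
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))  # → -1.45
```

Longer sentences and more polysyllabic words raise the score; a negative value, as in this toy sentence, just means the text falls below first-grade level on this scale.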

For both depression and schizophrenia, Wikipedia scored highest in the accuracy, up-to-dateness and references categories, surpassing all other resources, including WebMD, NIMH, the Mayo Clinic and Britannica online. In breadth of coverage, it trailed Kaplan & Sadock and others in both areas. And “of the online resources, Wikipedia was rated the least readable [by the human reviewers], although some of its topics received an average rating.” Likewise, the Wikipedia content had relatively high Flesch–Kincaid Grade Level scores (around 16 for schizophrenia and 15 for depression, indicating that tertiary-level education is needed to understand it), similar to Britannica’s but higher than those of most other resources examined.

The authors note that their “findings largely parallel those of other recent studies of the quality of health information on Wikipedia” (citing eight such studies published between 2007 and 2010):

“Despite variability in the methodologies and conclusions of these studies, the overall implication is that Wikipedia articles on health topics typically contain relatively few factual errors, although they may lack breadth of coverage. … Given the number of patients, would-be patients and concerned others using the internet to search for information on health issues, it seems that Wikipedia is an appropriate recommendation as an information source.”

Psychologists gauge impact of Wikipedia’s Rorschach test coverage

Wikimedia Research Newsletter, November 2011

Vol: 1 • Issue: 5 • November 2011

Quantifying quality collaboration patterns, systemic bias, POV pushing, the impact of news events, and editors’ reputation

With contributions by: Tbayer, Hfordsa, DarTar and Romanesco

Collaboration pattern analysis: Editor experience more important than “many eyes”

One of the motifs indicating article quality: One editor (top) having worked on several related articles (bottom)

A paper titled “Characterizing Wikipedia Pages Using Edit Network Motif Profiles”[1] by three researchers from University College Dublin indicates that the quality of a Wikipedia article can be predicted from characteristics of its “edit network”, a graph derived from the collaboration of Wikipedians in that area. Network motifs are small graphs that occur particularly frequently as subgraphs of a certain kind of network, and can in some sense be regarded as that network’s building blocks. (The concept is popular in bioinformatics, where it is applied to gene regulatory networks.) In this paper, the authors use graphs with at most five nodes consisting of users and articles, where a user and an article are connected by an edge if the user has edited the article, giving 17 possible “Wikipedia network motifs”. (Anonymous users are disregarded.) For each Wikipedia article, the researchers form an “ego network” consisting of that article, the articles that link to it (and have been edited by at least one of the users who edited the core article), and the users who edited them. For a sample of around 2,000 articles from the History and United States categories, the frequencies of the 17 “Wikipedia network motifs” in those articles’ “ego networks” were calculated.
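
The ego-network construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ code: edits are (user, article) pairs with anonymous users already removed, links are (source, target) article pairs, and “the users who edited them” is read as every user who edited any article kept in the ego network. Instead of enumerating all 17 motifs, the sketch counts just one family, (non-induced) stars with an editor at the centre and three articles as leaves; an editor linked to d articles contributes C(d, 3) of them. The function names and sample data are hypothetical.

```python
from math import comb


def ego_network(core, edits, links):
    """Return the (user, article) edges of a core article's ego network.

    edits: iterable of (user, article) pairs, anonymous users already removed.
    links: iterable of (source_article, target_article) wikilinks.
    """
    editors_of = {}
    for user, article in edits:
        editors_of.setdefault(article, set()).add(user)

    core_editors = editors_of.get(core, set())
    # Keep articles that link to the core and share at least one editor with it.
    linking = {
        src for src, dst in links
        if dst == core and src != core and editors_of.get(src, set()) & core_editors
    }
    # ASSUMPTION: include every user who edited any retained article.
    return {(user, article) for article in {core} | linking for user in editors_of.get(article, set())}


def editor_star_count(edges, leaves=3):
    """Count (non-induced) stars: one editor joined to `leaves` distinct articles."""
    degree = {}
    for user, _article in edges:
        degree[user] = degree.get(user, 0) + 1
    return sum(comb(d, leaves) for d in degree.values())


# Hypothetical sample data.
edits = [
    ("alice", "Boston"), ("alice", "Massachusetts"), ("alice", "New England"),
    ("bob", "Boston"), ("bob", "Harvard"), ("carol", "Harvard"),
]
links = [
    ("Massachusetts", "Boston"), ("New England", "Boston"),
    ("Harvard", "Boston"), ("Paris", "Boston"),
]
edges = ego_network("Boston", edits, links)
print(len(edges), editor_star_count(edges))  # → 6 1
```

Here “Paris” is dropped because none of Boston’s editors touched it, and the single three-leaf star is alice with Boston, Massachusetts and New England, the kind of experienced-editor pattern the paper associates with quality.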

Using machine learning techniques, the researchers are able to distinguish, with some certainty, articles of basic quality (defined as having been assessed as Start class by Wikipedians) from those of good quality (defined as Featured or B class), based solely on this set of motif frequencies in the article’s edit network. Looking at the impact of each of the 17 motif types separately, they found that “all network motifs have some potential to discriminate between good and basic Wikipedia articles” in the sample, but that three of the four best-predicting motifs are “stars with editors at their centre”:

“This is interesting because it shows that many eyes is not really the defining characteristic of quality; instead experience is important – the editors should have worked on many other articles.”
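
The summary does not say which machine learning technique was used, so the following Python sketch swaps in a deliberately simple one, a nearest-centroid classifier, purely to show how motif frequencies can serve as features: each article’s raw motif counts are normalized into a relative-frequency profile, and a new article is labeled with the class whose average profile is closest. The three motif types, the counts and the function names are invented for illustration and do not come from the study.

```python
from math import dist


def profile(counts):
    """Normalize raw motif counts to relative frequencies."""
    total = sum(counts)
    if total == 0:
        raise ValueError("an article needs at least one motif occurrence")
    return [c / total for c in counts]


def centroid(profiles):
    """Component-wise mean of equally long profile vectors."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def train(labelled):
    """labelled maps a class label to a list of raw motif-count vectors."""
    return {label: centroid([profile(c) for c in rows]) for label, rows in labelled.items()}


def classify(model, counts):
    """Return the label whose centroid is nearest (Euclidean) to the article's profile."""
    p = profile(counts)
    return min(model, key=lambda label: dist(model[label], p))


# Invented toy counts for three hypothetical motif types per article.
training = {
    "good": [[8, 1, 1], [7, 2, 1]],
    "basic": [[1, 5, 4], [2, 4, 4]],
}
model = train(training)
print(classify(model, [6, 2, 2]))  # → good
```

Normalizing to frequencies keeps heavily and lightly edited articles comparable; with real data each vector would have 17 entries, one per motif, and the nearest-centroid rule could be replaced by any stronger classifier without changing the feature pipeline.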

Wikimedia Research Newsletter, October 2011

Vol: 1 • Issue: 4 • October 2011

WikiSym; predicting editor survival; drug information found lacking; RfAs and trust; Wikipedia’s search engine ranking justified

With contributions by: Boghog, Jodi.a.schneider, Drdee, DarTar, Phoebe and Tbayer

Wiki research beyond the English Wikipedia at WikiSym

Panel discussion at WikiSym 2011

WikiSym 2011, the “7th international symposium on wikis and open collaboration”, took place from October 3-5 at the Microsoft Research Campus in Silicon Valley (Mountain View, California). Although the conference’s scope has broadened to include the study of open online collaborations that are not wiki-based, Wikipedia-related research still took up a large part of the schedule. Several of the conference papers have already been reviewed in the September and August issues of this research overview, and the rest of the proceedings have since become available online.

Wikimedia Research Newsletter, September 2011

Vol: 1 • Issue: 3 • September 2011

Top female Wikipedians, reverted newbies, link spam, social influence on admin votes, Wikipedians’ weekends, WikiSym previews

With contributions by: Tbayer, Daniel Mietchen, DarTar and Jodi.a.schneider
