# Wikimedia Research Newsletter, February 2013

Vol: 3 • Issue: 2 • February 2013

Wikipedia not so novel after all, except to UK university lecturers; EPOV instead of NPOV

With contributions by: Piotr Konieczny, Taha Yasseri, Heather Ford, Sage Ross, Daniel Mietchen and Tilman Bayer.

### Wikipedia in historic context: “Stigmergic accumulation” is not new

Page with the entry Encyclopédie from Diderot and D’Alembert’s Encyclopédie. The work was the result of the collaboration of more than 100 contributors.

“Wikipedia and Encyclopedic Production”[1] by Jeff Loveland (a historian of encyclopedias) and Joseph Reagle situates Wikipedia within the context of encyclopedic production historically, arguing that the features that many claim to be unique about Wikipedia actually have roots in encyclopedias of the past. Loveland and Reagle criticize characterizations of Wikipedia that they believe to be ahistorical and exaggerated, laying special blame with authors who compare Wikipedia’s anonymous production to Encyclopedia Britannica’s production by named experts, and thus ignore the rich tradition of encyclopedic production through the centuries. The authors then set about characterizing the history of encyclopedic production as composed of three overlapping forms: compulsive collection, stigmergic accumulation, and corporate production.

‘Compulsive collection’ refers to the work of compiling encyclopedias that has traditionally been done by a few dedicated, tireless, detail-oriented individuals. Loveland and Reagle point out that, although Wikipedians share this compulsive behavior with past encyclopedists, the crucial distinction is that the vast majority of those earlier encyclopedists were motivated by money (even if this motive existed alongside more idealistic motivations), whereas Wikipedia editors are unpaid.

Loveland and Reagle use the term ‘stigmergic accumulation’ to refer to the process of production by accretion onto a previous text. Even those responsible for a singly authored encyclopedia were relying on predecessors, the authors argue, ‘building on their work and using the cumulative character of texts and knowledge as a ladder of sorts’. Such predecessor texts included previous editions of an encyclopedia that ran into multiple editions, as well as other encyclopedias: borrowing between them was sometimes illegal but more often viewed as ‘piratical’, i.e. morally wrong.

The category of ‘corporate production’ is used by Loveland and Reagle to describe the process of encyclopedic editing by a group, with groups topping a thousand contributors in the 20th century. Editors of early encyclopedias like Diderot and D’Alembert’s Encyclopédie in the 1700s faced the challenge of coordinating the contributions of about 140 contributors. Like Wikipedia, they had to confront issues of consistency, which result in debates about how important a subject must be to merit an article. In contrast to other encyclopedias, write Loveland and Reagle, Wikipedia settles these debates through community decision-making and in the open. The authors also note that previous encyclopedias didn’t always recruit on the basis of expertise, and that some recognized that it would be cheaper and sometimes more accurate to have non-experts summarize the works of experts.

# A year’s worth of Wikipedia research

Twelve years after its launch, Wikipedia continues to attract a large amount of attention from scholarly research trying to understand what made this one of the most remarkable collaborative efforts in history and what makes it work. Researchers have called Wikipedia “our Everest” (because of its complexity and cultural importance) or “the Drosophila (fruit fly) of social software” (because the project’s transparency and freely available data make it accessible and popular as a research subject).

Download the complete Volume 2 (PDF)

In 2011, we launched a monthly Wikimedia Research Newsletter with the aim of covering recent academic research about Wikipedia and other Wikimedia projects. Published jointly by the Wikimedia Research Committee and the Signpost (the English Wikipedia’s community-edited newspaper), it has established itself as a comprehensive outlet enabling both researchers and Wikipedians to stay on top of current research, aiming to facilitate exchange between these two communities.

Today we are announcing the release in the public domain of a curated corpus containing the bibliographic references of all 225 publications reviewed or covered in the second volume of the newsletter, forming a historical record of Wikipedia research in the year 2012. This corpus can be browsed online or downloaded, ready to be imported into reference managers or other literature collections. Papers in this dataset have been marked as either open access or closed access.

Last year, we published a similar dataset for volume 1 (2011). Together, these releases complement other efforts to catalogue the research literature on Wikipedia, in particular the WikiLit project which focuses on publications until June 2011, prior to the launch of the newsletter.

A year ago we launched the @WikiResearch news feed on Twitter and Identi.ca, covering new preprints, papers or research-related blog posts, before they are reviewed more fully in the Newsletter. As of February 2013, it has gained 745 followers and continues to be actively updated.

We also started offering the newsletter in the form of an HTML email newsletter (in addition to the announcements of each new issue on the Wikiresearch-l mailing list, which only contain the table of contents). This experiment proved successful, too, with almost 100 subscribers to date (adding to the thousands of pageviews each issue receives when published as part of the Signpost, on Meta-wiki and on this blog). You can sign up to receive a copy of each new issue in your inbox as soon as it comes out.

The Newsletter is a collaborative effort and would not exist without the following 22 people who contributed reviews and summaries in 2012:

More than half of our contributors are researchers themselves, who have published about Wikipedia in peer-reviewed publications. We are also grateful for the help of several Signpost collaborators in copyediting and preparing the final publication every month.

Finally, thanks to everyone for reading the Wikimedia Research Newsletter, and please consider contributing by pointing us to new research we should cover, or by volunteering to review new publications.

The editors of the Wikimedia Research Newsletter:

Tilman Bayer, Senior Operations Analyst
Dario Taraborelli, Senior Research Analyst

# Suggesting tasks for new Wikipedians

If you had just signed up to become a Wikipedia contributor, what kind of experience would you like to have? Would you know exactly where to get started, or would you prefer some suggestions?

For most of Wikipedia’s 12-year history, we have done very little to proactively introduce new participants to tasks that are interesting and easy. Right after account creation, for instance, we merely suggest that you check out your preferences. If you look around, you can find guides like Wikipedia:Tutorial. Most of this documentation is focused on the rules and mechanics of how to contribute, rather than suggesting real tasks to try immediately.

Naturally, the kind of people who have tended to thrive in this environment already know what they want to contribute, or are deeply motivated to go and find it. Unless you’ve spotted an error or a missing piece of information, there is little pointing you in the right direction. That lack of direction is a big part of why only about a quarter of all newly registered accounts complete an edit.

This phenomenon is far from unique to the site, and in fact it would be surprising to hear of any site where 100% of signups become devoted content contributors. However, when considering the enormous workload we face, the sheer waste of human capital is staggering. In English Wikipedia alone, there are…

• more than 200,000 “citation needed” tags
• 3,000 articles that need basic copyediting
• over 14,000 pages that need more wiki links

The list goes on, and these are just the items that have been explicitly added to the backlog. Wikipedia is in fact bursting at the seams with small problems that need fixing.

So how do we match the thousands of people who sign up every day, eager and willing to help, with tasks that are easy to do? That’s the question we’re attempting to answer with our work on onboarding new Wikipedians, at the Wikimedia Foundation’s Editor Engagement Experiments team.

# Vote for the most exciting paper from nine years of research about Wikipedia

(This is a guest post by Carol Ann O’Hare of Wikimedia France.)

The impact of collaborative writing on the quality of Wikipedia content, new methods for monitoring contributions in order to fight vandalism, how the nature and quality of content depends on contributors’ status and the area covered, etc. These topics concern the Wikimedians who write and use Wikipedia… but also more and more researchers!

By launching an international award for research on Wikimedia projects and free knowledge, Wikimédia France wants to highlight these research works, encourage them and, especially, make them understandable and accessible to the Wikimedia community.

Starting in July, the first step was to ask the community of researchers that study Wikimedia projects to nominate scientific papers that they consider the most influential and important from the years 2003 to 2011. We collected more than 30 proposals, each satisfying the selection criteria: available under open access and published in peer-reviewed publications. Thanks to a high-quality jury, composed of researchers working on these topics, we were able to select five finalist papers among these. You can find summaries and full texts linked below:

To decide the winner, Wikimédia France wishes to encourage all Wikimedians to give their opinion and vote for the paper that seems the most stimulating and relevant.

Voting will close on Monday, March 11. The announcement of the winning paper is scheduled for the end of March. The authors will receive a grant of €2,500. They can freely allocate this sum, provided it is dedicated to supporting open knowledge research.

Carol Ann O’Hare
Wikimedia France

# Wikimedia Research Newsletter, January 2013

Vol: 3 • Issue: 1 • January 2013

Lessons from the research literature on open collaboration; clicks on featured articles; credibility heuristics

With contributions by: Taha Yasseri, Piotrus, Aaron Shaw, Tbayer and Lui8E

### Lessons from the wiki research literature in “American Behavioral Scientist” special issue

A special issue of the American Behavioral Scientist is devoted to “open collaboration”.

• Consistent patterns found in Wikipedia and other open collaborations: In the introductory piece[1], researchers Andrea Forte and Cliff Lampe give an overview of this field, defined as the study of “distributed, collaborative efforts made possible because of changes in information and communication technology that facilitate cooperative activities” – with open source projects and Wikipedia among the most prominent examples. They point out that “[b]y now, thousands of scholars have written about open collaboration systems, many hundreds of thousands of people have participated in them, and millions of people use products of open collaboration every day.” Among their “lessons from the literature”, they name three “consistent patterns” found by researchers of open collaborations:
    • “Participation Is Unequal” (meaning that some participants contribute vastly more than others: “In Wikipedia, for example, it has long been shown that a few editors provide the bulk of contributions to the site.”)
    • “There Are Special Requirements for Socializing New Users”
    • “Users Are Massively Heterogeneous in Both How and Why They Participate”
• “Ignore All Rules” as “tension release mechanism”: The abstract of a paper titled “Rules and Roles vs. Consensus: Self-Governed Deliberative Mass Collaboration Bureaucracies”[2] explains “Wikipedia’s unusual policy, ignore all rules (IAR)” as a “tension release mechanism” that is “reconciling the tension between individual agency and collective goals” by “[supporting] individual agency when positions taken by participants might conflict with those reflected in established rules. Hypotheses are tested with Wikipedia data regarding individual agency, bureaucratic processes, and IAR invocation during the content exclusion process. Findings indicate that in Wikipedia each utterance matters in deliberations, rules matter in deliberations, and IAR citation magnifies individual influence but also reinforces bureaucracy.”
• Collaboration on articles about breaking news matures more quickly: “Hot Off the Wiki: Structures and Dynamics of Wikipedia’s Coverage of Breaking News Events”[3] analyzes “Wikipedia articles about over 3,000 breaking news events, [investigating] the structure of interactions between editors and articles”, finding that “breaking articles emerge into well-connected collaborations more rapidly than nonbreaking articles, suggesting early contributors play a crucial role in supporting these high-tempo collaborations.” (see also our earlier review of a similarly themed paper by the same team: “High-tempo contributions: Who edits breaking news articles?”)

A fourth paper in this special issue, titled “The Rise and Decline of an Open Collaboration System: How Wikipedia’s Reaction to Popularity Is Causing Its Decline”, received considerable media attention this month, starting with an article in USA Today. It was already reviewed in the September issue of the research report.

# Wikimedia Research Newsletter, December 2012

Vol: 2 • Issue: 12 • December 2012

Wikipedia and Sandy Hook; SOPA blackout reexamined

With contributions by: Daniel Mietchen, Piotrus, Junkie.dolphin, Taha Yasseri, Benjamin Mako Hill, Aaron Shaw, Tbayer, DarTar and Ragesoss

### How Wikipedia deals with a mass shooting

Northeastern University researcher Brian Keegan analyzed how hundreds of Wikipedians gathered to cover the Sandy Hook Elementary School shooting in the immediate aftermath of the tragedy. The findings are reported in a detailed blog post that was later republished by the Nieman Journalism Lab.[1] Keegan observes that the Sandy Hook shooting article reached a length of 50 KB within 24 hours of its creation, making it the fastest-growing article by length in the first day among recent articles covering mass shootings on the English-language Wikipedia. The analysis compares the Sandy Hook page with six similar articles from a list of 43 articles on shooting sprees in the US since 2007. Among the analyses described in the study, of particular interest are the dynamics of dedicated versus occasional contributors as the article reaches maturity: while in the first few hours contributions are evenly distributed, with a majority of single-edit editors, after hour 3 or 4 a number of dedicated editors show up and “begin to take a vested interest in the article, which is manifest in the rapid centralization of the article”. A plot of inter-edit time also shows the sustained frequency of revisions that these articles display days after their creation, with Sandy Hook averaging about 1 edit per minute around 24 hours after its first revision. The notebook and social network data produced by the author for the analysis are available on his website. The Nieman Journalism Lab previously covered the role that Wikipedia is playing as a platform for collaborative journalism, and why its format outperforms Wikinews, in an interview with Andrew Lih published in 2010.[2] The early revision history of the Sandy Hook shooting article was also covered in a blog post by Oxford Internet Institute fellow Taha Yasseri, though with a focus on the coverage in different Wikipedia language editions.[3]

### Network positions and contributions to online public goods: the case of the Chinese Wikipedia

A graph with nodes color-coded by betweenness centrality (from red=0 to blue=max).

In a forthcoming paper in the Journal of Management Information Systems (presented earlier at HICSS ’12[4]), Xiaoquan (Michael) Zhang and Chong (Alex) Wang use a natural experiment to demonstrate that changes to the position of individuals within the editor network of a wiki modify their editing behavior. The data for this study came from the Chinese Wikipedia. In October 2005, the Chinese government suddenly blocked access to the Chinese Wikipedia from mainland China, creating an unanticipated decline in the editor population. As a result, the remaining editors found themselves in a new network structure and, the authors claim, any changes in editor behavior that ensued are likely effects of this discontinuous “shock” to the network.

# Wikimedia Research Newsletter, November 2012

Vol: 2 • Issue: 11 • November 2012

Movie success predictions, readability, credentials and authority, geographical comparisons

With contributions by: Piotrus, Benjamin Mako Hill, Tbayer, DarTar, Adler.fa, Hfordsa, Drdee

### Early prediction of movie box-office revenues with Wikipedia data

An open-access preprint[1] has announced the results from a study attempting to predict early box-office revenues from Wikipedia traffic and activity data. The authors – a team of computational social scientists from Budapest University of Technology and Economics, Aalto University and the Central European University – submit that behavioral patterns on Wikipedia can be used for accurate forecasting, matching and in some cases outperforming the use of social media data for predictive modeling. The results, based on a corpus of 312 English Wikipedia articles on movies released in 2010, indicate that the joint editing activity and traffic measures on Wikipedia are strong predictors of box-office revenue for highly successful movies.

The authors contrast their early prediction approach with more popular real-time prediction/monitoring methods, and suggest that movie popularity can be accurately predicted well in advance, up to a month before the release. The study received broad press coverage and was featured in The Guardian, the MIT Technology Review and the Hollywood Reporter, among others. The authors observe that their approach, being “free of any language based analysis, e.g., sentiment analysis, could be easily generalized to non-English speaking movie markets or even other kinds of products”. The dataset used for this study, including the financial and Wikipedia activity data, is available among the supplementary materials of the paper.

### Readability of the English Wikipedia, Simple Wikipedia, and Britannica compared

$
4.71 \left (\frac{\mbox{characters}}{\mbox{words}} \right) + 0.5 \left (\frac{\mbox{words}}{\mbox{sentences}} \right) - 21.43
$

The automated readability index, one of the readability metrics used in the study[2]

A study[2] by researchers at Kyoto University presents a detailed assessment of the readability of the English Wikipedia against Encyclopedia Britannica and the Simple English Wikipedia using a series of readability metrics and finds that Wikipedia “seems to lag behind the other encyclopedias in terms of readability and comprehensibility of its content”.
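As a rough illustration of how a metric like the automated readability index can be computed, the sketch below applies the formula shown above to plain text. The function name `automated_readability_index` and the tokenization rules (what counts as a word or a sentence) are assumptions made for this example, not the procedure used in the study.

```python
import re


def automated_readability_index(text: str) -> float:
    """Compute the automated readability index (ARI) of a plain-text string.

    Assumed tokenization (illustrative only):
    - a sentence is any non-empty run of text ended by '.', '!' or '?'
    - a word is a run of letters/digits, optionally with inner apostrophes
    - characters are the letters and digits inside those words
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")

    characters = sum(ch.isalnum() for word in words for ch in word)

    # ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    return (
        4.71 * characters / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43
    )


if __name__ == "__main__":
    sample = "The cat sat. The dog ran."
    print(round(automated_readability_index(sample), 2))
```

Higher scores indicate text that is harder to read; very short, simple samples such as the one above can yield negative values. Different tokenization choices will shift the score, so comparisons are only meaningful when every text is processed with the same rules.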

# In divisive times, Wikipedia brings political opponents together

Network of users communicating on Wikipedia article talk pages (Neff et al., p.22). Edges connecting two Democrats are colored blue, edges connecting two Republicans are colored red, and edges representing inter-party dialogue are colored green.

Neutral Point of View – the requirement that articles must represent all significant viewpoints fairly – is one of the three core principles that Wikipedia is based on. Many of its readers value it, especially when seeking unbiased information in times of heated political battles.

In the run-up to the US presidential election, a group of six researchers from the University of Southern California and the Barcelona Media Foundation has published the results of a new study[1] showing that “despite the increasing political division of the U.S., there are still areas in which political dialogue is possible and happens” – namely, the talk pages of Wikipedia, where users of both political persuasions debate and collaborate to create encyclopedic coverage of political topics.

The research project–presented earlier this year at the 32nd INSNA (International Network for Social Network Analysis) Sunbelt conference and now documented in preprint form–conducted a quantitative analysis of the interactions of Wikipedia users who had proclaimed a political affiliation on their user page, in terms of the US political system. As the researchers write in the abstract:

“In contrast to previous analyses of other social media, we did not find strong trends indicating a preference to interact with members of the same political party within the Wikipedia community. … It seems that the shared identity of ‘being Wikipedian’ may be strong enough to triumph over other potentially divisive facets of personal identity, such as political affiliation.”

The paper’s title, “Jointly they edit,” was chosen in reference to the well-known phrase “divided we blog” coined in a 2005 paper that referred “to a trend of cyberbalkanization in the political blogosphere, with liberal and conservative blogs tending to link to other blogs with a similar political slant, and not to one another.” A similar divisive trend was found in the retweet networks on Twitter.

As a testament to what can be achieved in a fruitful collaboration among many editors, including those of opposing political persuasions, Wikipedians have brought the articles about both contenders in tomorrow’s presidential election to “featured article” status, representing the highest quality rating on Wikipedia. The article Barack Obama has received more than 22,000 edits since it was started in March 2004, and its information is currently supported by 319 inline references. The article Mitt Romney, begun in January 2004, has been edited over 10,000 times and currently contains 400 inline references.

So no matter who gets the most electoral votes tomorrow, you can trust that many Wikipedians have worked together to ensure that his Wikipedia page will reflect a balanced political perspective.

### Reference

1. Neff J. G., Laniado D., Kappler K., Volkovich Y., Aragon P., Kaltenbrunner A. (2012). Jointly they edit: Examining the impact of community identification on political interaction in Wikipedia. arXiv:1210.6883

For more coverage of recent academic research on Wikipedia, read our monthly Wikimedia Research Newsletter or follow its updates on Twitter and Identi.ca.

# Wikimedia Research Newsletter, October 2012

Vol: 2 • Issue: 10 • October 2012

WP governance informal; community as social network; efficiency of recruitment and content production; Rorschach news

With contributions by: Piotrus, Adler.fa, Bdamokos, Ragesoss, Tbayer, and Phoebe

### Wikipedia governance found to be mostly informal

A paper in the Journal of the American Society for Information Science and Technology, coming from the social control perspective and employing the repertory grid technique, has contributed interesting observations about the governance of Wikipedia.[1] The paper begins with a helpful if cursory overview of governance theories, moving towards the governance of open source communities and Wikipedia. That cursory treatment is not foolproof, though: for example, the authors mention “bazaar style governance”, but attribute it incorrectly. Rather than the 2006 work they cite, the coining of this term dates to Eric S. Raymond’s 1999 The Cathedral and the Bazaar. The authors interviewed a number of Wikipedians and identified a number of formal and informal governance mechanisms. Only one formal mechanism (the policies) was found important, while seven informal mechanisms were deemed important: collaboration among users, discussions on article talk pages, facilitation by experienced users, individuals acting as guardians of the articles, inviting individuals to participate, large numbers of editors, and participation by highly reputable users. Notably, the interviewed editors did not view elements such as administrator involvement, mediation or voting as important.

The paper concludes that “in the everyday practice of content creation, the informal mechanisms appear to be significantly more important than the formal mechanisms”, and notes that this likely means that the formal mechanisms are used much more sparingly than informal ones, most likely only in the small percentage of cases where the informal mechanisms fail to provide an agreeable solution for all the parties. The paper stresses that not all editors are equal, and that certain editors (and groups) have much more power than others, a fact that is quickly recognized by all editors. The authors note the importance of transparent interactions in spaces like talk pages, and observe that “the reported use of interaction channels outside the Wikipedia platform (e.g., e-mail) is a cause for concern, as these channels limit involvement and reduce transparency.” Citing Ostrom’s governance principles, they add that “ensuring participation and transparency is crucial for maintaining the stability of self-governing communities.”

# Wikimedia Research Newsletter, September 2012

Vol: 2 • Issue: 9 • September 2012

“Rise and decline” of Wikipedia participation, new literature overviews, a look back at WikiSym 2012

With contributions by: Piotrus, Phoebe, DarTar, Benjamin Mako Hill, Ragesoss and Tbayer

### “The rise and decline” of the English Wikipedia

A paper to appear in a special issue of American Behavioral Scientist (summarized in the research index) sheds new light on the English Wikipedia’s declining editor growth and retention trends. The paper describes how “several changes that the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have lead to a more restrictive environment for newcomers”.[1] The number of active Wikipedia editors has been declining since 2007 and research examining data up to September 2009[2] has shown that the root of the problem has been the declining retention of new editors. The authors show this decline is mainly due to a decline among desirable, good-faith newcomers, and point to three factors contributing to the increasingly “restrictive environment” they face.