Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Wikimedia Research Newsletter

Wikimedia Research Newsletter, March 2014


Vol: 4 • Issue: 3 • March 2014

Wikipedians’ “encyclopedic identity” dominates even in Kosovo debates; analysis of “In the news” discussions; user hierarchy mapped

With contributions by: Federico Leva, Scott Hale, Kim Osman, Jonathan Morgan, Piotr Konieczny, Niklas Laxström, Tilman Bayer and James Heilman

Cross-language study of conflict on Wikipedia

Have you wondered about the differences between the articles on Crimea in the Russian, Ukrainian, and English versions of Wikipedia? A newly published article entitled “Lost in Translation: Contexts, Computing, Disputing on Wikipedia”[1] doesn’t address Crimea, but nonetheless offers insight into the editing of contentious articles in multiple language editions through a detailed qualitative examination of Wikipedia articles about Kosovo in the Serbian, Croatian, and English editions.

The authors, Pasko Bilic and Luka Bulian from the University of Zagreb, found that the main drivers of conflict and consensus were the different group identities editors held in relation to the topic (Kosovo) and to Wikipedia in general. Happily, the authors found that the dominant identity among users in all three editions was the “encyclopedic identity,” which closely mirrored the rules and policies of Wikipedia (e.g., NPOV) even when users didn’t cite such policies explicitly. (This echoes the result of a similar study of the political identities of US editors; see previous coverage: “Being Wikipedian is more important than the political affiliation”.) Other identities were based largely on language and territory. These identities, however, did not sort cleanly into the different language editions: “language and territory [did] not produce coherent and homogeneous wiki communities in any of the language editions.”

The English Wikipedia was seen by many users as providing greater visibility and thus “seem[ed] to offer a forum for both Pro-Serbian and Pro-Albanian viewpoints making it difficult to negotiate a middle path between all of the existing identities and viewpoints.” The Arbitration Committee, present in the English edition but not in the Serbian or Croatian editions, may have helped prevent even greater conflict. Enforcement of its decisions generally seemed to lead to greater caution in the editing process.

In line with previous work showing that some users move between language editions, the authors found a significant amount of coordination work between the language editions. One central issue was whether other editions would follow the English edition in splitting the article into two separate articles (Republic of Kosovo and Autonomous Province of Kosovo and Metohija).

The social construction of knowledge on English Wikipedia

review by Kim Osman

Another paper by Bilic, published in New Media & Society,[2] looks at the logic behind networked societies and the myth, perpetuated by media institutions, that there is a center of the social world (as opposed to distributed nodes). The paper goes on to investigate the social processes that contribute to the creation of “mediated centers” by analyzing the talk pages of English Wikipedia’s In The News (ITN) section.

Undertaking an ethnographic content analysis of ITN talk pages from 2004–2012, Bilic found three issues that Wikipedians disputed in their efforts to construct a necessarily time-sensitive section of the encyclopedia. First, editors differentiate between mass media and Wikipedia as a digital encyclopedia, but where the border between the two lies is often contested. Second, inclusionists and deletionists debated the criteria for stories to appear in the ITN section. Third, conflict and discussion arose over English Wikipedia’s relevance to a global audience.

The paper provides a good insight into how editors construct the ITN section and how it is positioned on the “thin line between mass media agenda and digital encyclopedia.” It would be interesting to see further research on the tensions between the Wikipedia policies mentioned in the paper (e.g. WP:NOTNEWS, NPOV) and mainstream media trends in light of other studies about Wikipedia’s approach to breaking news coverage.

User hierarchy map: Building Wikipedia’s Org Chart


Wikimedia Research Newsletter, February 2014


Vol: 4 • Issue: 2 • February 2014

CSCW ’14 retrospective; the impact of SOPA on deletionism; like-minded editors clustered; Wikipedia stylistic norms as a model for academic writing

With contributions by: David Ludwig, Morten Warncke-Wang, Maximilian Klein, Piotr Konieczny, Giovanni Luca Ciampaglia, Dario Taraborelli and Tilman Bayer

CSCW ’14 retrospective

The 17th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW ’14) took place this month in Baltimore, Maryland.[supp 1] The conference brought together more than 500 researchers and practitioners from industry and academia presenting research on “the design and use of technologies that affect groups, organizations, communities, and networks.” Research on Wikipedia and wiki-based collaboration has been a major focus of CSCW in the past. This year, three papers on Wikipedia were presented:

Unique editors per quarter in conventional and alternative WikiProjects, 2002-2012

Edits per quarter in conventional and alternative WikiProjects, 2002-2012

Slides from Editing beyond articles[1]

The rise of alt.projects in Wikipedia

Jonathan Morgan from the Wikimedia Foundation and collaborators from the University of Washington[1] analyzed the nature of collaboration in alternative WikiProjects, i.e. projects that the authors identify as not following “the conventional pattern of coordinating a loosely defined range of article creation and curation-related activities within a well defined topic area” (examples include the Guild of Copy Editors and WikiProject Dispute Resolution). The authors analyze the editing activity of members of these projects, which are not focused on editing content within a topic area. The paper also reports data on the number of contributors involved in WikiProjects over time: while the number of editors participating in conventional projects decreased by 51% between 2007 and 2012, participation in alternative projects declined by only 13% in the same period, and alternative projects saw an overall 57% increase in the raw number of contributions.

Categorizing barnstars via Mechanical Turk

Paul Andre and collaborators from Carnegie Mellon University presented a study showing how to effectively crowdsource a complex categorization task by assigning it to users with no prior knowledge or domain expertise.[2] The authors selected a corpus of Wikipedia barnstars and showed how different task designs can produce crowdsourced judgments where Mechanical Turk workers accurately match expert categorization. Expert categorization was obtained by recruiting two Wikipedians with substantial editing activity as independent raters.

Understanding donor behavior through email

A team of researchers from Yahoo! Research, the Qatar Computing Research Institute and UC Berkeley analyzed two months of anonymized email logs to understand the demographics, personal interests and donation behavior of individuals responding to different fundraising campaigns.[3] The data include donation emails from the Wikimedia Foundation; among the campaigns studied, email from a domain had the highest ratio of messages tagged as spam to total messages read, which the authors attribute to spoofing. The paper also indicates that the Wikimedia fundraiser tends to attract slightly more male than female donors.

Clustering Wikipedia editors by their biases

review by User:Maximilianklein

Building on lines of research that rate editors by content persistence and that algorithmically find cliques of editors, Nakamura, Suzuki and Ishikawa propose[4] a refinement for finding like-minded and opposing editors, and test it on the Japanese Wikipedia. The method finds cliques in a graph connecting all editors of an article, with each edge weighted by the agreement or disagreement between the two editors. To measure agreement between two editors, the authors iterate through the full edit history and apply the content persistence axioms: an edit that leaves another editor’s text unchanged counts as agreement, and one that deletes it counts as disagreement. Because leaving text unchanged is not always a strong indication of agreement, they normalize each action by its frequency for both the source editor and the target editor. That is, the method accounts for an editor’s propensity to change text and for the propensity of an editor’s text to be changed.

To validate their method, the authors compare its results with a simplified weighting scheme, random clustering, and human-clustered results on 7 articles in the Japanese Wikipedia. In 6 out of 7 articles, the proposed technique beats simplified weighting. One example they present is the detection of pro- and anti-nuclear editors on the Nuclear Power Plant article. One possible application of such detection would be a gadget that colours an article’s text according to which editor group wrote it.
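The agreement weighting described in this review can be sketched as follows. This is a minimal illustration, not the authors’ code: the (actor, author, action) input format, the geometric-mean normalization, and the union-find grouping that stands in for their clique finding are all our own assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt

# Content-persistence reading used in the review (assumed encoding):
# leaving someone's text in place = agreement, deleting it = disagreement.
SIGN = {"keep": 1.0, "delete": -1.0}


def agreement_weights(interactions):
    """interactions: iterable of (actor, author, action) tuples.

    Returns {(editor_a, editor_b): weight}, keyed by the sorted editor pair.
    Positive weights suggest agreement, negative ones disagreement.
    """
    pair_counts = Counter()
    out_freq = Counter()  # how often each actor performs each action
    in_freq = Counter()   # how often each author's text receives each action
    for actor, author, action in interactions:
        if actor == author:
            continue  # self-edits say nothing about agreement between editors
        pair_counts[(actor, author, action)] += 1
        out_freq[(actor, action)] += 1
        in_freq[(author, action)] += 1

    weights = defaultdict(float)
    for (actor, author, action), n in pair_counts.items():
        # Hypothetical normalization by both editors' propensities for this action.
        norm = sqrt(out_freq[(actor, action)] * in_freq[(author, action)])
        weights[tuple(sorted((actor, author)))] += SIGN[action] * n / norm
    return dict(weights)


def like_minded_groups(weights, editors):
    """Join editors linked by positive weights (a crude stand-in for clique finding)."""
    parent = {e: e for e in editors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        if w > 0:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for e in editors:
        groups[find(e)].append(e)
    return sorted(sorted(g) for g in groups.values())


# Toy history: A and B keep each other's text, C deletes theirs.
history = [("A", "B", "keep"), ("A", "B", "keep"), ("B", "A", "keep"),
           ("C", "A", "delete"), ("C", "B", "delete"), ("A", "C", "delete")]
w = agreement_weights(history)
groups = like_minded_groups(w, ["A", "B", "C"])  # [["A", "B"], ["C"]]
```

The normalization divides each raw count by how often the acting editor performs that action and how often the target editor’s text receives it, so a prolific deleter’s edits count for less per edit than those of an editor who rarely deletes anything.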

Monthly research showcase launched

Video of the February 2014 Research Showcase

The lifetime of deleted articles by year of creation

The Wikimedia Foundation’s Research & Data team announced its first public showcase, a monthly review of work conducted by researchers at the Foundation. Aaron Halfaker presented a study of trends in newcomer article creation across 10 languages with a focus on the English and German Wikipedias (slides). The study indicates that in wikis where anonymous users can create articles, their articles are less likely to be deleted than articles created by newly registered editors. Oliver Keyes presented an analysis of how readers access Wikipedia on mobile devices and reviewed methods to identify the typical duration of a mobile browsing session (slides). The showcase is hosted at the Wikimedia Foundation on the third Wednesday of every month and live-streamed on YouTube.[supp 2]
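One common way to estimate the length of a browsing session is to split a reader’s sequence of requests wherever the gap of inactivity exceeds a threshold. The sketch below is only an illustration of that general approach; the methods actually reviewed in the talk are not described in this summary, and the 30-minute cutoff, the function names and the timestamps are hypothetical.

```python
from statistics import median


def split_sessions(timestamps, cutoff=1800):
    """Group request timestamps (in seconds) into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `cutoff`; the 30-minute default is an illustrative assumption.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= cutoff:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def session_durations(timestamps, cutoff=1800):
    """Time from first to last request in each session (single-page sessions count as 0)."""
    return [s[-1] - s[0] for s in split_sessions(timestamps, cutoff)]


# One hypothetical reader's page requests, in seconds since some start time.
requests = [0, 60, 300, 5000, 5100, 20000]
durations = session_durations(requests)  # [300, 100, 0]
typical = median(durations)              # 100
```

The choice of cutoff drives the result: a shorter threshold splits the same activity into more, shorter sessions, which is why the threshold itself is usually estimated from the distribution of gaps between requests rather than fixed in advance.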

Study of AfD debates: Did the SOPA protests mellow deletionists?


Wikimedia Research Newsletter, January 2014


Vol: 4 • Issue: 1 • January 2014

Translation assignments, weasel words, and Wikipedia’s content in its later years

With contributions by: Aaron Halfaker, Jonathan Morgan, Piotr Konieczny and Tilman Bayer

Translation students embrace Wikipedia assignments, but find user interface frustrating

An article, “Translating Wikipedia Articles: A Preliminary Report on Authentic Translation Projects in Formal Translator Training”,[1] reports on the author’s experiment with “a promising type of assignment in formal translator training which involves translating and publishing Wikipedia articles”, in three courses with second- and third-year students at the Institute of English Studies, University of Warsaw.

It was “enthusiastically embraced by the trainees … Practically all of the respondents [in a participant survey] concluded that the experience was either ‘positive’ (31 people, 56% of the respondents) or ‘very positive’ (23 people, 42% of the respondents).” And “more than 90% of the respondents (50 people) recommended that the exercise ‘should definitely be kept [in future courses], maybe with some improvements,’ and the remaining 5 people (9%) cautioned that improvements to the format were needed before it was used again. No-one recommended culling the exercise from the syllabus.”

However, the author cautions that translations from Polish into English required more instructor feedback and editing than translations from English into Polish (the students’ native language). In addition, “most people found the technological aspects of the assignment frustrating, with most students assessing them as either ‘hard’ (39%) or ‘very hard’ (16%) to complete.”

Wikimedia Research Newsletter, December 2013


Vol: 3 • Issue: 12 • December 2013

Cross-language editors, election predictions, vandalism experiments

With contributions by: Daniel Mietchen, Maximilian Klein, Piotr Konieczny and Tilman Bayer

Cohort of cross-language Wikipedia editors analyzed

Network graph of the cross-language Wikipedia edits analyzed in the study.

The same network, with the node for the English Wikipedia removed.

Analyzing edits to the then 46 largest Wikipedias between July 9 and August 8, 2013, a study[1] identified a set of about 8,000 contributors (labeled multilingual) with a global user account who had edited more than one of these language versions (excluding Simple English, which was treated separately) in that time frame. It tested five hypotheses about cross-language editing and editors and looked, for instance, at the proportion of contributions that each of these Wikipedias receives from multilingual editors versus from editors who edit only one language version. The research found that Esperanto and Malay stand out with a high proportion of contributions from multilinguals, while at the other end, Japanese has few contributions from multilinguals. Overall, in terms of edits per user, multilingual users made more than twice as many contributions to the study corpus as monolinguals did; they often work on the same topics across languages; and in any given language, they frequently edit articles not edited by monolinguals during the one-month period analyzed. They thus serve a bridging function between languages.
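The two headline measures in this study, the share of each edition’s edits that come from multilingual editors and the average number of edits per user in each group, reduce to simple counting. A minimal sketch, assuming a flat log of (user, wiki) edit records from a single time window; unlike the study, it neither treats Simple English separately nor restricts itself to global accounts, and the toy log is invented.

```python
from collections import defaultdict


def multilingual_users(edits):
    """Users who edited more than one language edition in the window."""
    wikis = defaultdict(set)
    for user, wiki in edits:
        wikis[user].add(wiki)
    return {u for u, ws in wikis.items() if len(ws) > 1}


def multilingual_share(edits):
    """For each wiki, the fraction of its edits made by multilingual users."""
    edits = list(edits)
    multi = multilingual_users(edits)
    total, from_multi = defaultdict(int), defaultdict(int)
    for user, wiki in edits:
        total[wiki] += 1
        from_multi[wiki] += int(user in multi)
    return {w: from_multi[w] / total[w] for w in total}


def mean_edits_per_user(edits):
    """Average number of edits per user, for multilingual vs. monolingual users."""
    edits = list(edits)
    multi = multilingual_users(edits)
    counts, users = defaultdict(int), defaultdict(set)
    for user, _ in edits:
        group = "multilingual" if user in multi else "monolingual"
        counts[group] += 1
        users[group].add(user)
    return {g: counts[g] / len(users[g]) for g in counts}


# Toy log of (user, wiki) edit records.
log = [("ann", "en"), ("ann", "eo"), ("ann", "eo"),
       ("bob", "en"), ("bob", "en"), ("cy", "ja")]
shares = multilingual_share(log)     # en: 1/3, eo: 1.0, ja: 0.0
per_user = mean_edits_per_user(log)  # multilingual: 3.0, monolingual: 1.5
```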

Two existing write-ups are good starting points for putting the study in context.[supp 1][supp 2] In the long run, it would be interesting to extend the research to (a) cover a longer time span, (b) include contributions from non-registered users, despite technical difficulties, (c) include smaller Wikipedias, and (d) explore the effects of that bridging function in more detail, perhaps in search of ways to support its beneficial effects while minimizing the non-beneficial ones. It would also be interesting to focus on some aspects of those multilingual users (e.g. how the languages they edit in match the languages they display on their user pages) or their contributions (e.g. how their contributions to text, illustrations, references, links, templates, categories or talk page discussions differ across languages, or how contributions from multilinguals differ across topics or between pages with high and low traffic), or to entertain ideas for a multilingual version of editing tools like User:SuggestBot. The paper is one of the first to make use of Wikidata; comparing such cross-lingual Wikipedia contributions with contributions to multilingual projects like Wikidata and Commons may also be a fruitful avenue for further research. (See also earlier coverage of a CSCW paper about a similar topic: “Activity of content translators on Wikipedia examined”.)

Attempt to use Wikipedia pageviews to predict election results in Iran, Germany and the UK


Wikimedia Research Newsletter, November 2013


Vol: 3 • Issue: 11 • November 2013

Reciprocity and reputation motivate contributions to Wikipedia; indigenous knowledge and “cultural imperialism”; how PR people see Wikipedia

With contributions by: Piotr Konieczny, Brian Keegan, Nicolas Jullien, Amir E. Aharoni, Henrique Andrade, Tilman Bayer, Daniel Mietchen, Giovanni Luca Ciampaglia, Dario Taraborelli and Aaron Halfaker

What drives people to contribute to Wikipedia? Experiment suggests reciprocity and social image motivations

Wikipedia runs on the efforts of unpaid volunteers who choose to donate their time to advance the cause of free knowledge. This phenomenon, however trivial it may sound to those acquainted with Wikipedia’s inner workings, has long puzzled economists and social scientists alike, because standard economic theory would not predict that such enterprises (or any other community of peer production, for example open source software) would thrive without any form of remuneration. The flip side of direct remuneration, namely intrinsic motivations such as passion, enthusiasm and belief in free knowledge, could not alone (at least as standard theory goes) convincingly explain such prolonged efforts, given away essentially for free.

Early in the history of the Open Source/Libre software movement, some economists noted that successfully contributing to high-profile projects like Linux or Apache may translate into a strong résumé for a software developer, and proposed, as a way to reconcile traditional economic theory with reality, that sustained contribution to a peer production system can happen where other forms of extrinsic motivation are available. But what about Wikipedia? The career incentive is largely absent in the case of the free encyclopedia, so is it really the case that intrinsic motivations such as pure altruism cannot be behind the prolonged efforts of its contributors?

To find out, a group of researchers at Sciences Po, Harvard Law School, and the University of Strasbourg (among others) designed a series of online experiments intended to measure social preferences, and administered them to a group of volunteer Wikipedia editors to understand whether contribution to Wikipedia can be explained by any of the main hypotheses that economists have so far formulated regarding contribution to public goods.[1][2] The researchers considered three hypotheses, two for intrinsic and one for extrinsic forms of motivation: pure altruism, reciprocity, and social image motives.

In more detail, the researchers asked a number of Wikipedia editors and contributors (all with a registered account) to participate in a series of experimental games specifically designed to measure the extent to which people behave according to one or more of the above social preferences, for example by either free-riding or contributing to the common pool in a public goods game. In addition, as a proxy measure for the “social image” hypothesis, they checked whether participants had ever received a barnstar on their talk pages and whether they had ever chosen to display any of these on their user page (coding these individuals as “social signallers”). Finally, they matched each participant with their contribution history, and sought to understand which of these measures can explain their edit counts.
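For readers unfamiliar with the public goods game mentioned above, the standard linear version can be sketched as follows. The endowment, the marginal per-capita return (mpcr) and the group size of four are illustrative assumptions, not the parameters used in the study.

```python
def public_goods_payoffs(contributions, endowment=20.0, mpcr=0.5):
    """Payoffs in a linear public goods game.

    Each player keeps whatever they do not contribute and receives `mpcr`
    times the group's total contribution. With 0 < mpcr < 1 < mpcr * n,
    free-riding maximizes a player's own payoff, yet full contribution
    maximizes the group's total payoff.
    """
    for c in contributions:
        if not 0 <= c <= endowment:
            raise ValueError("contribution must lie between 0 and the endowment")
    total = sum(contributions)
    return [endowment - c + mpcr * total for c in contributions]


# Four players (mpcr * n = 2 > 1); in the first game the last player free-rides.
mixed = public_goods_payoffs([20, 20, 20, 0])    # [30.0, 30.0, 30.0, 50.0]
all_in = public_goods_payoffs([20, 20, 20, 20])  # [40.0, 40.0, 40.0, 40.0]
nobody = public_goods_payoffs([0, 0, 0, 0])      # [20.0, 20.0, 20.0, 20.0]
```

The tension the experiments exploit is visible in the numbers: the free-rider earns 50 against the contributors’ 30, yet everyone is better off (40 each) when all contribute than when nobody does (20 each). How far a participant departs from pure free-riding is what such games use to gauge social preferences.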

The results suggest that reciprocity seems to drive contribution among less experienced editors, whereas reputation (social image) seems to better explain the activity of more seasoned editors, though, as the authors acknowledge, the goodness of fit of the regression estimates is modest. The study was at the center of a heated debate within the community about the use of site-wide banners for recruitment purposes. On December 3, one of the authors gave a presentation about the results at Harvard, which is available online as an audio and video recording. According to the Harvard Crimson, he remarked “that the study is still in progress and more data needs to be collected”. So far, the results are available as a conference paper and as an unpublished working paper.

Does “cultural imperialism” prevent the incorporation of indigenous knowledge on Wikipedia?


Wikimedia Research Newsletter, October 2013


Vol: 3 • Issue: 10 • October 2013

User influence on site policies: Wikipedia vs. Facebook vs. YouTube

With contributions by: Han-Teng Liao, Piotr Konieczny, Taha Yasseri, and Tilman Bayer

User influence on site policies is highest on Wikipedia, compared to YouTube and Facebook

Laura Stein, a researcher at the University of Texas at Austin, has concluded,[1] based on her comparison of the user policy documents (including the Terms of Service) of YouTube, Facebook and Wikipedia, that Wikipedia offers the highest level of participation power overall. Using Arnstein’s ladder of participation as the starting point for a theoretical discussion of participation and power, Stein carefully proposed a typology of policy and participation (Table 1, p. 359), ranging from the maximal power of “dominant control over site content and governance” and “shared control”, through the minimal power of “consultation”, “choice”, and “informing”, to the absence of power in “deceptive or inadequate information” and “nonparticipation”. She applied this typology across five policy areas (“permitted content and its use”, “content ownership/copyrights”, “user information/data”, “modifying software” and “user policy formation & consent”) for the three websites, and found that Wikipedia beats the other websites in all areas. In the first and last policy areas, “permitted content and its use” and “user policy formation & consent”, Wikipedia gives users “dominant control”; in the remaining areas, Wikipedia gives users “shared control over site content and governance”.

In contrast, YouTube and Facebook provide only the minimal power of “informing” in three policy areas (“permitted content and its use”, “content ownership/copyrights” and “modifying software”) and the slightly greater minimal power of “choice” in the “user information/data” area. Although Wikipedia is not widely considered a “social media” website, Stein nevertheless presented a simple typology for evaluating the levels of participation power that platforms give their users. It would also be useful to apply this typology to other policy areas, including fund dissemination and organizational governance.

Wikipedia’s coverage of academics

Histogram of h-indexes of scientists from four different disciplines featured in Wikipedia. The solid line shows the average considering all the researchers of the field.

Anna Samoilenko and Taha Yasseri from the Oxford Internet Institute released an arXiv preprint titled “The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics”.[2] The study examines the notability of academics in the English Wikipedia. The ground truth is taken to be the citation records of the scholars under study, and the h-index in particular, although the authors admit that the quantity of publications and citations is not the best proxy for evaluating the quality and scientific impact of researchers. Based on the results of the paper, scientists covered in Wikipedia (taken from a sample of 400 scientists in four fields: physics, computer science, biology and psychology) …
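The h-index used as ground truth in this study is the largest number h such that a researcher has h papers with at least h citations each. A minimal sketch of that standard definition; the citation counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation records for two researchers.
first = h_index([10, 8, 5, 4, 3])   # 4
second = h_index([25, 8, 5, 3, 3])  # 3
```

Note how the second record has a more highly cited top paper yet a lower h-index: the measure rewards a consistent body of cited work rather than a single hit, which is one reason the authors caution against treating it as a complete measure of impact.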

Wikimedia Research Newsletter, September 2013


Vol: 3 • Issue: 9 • September 2013

Automatic detection of “infiltrating” Wikipedia admins; Wiki, or ‘pedia?

With contributions by: Brian Keegan, Piotr Konieczny, Aaron Halfaker, Jonathan Morgan and Tilman Bayer

Wiki, or ‘pedia? The genre and values of Wikipedia compared with other encyclopedias

Wikipedia and Encyclopaedism: A Genre Analysis of Epistemological Values[1] is a new master’s thesis that analyzes the values that influence how knowledge is presented on Wikipedia, in comparison with other encyclopedias created throughout history. The author uses genre analysis to compare the epistemological values that are represented in the kind of knowledge that different encyclopedias present and in the way they present that knowledge. The author first conducts a literature review to compare the epistemology of two genres: wikis and encyclopedias. The wiki epistemology is composed of six values: self-identification, collaboration, co-construction, cooperation, trust in the community, and constructionism. By contrast, the values of major current and historical encyclopedias, such as Diderot’s Encyclopedia, Pliny’s Natural History, and the Encyclopædia Britannica, prioritize trust in experts, authority, and consistency.

Despite being based on different, and even somewhat contradictory, value systems, the purpose of Wikipedia and the way it presents knowledge are shown to be similar to other works in the encyclopedia genre. The author analyzes the frequency of common words in section headings of 25 heavily edited English Wikipedia articles that had a corresponding article in Britannica. He compares the evolution of section headings within these Wikipedia articles and multiple editions of Britannica, and shows that the gradual process by which a Wikipedia article becomes more structured through the addition and alteration of headings is similar to the process for Britannica articles, which also tend to become longer and more formally structured over subsequent editions. This thesis presents some interesting parallels between the way articles are developed within Wikipedia and other encyclopedias, despite vastly different timescales and some differing underlying values. It also offers an engaging, in-depth discussion of the concept of genre, the purpose of the encyclopedia genre, and the history of several major historical encyclopedias.
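The heading-frequency analysis described above could be approximated as follows. This is only a sketch under our own assumptions: articles are given as wikitext, headings are lines wrapped in matching runs of 2 to 6 equals signs, and the small stopword list is hypothetical; the thesis’s actual procedure may differ.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "in", "of", "the", "to"}

# A wikitext section heading: "== Title ==", "=== Subtitle ===", and so on.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def heading_word_frequencies(pages):
    """Count non-stopword words across all section headings of the given pages."""
    counts = Counter()
    for text in pages:
        for _, title in HEADING.findall(text):
            words = re.findall(r"[a-z]+", title.lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Two toy articles in wikitext.
pages = [
    "Intro text\n== History ==\nBody\n== Early history ==\n=== Geography and climate ===\n",
    "== History ==\n== Economy ==\n",
]
freq = heading_word_frequencies(pages)  # "history" appears 3 times
```

Running the same count over successive revisions of an article (or successive Britannica editions) would give a rough, quantitative view of how its section structure grows over time.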

Identifying trending topics of yesteryear

In a paper titled “Temporal Wikipedia search by edits and linkage”,[2] the authors develop a method to identify Wikipedia articles associated with topics around a given date, based on changes in the length of each article as well as on patterns in the other articles to which it links. The paper expands on prior work in temporal information retrieval and anomaly detection, and uses modifications of the HITS and PageRank algorithms to return a list of the most relevant documents for a topic on a date. This work has implications not only for using Wikipedia data to identify current trending topics, but also for identifying trending topics retrospectively. A downloadable Java client allows test searches (for the months of September and October 2011) and displays the resulting page networks.
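The paper’s own HITS modification is not described in this summary, but the underlying HITS algorithm it starts from can be sketched with a few lines of power iteration. This plain, unmodified version and the toy link graph are illustrative assumptions; unlike the paper’s method, it ignores edit history and dates entirely.

```python
from math import sqrt


def hits(links, iterations=50):
    """Plain HITS by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority) score dicts, each normalized to unit length.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A page is a good authority if good hubs link to it...
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # ...and a good hub if it links to good authorities.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Toy graph: three pages point to C, and B also points to D.
hub, auth = hits({"A": ["C"], "B": ["C", "D"], "E": ["C"]})
```

Here C ends up as the strongest authority and B as the strongest hub. A temporal variant would, for example, restrict or weight the link graph by what changed around the query date, but how the authors do this is not shown here.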

Automatic detection of “infiltrating” Wikipedia admins

A paper titled “Manipulation Among the Arbiters of Collective Intelligence: How Wikipedia Administrators Mold Public Opinion”[3], to be presented at next month’s ACM Conference on Information and Knowledge Management (CIKM), makes a rather serious claim: “We find a surprisingly large number of editors who change their behavior and begin focusing more on a particular controversial topic once they are promoted to administrator status.”

Wikimedia Research Newsletter, August 2013


Vol: 3 • Issue: 8 • August 2013

WikiSym 2013 retrospective

With contributions by: Piotr Konieczny, Taha Yasseri, Brian Keegan, Dario Taraborelli, Tilman Bayer

WikiSym 2013

WikiSym+OpenSym 2013 group photo (showing around half of the participants at Jumbo Kingdom)

98 registered participants attended the annual WikiSym+OpenSym conference from August 5-7 at Hong Kong’s Cyberport facility. The event preceded the annual global Wikimania conference of the Wikimedia movement in the same city.

WikiSym was started in 2005 as the “International Symposium on Wikis”, and its scope has since been broadened to include the study of other forms of “open collaboration” (such as free software development, or open data), reflected in the adoption of the separate “OpenSym” label. The proceedings, published online at the start of the conference, contain 22 full papers (out of 43 submissions), in addition to short papers, posters, abstracts for research-in-progress presentations, etc. The coverage below reflects the scope of this research report, and complements the pre-conference reviews of some papers in the previous issue.

Episode 96 of the “Wikipedia Weekly” podcast contains some coverage of WikiSym 2013 (from around 10:30-20:00), and some images and media from the event can be found on Wikimedia Commons.

Next year’s WikiSym+OpenSym conference will be held in Berlin on August 27-29, 2014, and the call for papers is already out. Conference chair Dirk Riehle announced that the proceedings will continue to be published with ACM, now under its new open access policy.

Full papers

  • Despite policy, only just over half of Wikipedia sources are secondary: “Getting to the Source: Where does Wikipedia Get Its Information From”[1] presents overall statistics on the sources referred to in English Wikipedia articles to answer this question. An initial seed of source tags was constructed by analysing 30 randomly selected articles, and then all articles in Wikipedia as of May 2012 were probed to find and classify the references. Some 67 million citations in 3.5 million articles were found. The classification was performed by two human coders on a random sample of 500 citations. More than 30% of the citations were classified as primary sources, around 53% as secondary, and around 13% as tertiary. After discussing the type, creator, and publisher of the references, as well as a large-scale domain analysis and their persistence over time, the paper concludes: “Wikipedia’s content is ultimately driven by the sources from which that content comes. … Although secondary sources are considered by policy to be the most desirable type, we demonstrate that nearly half of all citations are either primary or tertiary sources, with primary sources making up approximately one-third of all citations.”
  • Conflict on Wikipedia as “generative friction”: …

Wikimedia Research Newsletter, July 2013


Vol: 3 • Issue: 7 • July 2013

Napoleon, Michael Jackson and Srebrenica across cultures, 90% of Wikipedia better than Britannica, WikiSym preview

With contributions by: Taha Yasseri, Han-Teng Liao, Piotr Konieczny, Jonathan Morgan and Tilman Bayer

Multilingual ranking analysis: Napoleon and Michael Jackson as Wikipedia’s “global heroes”

An arXiv preprint titled “Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles”,[1] authored by a group of physicists from France, examines Wikipedia articles about individuals and their position in the hyperlink network of articles in each Wikipedia language edition. Nine language editions are studied. The authors try to locate the most “important” individuals (“heroes”) in each language edition by calculating several ranking scores: PageRank, CheiRank, and 2DRank, which combines the two. After compiling lists of the highest-ranked individuals in each language edition (30 individuals per list), they investigate the overlaps between lists and introduce local and global “heroes”. The lists of “global heroes” are topped by Napoleon for PageRank and Michael Jackson for 2DRank. The paper shows that both local and global heroes exist: global heroes gain their central position in the network through links from multiple other central nodes, while local heroes are mostly notable because of the large number of links pointing directly to them. Finally, based on the nationality (language of origin) of the highly ranked individuals, a network of languages is constructed and the position of each language in this network is analyzed by calculating rank scores. The authors also analyzed the activities of these important individuals, and found politicians and scientists to be quite often among the most important ones.
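PageRank scores an article by the links pointing to it, while CheiRank applies the same computation to the graph with every link reversed, so it favours articles with many outgoing links. A minimal power-iteration sketch on a made-up five-article graph; the damping factor of 0.85 is a conventional choice assumed here, and 2DRank, which combines the two rankings, is not shown.

```python
def pagerank(links, damping=0.85, iterations=100):
    """PageRank by power iteration; pages without out-links spread rank uniformly."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


def reverse(links):
    """The same graph with every link pointing the other way."""
    rev = {src: [] for src in links}
    for src, targets in links.items():
        for t in targets:
            rev.setdefault(t, []).append(src)
    return rev


def cheirank(links, **kwargs):
    """CheiRank: PageRank computed on the reversed link graph."""
    return pagerank(reverse(links), **kwargs)


# Toy graph: many pages link to "Napoleon"; "Timeline" links out to many pages.
links = {
    "France": ["Napoleon"], "Waterloo": ["Napoleon"], "Elba": ["Napoleon"],
    "Napoleon": ["France"], "Timeline": ["France", "Waterloo", "Elba", "Napoleon"],
}
pr = pagerank(links)  # highest for "Napoleon"
cr = cheirank(links)  # highest for "Timeline"
```

The two rankings answer different questions: PageRank picks out the article everything else points to, while CheiRank picks out the article that points to everything else, which is why combining them, as the paper does, can surface different “heroes”.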

Art: Image-sharing relationship between 154 language versions of Wikipedia (from the DMI Summer School 2013)

Wikipedia as Cultural Reference: Srebrenica Massacre, Art and Menstruation


Wikimedia Research Newsletter, June 2013

Vol: 3 • Issue: 6 • June 2013

Most controversial Wikipedia topics, automatic detection of sockpuppets

With contributions by: Giovanni Luca Ciampaglia, Taha Yasseri and Tilman Bayer.


“The most controversial topics in Wikipedia: a multilingual and geographical analysis”

Map of conflict in the Spanish Wikipedia. Each dot represents a geolocated article; dot size and colour correspond to the controversy measure of Sumi et al. (2011)[2]. The map is taken from Yasseri et al. (2013)[1].

A comparative study by T. Yasseri, A. Spoerri, M. Graham and J. Kertész of controversial topics in different language versions of Wikipedia has recently been posted on the Social Science Research Network (SSRN) online scholarly archive[1]. The paper, which will appear as a chapter of the upcoming book “Global Wikipedia: International and cross-cultural issues in online collaboration” (edited by P. Fichman and N. Hara, to be published by Scarecrow Press in 2014), looks at the 100 most controversial topics in 10 language versions of Wikipedia (results including 3 additional languages are reported in the blog of one of the authors) and tries to make sense of the similarities and differences among these lists. Several visualization methods are proposed, based on a Flash-based tool developed by the authors called CrystalView. Controversiality is measured using a scalar metric that takes into account the total volume of pairwise mutual reverts among all contributors to a page. This metric was proposed by Sumi et al. (2011)[2], in a paper reviewed two years ago in this newsletter (“Edit wars and conflict metrics”). Topics related to politics, geographical locations, and religion are reported to be the most controversial across the board, and each language also features specific, local controversies, which the authors explore further by grouping together languages with similar spheres of influence. Furthermore, the presence of latitude/longitude information (geocoordinates) in several of the articles in the sample allowed the authors to plot the most controversial topics on a world map, showing that each language features both local and global issues among its most heated topics of debate.
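The metric can be sketched as follows, based on our reading of Sumi et al. (2011): for each pair of editors who have reverted each other on an article, take the smaller of their two edit counts; sum these weights over all such pairs while leaving out the heaviest pair (so that a single two-person feud scores low); and multiply by the total number of editors E. Detecting the reverts themselves is assumed to have been done already, and the function and argument names are hypothetical.

```python
def controversy_measure(reverts, edit_counts):
    """Sketch of the Sumi et al. (2011) controversy measure M (our reading).

    reverts: iterable of (reverter, reverted) editor pairs on one article.
    edit_counts: dict mapping every editor of the article to their edit count.

    M = E * sum over mutually reverting pairs {a, b} of min(N_a, N_b),
    leaving out the pair with the largest weight; E = number of editors.
    """
    revert_set = set(reverts)
    seen_pairs = set()
    weights = []
    for a, b in revert_set:
        pair = frozenset((a, b))
        # A pair counts only if each editor has reverted the other.
        if a != b and (b, a) in revert_set and pair not in seen_pairs:
            seen_pairs.add(pair)
            weights.append(min(edit_counts[a], edit_counts[b]))
    if weights:
        # Drop the heaviest pair so a lone two-person feud scores low.
        weights.remove(max(weights))
    return len(edit_counts) * sum(weights)


# Hypothetical example data for a single article.
edit_counts = {"u1": 10, "u2": 5, "u3": 3, "u4": 1}
reverts = [("u1", "u2"), ("u2", "u1"), ("u1", "u3"), ("u3", "u1"), ("u4", "u1")]
print(controversy_measure(reverts, edit_counts))  # 4 * 3 = 12
```

Here u1/u2 (weight 5) and u1/u3 (weight 3) are mutual reverters, while u4's one-way revert is ignored; dropping the heaviest pair leaves 3, which is multiplied by the 4 editors.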

In summary, the study shows how valuable information about cross-cultural differences can be extracted from traces of Internet activity. One obvious question is how the demographics of Wikipedia editors affect the representativeness of the results. The authors seem to be aware of this issue, which is likely to become increasingly important as cultural studies draws more and more on data generated by peer production communities.

The research has received extensive media coverage, e.g. in the Huffington Post, Live Science and Zeit Online.

Non-virtual sockpuppets created by participants of RecentChangesCamp, as a humorous take on the sockpuppet phenomenon in online communities

Sockpuppet evidence from automated writing style analysis

“A Case Study of Sockpuppet Detection in Wikipedia”[3], presented this month at the “Workshop on Language in Social Media”, describes an automated method that analyzes users’ writing style in order to detect or confirm sockpuppets. The abuse of multiple accounts (also known as “multi-aliasing”, or as Sybil attacks in other contexts) is described as “a prevalent problem in Wikipedia, there were close to 2,700 unique suspected cases reported in 2012.”
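The paper's exact features and classifier are not described here, so the following is only a generic illustration of writing-style comparison, not the authors' method: each account's edits are pooled into a character-trigram frequency profile, and two accounts are compared by cosine similarity. The trigram size, the function names and the idea of flagging high-similarity pairs for human review are all assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Frequency profile of overlapping character n-grams (lowercased)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(count * q[gram] for gram, count in p.items() if gram in q)
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def style_similarity(texts_a, texts_b, n=3):
    """Compare the pooled writing of two (hypothetical) accounts."""
    return cosine_similarity(char_ngrams(" ".join(texts_a), n),
                             char_ngrams(" ".join(texts_b), n))


# Hypothetical edit summaries from two accounts under suspicion.
account_a = ["Reverted unsourced claim, see talk page.", "Fixed typo."]
account_b = ["Reverted unsourced claim again, see talk.", "Fixed a typo."]
print(round(style_similarity(account_a, account_b), 3))
```

A score near 1 means the two accounts use very similar character sequences; in practice such a score would be one signal among many (edit timing, shared articles) and would feed into human review rather than decide a case on its own.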