Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Wikimedia Research Newsletter

Wikimedia Research Newsletter, June 2014

Wikimedia Research Newsletter

Vol: 4 • Issue: 6 • June 2014

Power users and diversity in WikiProjects; the “network of cultures” in multilingual Wikipedia biographies

With contributions by: Taha Yasseri, Maximilian Klein, Piotr Konieczny, Kim Osman, and Tilman Bayer

New book: Global Wikipedia

An edited volume[1] by Pnina Fichman and Noriko Hara from Indiana University, Bloomington, subtitled “International and Cross-cultural Issues in Online Collaboration”, was released on May 23, 2014. The book description states that “dozens of books about Wikipedia are available, but they all focus on the English Wikipedia and assume an Anglo-Saxon perspective, while disregarding cultural and language variability or multi-cultural collaborative efforts”. The description claims that this is “the first book to address this gap by focusing attention on the global, multilingual, and multicultural aspects of Wikipedia.” The book contains nine chapters by 16 Wikipedia researchers (including a chapter by the volume editors). Among the topics covered are international and cross-cultural conflict and collaboration, case studies of the Chinese, Finnish, French, and Greek Wikipedias, and gender gaps across different language editions of Wikipedia.

“Interactions of cultures and top people of Wikipedia from ranking of 24 language editions”

Review by Maximilianklein

The German philosopher Immanuel Kant, born in what is today Russia, is among the small number of cases where the researchers’ method of assigning a historical figure to a national culture based on their birthplace fails.

This research by Eom et al.[2] is an exploratory data analysis of historical figures (roughly, “people”), based on mining the date and place of birth and the gender from biography articles. Presenting novel ideas based on the well-known Google PageRank algorithm, this paper is a sort of computational history. The methods used are standard, if a bit dated, compared with more contemporary research using Wikidata.
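Rankings of this kind rest on PageRank computed over the network of links between Wikipedia articles. As a rough illustration only, the Python sketch below computes standard PageRank by power iteration on a tiny hypothetical graph of biography articles; the damping factor of 0.85 is the conventional default and an assumption here, and the paper's own ranking variants and data handling are not reproduced.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank by power iteration.

    links maps each article to the list of articles it links to.
    Articles without outgoing links (dangling nodes) spread their
    score uniformly over all articles.
    """
    # Collect every node, including pages that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Score held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each page passes an equal share of its score along its out-links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy link graph; not data from the study.
graph = {
    "Kant": ["Hume", "Newton"],
    "Hume": ["Newton"],
    "Newton": ["Kant"],
    "Hegel": ["Kant"],
}
scores = pagerank(graph)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

In this toy graph no page links to "Hegel", so it keeps only the baseline share (1 - 0.85) / 4 and ranks last, while the scores of all pages sum to 1.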

Wikimedia Research Newsletter, May 2014


Vol: 4 • Issue: 5 • May 2014

Overview of research on Wikipedia’s readers; predicting which article you will edit next

With contributions by: Piotr Konieczny, Maximilian Klein and Tilman Bayer

“Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership”

This paper[1] is another major literature review of the field of Wikipedia studies, by the same authors whose prior work on this topic, titled “The People’s Encyclopedia Under the Gaze of the Sages”,[supp 1] was reviewed in this research report in 2012 (“A systematic review of the Wikipedia literature”).

This time the authors focus on a subset of the larger body of work about Wikipedia, analyzing 99 works published up to June 2011 on the theme of “Wikipedia readership”, in other words, on the question “What do we know about people who read Wikipedia?” The overview focuses less on demographic analysis (since little research has been done in that area) and more on perceptions of Wikipedia by surveyed groups of readers. Among other findings, the authors conclude that “Studies have found that articles generally related to entertainment and sexuality top the list, covering over 40% of visits”, and that, among more serious topics, Wikipedia is a common source of health and legal information. They also find that “a very large number of academic[s] in fact have quite positive, if nuanced, perceptions of Wikipedia’s value.” They further observe that the most commonly studied group has been students, who offer a convenience sample. The authors finish by identifying a number of contradictory findings and topics in need of further research, and conclude that existing studies have likely overestimated the extent to which Wikipedia’s readers are cautious about the site’s credibility. Finally, the authors offer valuable thoughts in the “implications for the Wikipedia community” section, such as suggesting “incorporating one or more of the algorithms for computational estimation of the reliability of Wikipedia articles that have been developed to help address credibility concerns”, similar to the WikiTrust tool.

The authors also published a similar literature review paper summarizing research about the content of Wikipedia, which we hope to cover in the next issue of this research report.

Chinese-language time-zones favor Asian pop and IT topics on Wikipedia

Map of the Chinese-speaking world

A paper[2] presented at the WWW 2014 Companion Conference analyzes the readership patterns of the English and Chinese Wikipedias, with a focus on which types of articles are most popular in the English- or Chinese-language time zones. The authors used all Wikipedia pages which existed under the same name in both languages in the period from 1 June 2012 to 14 October 2012 for their study, coding them through the OpenCalais semantic analysis service with an estimated 2.6% error rate.

The authors find that readers of the English and Chinese Wikipedias in time zones with high Chinese activity browse different categories of pages. Readers in these time zones visit English Wikipedia pages about Asian culture (in particular, Japanese and Korean pop culture) more often, as well as pages about mobile communications and networking technologies. The authors also find that pages in English are almost ten times as popular as those in Chinese (although the study does not identify users’ nationality directly, relying instead on time-zone analysis).

In this reviewer’s opinion, the study suffers from major methodological problems that are serious enough to cast all the findings in doubt.

Wikimedia Research Newsletter, April 2014


Vol: 4 • Issue: 4 • April 2014

Wikipedia predicts flu more accurately than Google; 43% of academics have edited Wikipedia

With contributions by: Piotr Konieczny, Giovanni Luca Ciampaglia and Tilman Bayer

Wikipedia Usage Estimates Prevalence of Influenza-Like Illness

Researchers from Harvard Medical School have tested whether the level of seasonal influenza-like illness (ILI) in the U.S. can be predicted using data about the traffic to a selected set of Wikipedia articles related to influenza.[1]

They compared their models against the predictions of Google Flu Trends (GFT), one of the earliest and best-known web-based tools for predicting the evolution of seasonal influenza. The gold standard for comparison was the public data released by the Centers for Disease Control and Prevention (CDC). The accuracy of GFT has increasingly been questioned by several authors, culminating in a recent Science commentary about the promises and perils of Big Data for predicting real-world phenomena. The authors start from this observation, propose that Wikipedia page views may be less subject to the biases that affected GFT, and test this hypothesis in the present work. They find that their model is more accurate than GFT and was able to predict the peak week of the influenza season correctly more often. Another undoubted advantage of Wikipedia compared to GFT, the authors argue, is that its data is publicly available, which makes the present model open to public scrutiny.
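The core idea, estimating the CDC's weekly ILI level from weekly Wikipedia traffic, can be illustrated with a simple regression. The Python sketch below is a hypothetical single-predictor ordinary least-squares fit with invented numbers; the study drew on several influenza-related articles, and its exact model is not reproduced here. The `peak_week` helper mirrors the peak-week comparison described above.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def peak_week(series):
    """Index of the week with the largest value."""
    return max(range(len(series)), key=lambda i: series[i])


# Invented training data: weekly page views (thousands) and ILI rate (%).
views = [100, 150, 220, 300, 260, 180]
ili = [1.0, 1.5, 2.2, 3.0, 2.6, 1.8]
a, b = fit_linear(views, ili)

# Nowcast the ILI rate for new weeks from their page views alone.
new_views = [120, 280, 200]
estimates = [a + b * v for v in new_views]
print(f"slope={b:.4f}, estimates={[round(e, 2) for e in estimates]}")
print("peak week of estimates:", peak_week(estimates))
```

A real evaluation would compare such estimates, and their peak week, against the CDC figures for held-out weeks rather than the training weeks.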

Survey of academics’ view on Wikipedia and open-access publishing

A study titled “Academic opinions of Wikipedia and open-access publishing”[2] examined academics’ awareness of and attitudes towards Wikipedia and open-access journals for academic publishing through a survey of 120 academics carried out in late 2011 and early 2012.

Wikimedia Research Newsletter, March 2014


Vol: 4 • Issue: 3 • March 2014

Wikipedians’ “encyclopedic identity” dominates even in Kosovo debates; analysis of “In the news” discussions; user hierarchy mapped

With contributions by: Federico Leva, Scott Hale, Kim Osman, Jonathan Morgan, Piotr Konieczny, Niklas Laxström, Tilman Bayer and James Heilman

Cross-language study of conflict on Wikipedia

Have you wondered about differences among the articles on Crimea in the Russian, Ukrainian, and English versions of Wikipedia? A newly published article entitled “Lost in Translation: Contexts, Computing, Disputing on Wikipedia”[1] doesn’t address Crimea, but nonetheless offers insight into the editing of contentious articles in multiple language editions through an in-depth qualitative examination of Wikipedia articles about Kosovo in the Serbian, Croatian, and English editions.

The authors, Pasko Bilic and Luka Bulian from the University of Zagreb, found that the main drivers of conflict and consensus were different group identities in relation to the topic (Kosovo) and to Wikipedia in general. Happily, the authors found that the dominant identity among users in all three editions was the “encyclopedic identity,” which closely mirrored the rules and policies of Wikipedia (e.g., NPOV) even if the users didn’t cite such policies explicitly. (This echoes the result of a similar study regarding the political identities of US editors; see previous coverage: “Being Wikipedian is more important than the political affiliation”.) Other identities were based largely on language and territory. These identities, however, did not sort cleanly into the different language editions: “language and territory [did] not produce coherent and homogeneous wiki communities in any of the language editions.”

The English Wikipedia was seen by many users as providing greater visibility and thus “seem[ed] to offer a forum for both Pro-Serbian and Pro-Albanian viewpoints making it difficult to negotiate a middle path between all of the existing identities and viewpoints.” The Arbitration Committee, present in the English edition but not in the Serbian or Croatian editions, may have helped prevent even greater conflict. Enforcement of its decisions seemed generally to lead to greater caution in the editing process.

In line with previous work showing that some users move between language editions, the authors found a significant amount of coordination work between the language editions. One central question was whether other editions would follow the English edition in splitting the article into two separate articles (Republic of Kosovo and Autonomous Province of Kosovo and Metohija).

The social construction of knowledge on English Wikipedia

review by Kim Osman

Another paper by Bilic, published in New Media & Society,[2] looks at the logic behind networked societies and the myth, perpetuated by media institutions, that there is a center of the social world (as opposed to distributed nodes). The paper goes on to investigate the social processes that contribute to the creation of “mediated centers” by analyzing the talk pages of English Wikipedia’s In The News (ITN) section.

Undertaking an ethnographic content analysis of ITN talk pages from 2004–2012, Bilic found three issues that were disputed among Wikipedians in their efforts to construct a necessarily time-sensitive section of the encyclopedia. First, editors differentiate between mass media and Wikipedia as a digital encyclopedia, but where the border between the two lies is often contested. Second, inclusionists and deletionists debated the criteria for stories to appear in the ITN section. Third, conflict and discussion arose over English Wikipedia’s relevance to a global audience.

The paper provides a good insight into how editors construct the ITN section and how it is positioned on the “thin line between mass media agenda and digital encyclopedia.” It would be interesting to see further research on the tensions between the Wikipedia policies mentioned in the paper (e.g. WP:NOTNEWS, NPOV) and mainstream media trends in light of other studies about Wikipedia’s approach to breaking news coverage.

User hierarchy map: Building Wikipedia’s Org Chart


Wikimedia Research Newsletter, February 2014


Vol: 4 • Issue: 2 • February 2014

CSCW ’14 retrospective; the impact of SOPA on deletionism; like-minded editors clustered; Wikipedia stylistic norms as a model for academic writing

With contributions by: David Ludwig, Morten Warncke-Wang, Maximilian Klein, Piotr Konieczny, Giovanni Luca Ciampaglia, Dario Taraborelli and Tilman Bayer

CSCW ’14 retrospective

The 17th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW ’14) took place this month in Baltimore, Maryland.[supp 1] The conference brought together more than 500 researchers and practitioners from industry and academia presenting research on “the design and use of technologies that affect groups, organizations, communities, and networks.” Research on Wikipedia and wiki-based collaboration has been a major focus of CSCW in the past. This year, three papers on Wikipedia were presented:

Unique editors per quarter in conventional and alternative WikiProjects, 2002-2012

Edits per quarter in conventional and alternative WikiProjects, 2002-2012

Slides from Editing beyond articles[1]

The rise of alt.projects in Wikipedia

Jonathan Morgan from the Wikimedia Foundation and collaborators from the University of Washington[1] analyzed the nature of collaboration in alternative WikiProjects, i.e. projects that the authors identify as not following “the conventional pattern of coordinating a loosely defined range of article creation and curation-related activities within a well defined topic area” (examples of such alternative WikiProjects include the Guild of Copy Editors and WikiProject Dispute Resolution). The authors analyze the editing activity of members of these projects, which are not focused on editing content within a topic. The paper also reports data on the number of contributors involved in WikiProjects over time: while the number of editors participating in conventional projects decreased by 51% between 2007 and 2012, participation in alternative projects declined by only 13% over the same period, and the raw number of contributions to alternative projects increased by 57% overall.

Categorizing barnstars via Mechanical Turk

Paul Andre and collaborators from Carnegie Mellon University presented a study showing how to effectively crowdsource a complex categorization task by assigning it to users with no prior knowledge or domain expertise.[2] The authors selected a corpus of Wikipedia barnstars and showed how different task designs can produce crowdsourced judgments where Mechanical Turk workers accurately match expert categorization. Expert categorization was obtained by recruiting two Wikipedians with substantial editing activity as independent raters.
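Whether crowd judgments match expert ones is typically quantified with an agreement statistic. The text does not say which measure the authors used; as an assumption, the Python sketch below computes Cohen's kappa, a standard chance-corrected agreement score for two raters, on invented barnstar labels (the category names are hypothetical).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented labels for six barnstars (category names are hypothetical).
expert = ["editing", "vandalism", "editing", "admin", "editing", "vandalism"]
crowd = ["editing", "vandalism", "admin", "admin", "editing", "editing"]
print(round(cohens_kappa(expert, crowd), 3))  # 0.478
```

Here the raters agree on 4 of 6 items, but chance alone would produce 13/36 agreement given their label frequencies, so kappa is (24 - 13) / (36 - 13) = 11/23. Values near 1 mean agreement well beyond chance; values near 0 mean agreement no better than chance.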

Understanding donor behavior through email

A team of researchers from Yahoo! Research, the Qatar Computing Research Institute and UC Berkeley analyzed two months of anonymized email logs to understand the demographics, personal interests and donation behavior of individuals responding to different fundraising campaigns.[3] The data include donation email from the Wikimedia Foundation, and the results indicate that, compared with other campaigns, email from a domain had the highest ratio of messages tagged as spam to total messages read, which the authors attribute to spoofing. The paper also indicates that the Wikimedia fundraiser tends to attract slightly more male than female donors.

Clustering Wikipedia editors by their biases

review by User:Maximilianklein

Building on two strands of research, rating editors by content persistence and algorithmically finding cliques of editors, Nakamura, Suzuki and Ishikawa propose[4] a sophisticated refinement for finding like-minded and disparate-minded editors, and test it on the Japanese Wikipedia. The method finds cliques in a weighted graph connecting all editors of an article, where each edge is weighted by the agreement or disagreement between the two editors. To measure the agreement between two editors, the authors iterate through the full edit history and apply the content-persistence axioms, interpreting edits that leave text unchanged as agreement and edits that delete text as disagreement. Because leaving text unchanged is not always a strong indication of agreement, they normalize each action by its frequency for both the source editor and the target editor. That is, the method accounts for an editor’s propensity to change text, and for editors’ propensity to have their text changed.

To verify their method, the authors compare its results with those of a simplified weighting scheme, random clustering, and human clustering on 7 articles in the Japanese Wikipedia. In 6 out of 7 articles, the proposed technique beats simplified weighting. One example they present is the detection of pro- and anti-nuclear editors on the Nuclear Power Plant article. One possible application of such detection would be a gadget that colours the text of an article according to which editor group wrote it.
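The weighting step described above can be sketched in Python. This is a hypothetical illustration, not the authors' formula: each "keep" (agreement) or "delete" (disagreement) count between two editors is divided by the geometric mean of how often the source editor performs that action and how often the target editor receives it. In place of the paper's clique detection, editors are grouped simply by connected components over positive-weight edges. The editor names and interactions are invented.

```python
from collections import Counter, defaultdict
from math import sqrt


def edge_weights(interactions):
    """Signed, normalized agreement weights between pairs of editors.

    interactions: iterable of (source, target, action) tuples, where
    action is "keep" (source left target's text unchanged, read as
    agreement) or "delete" (source removed target's text, read as
    disagreement). HYPOTHETICAL normalization: each pair count is divided
    by the geometric mean of how often the source performs that action
    and how often the target is on the receiving end of it.
    """
    pair = Counter()
    out_count = Counter()
    in_count = Counter()
    for src, dst, action in interactions:
        if action not in ("keep", "delete"):
            raise ValueError(f"unknown action: {action!r}")
        if src == dst:
            continue  # editors revising their own text carry no signal here
        pair[(src, dst, action)] += 1
        out_count[(src, action)] += 1
        in_count[(dst, action)] += 1

    weights = defaultdict(float)
    for (src, dst, action), count in pair.items():
        norm = count / sqrt(out_count[(src, action)] * in_count[(dst, action)])
        sign = 1.0 if action == "keep" else -1.0
        # Undirected graph: both directions add to the same edge.
        weights[tuple(sorted((src, dst)))] += sign * norm
    return dict(weights)


def like_minded_groups(weights):
    """Connected components over positive-weight edges (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        root_a, root_b = find(a), find(b)
        if w > 0:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for editor in list(parent):
        groups[find(editor)].add(editor)
    return sorted(sorted(g) for g in groups.values())


# Invented interactions between four editors on one article.
history = [("A", "B", "keep"), ("B", "A", "keep"), ("C", "A", "delete"),
           ("C", "B", "delete"), ("D", "C", "keep"), ("A", "C", "delete")]
w = edge_weights(history)
print(like_minded_groups(w))  # [['A', 'B'], ['C', 'D']]
```

Connected components are much cruder than clique finding: a single positive edge merges two groups. The sketch only shows how signed, normalized weights separate editors who keep each other's text from those who delete it.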

Monthly research showcase launched

Video of the February 2014 Research Showcase

The lifetime of deleted articles by year of creation

The Wikimedia Foundation’s Research & Data team announced its first public showcase, a monthly review of work conducted by researchers at the Foundation. Aaron Halfaker presented a study of trends in newcomer article creation across 10 languages, with a focus on the English and German Wikipedias (slides). The study indicates that in wikis where anonymous users can create articles, their articles are less likely to be deleted than articles created by newly registered editors. Oliver Keyes presented an analysis of how readers access Wikipedia on mobile devices and reviewed methods to identify the typical duration of a mobile browsing session (slides). The showcase is hosted at the Wikimedia Foundation on the third Wednesday of every month and live-streamed on YouTube.[supp 2]
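The text does not say which session definition was adopted. One common approach, sketched here in Python as an assumption, splits a reader's requests wherever the gap between consecutive requests exceeds an inactivity cutoff; the 30-minute cutoff and the timestamps are placeholders.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, cutoff=timedelta(minutes=30)):
    """Group one reader's request times into sessions.

    A new session starts whenever the gap since the previous request
    is longer than `cutoff` (30 minutes here is a placeholder assumption).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= cutoff:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_durations(sessions):
    """Time from first to last request in each session."""
    return [s[-1] - s[0] for s in sessions]


# Invented request times for a single mobile reader.
start = datetime(2014, 2, 1, 12, 0)
requests = [start + timedelta(minutes=m) for m in (0, 5, 12, 120, 123)]
sessions = split_sessions(requests)
print([str(d) for d in session_durations(sessions)])  # ['0:12:00', '0:03:00']
```

Under this definition a session with a single request has a duration of zero, and the result depends strongly on the cutoff, which is why the choice of method matters for estimates of typical session length.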

Study of AfD debates: Did the SOPA protests mellow deletionists?


Wikimedia Research Newsletter, January 2014


Vol: 4 • Issue: 1 • January 2014

Translation assignments, weasel words, and Wikipedia’s content in its later years

With contributions by: Aaron Halfaker, Jonathan Morgan, Piotr Konieczny and Tilman Bayer

Translation students embrace Wikipedia assignments, but find user interface frustrating

An article, “Translating Wikipedia Articles: A Preliminary Report on Authentic Translation Projects in Formal Translator Training”,[1] reports on the author’s experiment with “a promising type of assignment in formal translator training which involves translating and publishing Wikipedia articles” in three courses with second- and third-year students at the Institute of English Studies, University of Warsaw.

It was “enthusiastically embraced by the trainees … Practically all of the respondents [in a participant survey] concluded that the experience was either ‘positive’ (31 people, 56% of the respondents) or ‘very positive’ (23 people, 42% of the respondents).” And “more than 90% of the respondents (50 people) recommended that the exercise ‘should definitely be kept [in future courses], maybe with some improvements,’ and the remaining 5 people (9%) cautioned that improvements to the format were needed before it was used again. No-one recommended culling the exercise from the syllabus.”

However, the author cautions that Polish–English translations required more instructor feedback and editing than translations from English into Polish (the students’ native language). And “most people found the technological aspects of the assignment frustrating, with most students assessing them as either ‘hard’ (39%) or ‘very hard’ (16%) to complete.”

Wikimedia Research Newsletter, December 2013


Vol: 3 • Issue: 12 • December 2013

Cross-language editors, election predictions, vandalism experiments

With contributions by: Daniel Mietchen, Maximilian Klein, Piotr Konieczny and Tilman Bayer

Cohort of cross-language Wikipedia editors analyzed

Network graph of the cross-language Wikipedia edits analyzed in the study.

The same network, with the node for the English Wikipedia removed.

Analyzing edits to the then 46 largest Wikipedias between July 9 and August 8, 2013, a study[1] identified a set of about 8,000 contributors with a global user account (labeled multilingual) who had edited more than one of these language versions (excluding Simple English, which was treated separately) in that time frame. It tested five hypotheses about cross-language editing and editors and looked, for instance, at the proportion of contributions that each of these Wikipedias receives from multilingual editors versus from editors of only one language version. The research found that the Esperanto and Malay Wikipedias stand out with a high proportion of contributions from multilinguals, and, at the other extreme, that the Japanese Wikipedia receives few contributions from multilinguals. Overall, in terms of edits per user, multilingual users made more than twice as many contributions to the study corpus as monolinguals did; they often work on the same topics across languages; and in any given language, they frequently edit articles that monolinguals did not edit during the one-month period analyzed. They thus serve a bridging function between languages.
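The per-language proportion described above can be computed from a simple edit log. The Python sketch below is a simplified, hypothetical version: it treats a user as multilingual if they edited more than one wiki in the log, and it ignores the study's restriction to global accounts and its separate handling of Simple English. The user names and wiki codes are invented.

```python
from collections import Counter, defaultdict


def multilingual_share(edits):
    """Per-wiki share of edits made by multilingual users.

    edits: iterable of (user, wiki) pairs, one per edit. A user counts as
    multilingual if they edited more than one wiki anywhere in the log.
    """
    edits = list(edits)
    wikis_by_user = defaultdict(set)
    for user, wiki in edits:
        wikis_by_user[user].add(wiki)

    total = Counter()
    from_multi = Counter()
    for user, wiki in edits:
        total[wiki] += 1
        if len(wikis_by_user[user]) > 1:
            from_multi[wiki] += 1
    return {wiki: from_multi[wiki] / total[wiki] for wiki in total}


# Invented edit log: (user, wiki code).
log = [("u1", "eo"), ("u1", "en"), ("u2", "eo"), ("u3", "ja"),
       ("u3", "ja"), ("u4", "en"), ("u5", "ja"), ("u5", "en")]
for wiki, share in sorted(multilingual_share(log).items()):
    print(f"{wiki}: {share:.2f}")
```

Because multilingual status is decided over the whole log, the same user's edits count as multilingual on every wiki they touch, which matches the study's framing of contributors rather than individual edits.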

Two existing write-ups are good starting points for putting the study in context.[supp 1][supp 2] In the long run, it would be interesting to extend the research to (a) cover a longer time span, (b) include contributions from non-registered users, despite technical difficulties, (c) include smaller Wikipedias, and (d) explore the effects of that bridging function in more detail, perhaps in search of ways to support its beneficial effects while minimizing the non-beneficial ones. It would also be interesting to focus on some aspects of those multilingual users (e.g. how the languages they edit in match the languages they display on their user pages) or of their contributions (e.g. how their contributions to text, illustrations, references, links, templates, categories or talk page discussions differ across languages, or how contributions from multilinguals differ across topics or between pages with high and low traffic), or to entertain ideas for a multilingual version of editing tools like User:SuggestBot. The paper is one of the first to make use of Wikidata; comparing such cross-lingual Wikipedia contributions with contributions to multilingual projects like Wikidata and Commons may also be a fruitful avenue for further research. (See also earlier coverage of a CSCW paper about a similar topic: “Activity of content translators on Wikipedia examined”.)

Attempt to use Wikipedia pageviews to predict election results in Iran, Germany and the UK


Wikimedia Research Newsletter, November 2013


Vol: 3 • Issue: 11 • November 2013

Reciprocity and reputation motivate contributions to Wikipedia; indigenous knowledge and “cultural imperialism”; how PR people see Wikipedia

With contributions by: Piotr Konieczny, Brian Keegan, Nicolas Jullien, Amir E. Aharoni, Henrique Andrade, Tilman Bayer, Daniel Mietchen, Giovanni Luca Ciampaglia, Dario Taraborelli and Aaron Halfaker

What drives people to contribute to Wikipedia? Experiment suggests reciprocity and social image motivations

Wikipedia works on the efforts of unpaid volunteers who choose to donate their time to advance the cause of free knowledge. This phenomenon, as trivial as it may sound to those acquainted with Wikipedia’s inner workings, has always puzzled economists and social scientists alike, in that standard economic theory would not predict that such enterprises (or any other community of peer production, for example open source software) would thrive without any form of remuneration. The flip side of direct remuneration, namely passion, enthusiasm and belief in free knowledge (in short, intrinsic motivations), could not by itself (at least as standard theory goes) convincingly explain such prolonged efforts, given away essentially for free.

Early in the Open Source/Libre software movement, some economists noted that successfully contributing to high-profile projects like Linux or Apache may translate into a strong résumé for a software developer, and proposed, as a way to reconcile traditional economic theory with reality, that sustained contribution to a peer production system could happen as long as such other forms of extrinsic motivation are available. But what about Wikipedia? The career incentive is largely absent in the case of the free encyclopedia, so is it really the case that intrinsic motivation, such as pure altruism, cannot be behind the prolonged efforts of its contributors?

To understand this, a group of researchers at Sciences Po, Harvard Law School, and University of Strasbourg (among others) designed a series of online experiments with the intent of measuring social preferences, and administered them to a group of volunteer Wikipedia editors to understand whether contribution to Wikipedia can be explained by any of the main hypotheses that economists have thus far formulated regarding contribution to public goods.[1][2] The researchers considered three hypotheses, two for intrinsic and one for extrinsic forms of motivation: pure altruism, reciprocity, and social image motives.

In more detail, the researchers asked a number of Wikipedia editors and contributors (all with a registered account) to participate in a series of experimental games specifically designed to measure the extent to which people behave according to one or more of the above social preferences, for example by either free-riding or contributing to the common pool in a public goods game. In addition, as a proxy measure for the “social image” hypothesis, they checked whether participants had ever received a barnstar on their talk pages and whether they had ever chosen to display any of these on their user page (coding these individuals as “social signallers”). Finally, they matched each participant with their contribution history, and sought to understand which of these measures can explain their edit counts.
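In the standard linear public goods game, each player keeps whatever they do not contribute, while the pooled contributions are multiplied and split equally among all players. Free-riding therefore pays individually even though full contribution maximizes the group's total. The Python sketch below shows this payoff rule; the endowment of 10 and the multiplier of 2 are illustrative assumptions, not the parameters used in the experiments.

```python
def public_goods_payoffs(contributions, endowment=10.0, multiplier=2.0):
    """Payoffs in a one-shot linear public goods game.

    Each player keeps the part of the endowment they do not contribute;
    the pooled contributions are multiplied and split equally among all
    players. Endowment and multiplier are illustrative assumptions.
    """
    n = len(contributions)
    if n == 0:
        raise ValueError("need at least one player")
    if any(c < 0 or c > endowment for c in contributions):
        raise ValueError("each contribution must lie between 0 and the endowment")
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


print(public_goods_payoffs([10, 10, 10, 10]))  # [20.0, 20.0, 20.0, 20.0]
print(public_goods_payoffs([0, 10, 10, 10]))   # [25.0, 15.0, 15.0, 15.0]
```

With four players, each unit a player contributes returns only 2 / 4 = 0.5 units to that player, so a purely self-interested player contributes nothing. Contributions above zero in such a game are what the experiments read as evidence of social preferences.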

The results suggest that reciprocity seems to be the driver of contribution for less experienced editors, whereas reputation (social image) seems to better explain the activity of the more seasoned editors, though, as the authors acknowledge, the goodness of fit of the regression estimates is not great. The study was at the center of a heated debate within the community about the usage of site-wide banners for recruitment purposes. On December 3, one of the authors gave a presentation about the results at Harvard, which is available online as an audio and video recording. According to the Harvard Crimson, he remarked “that the study is still in progress and more data needs to be collected”. The results are so far available in the form of a conference paper and as an unpublished working paper.

Does “cultural imperialism” prevent the incorporation of indigenous knowledge on Wikipedia?


Wikimedia Research Newsletter, October 2013


Vol: 3 • Issue: 10 • October 2013

User influence on site policies: Wikipedia vs. Facebook vs. YouTube

With contributions by: Han-Teng Liao, Piotr Konieczny, Taha Yasseri, and Tilman Bayer

User influence on site policies is highest on Wikipedia, compared to YouTube and Facebook

Laura Stein, a researcher at the University of Texas at Austin, has concluded,[1] based on her comparison of the user policy documents (including the Terms of Service) of YouTube, Facebook and Wikipedia, that Wikipedia offers the highest level of participation power overall. Using Arnstein’s ladder of participation as the starting point for a theoretical discussion of participation and power, Stein carefully proposed a typology of policy and participation (Table 1, p. 359), ranging from the maximal power of “dominant control over site content and governance” and “shared control”, through the minimal power of “consultation”, “choice”, and “informing”, to no power at all in “deceptive or inadequate information” and “nonparticipation”. She applied this typology across five policy areas (“permitted content and its use”, “content ownership/copyrights”, “user information/data”, “modifying software” and “user policy formation & consent”) for the three websites, and found that Wikipedia beats the other websites in all areas. In the first and last of these policy areas, “permitted content and its use” and “user policy formation & consent”, Wikipedia gives users “dominant control”; in the remaining areas, Wikipedia gives users “shared control over site content and governance”.

In contrast, YouTube and Facebook only provide the minimal power of “informing” in three policy areas (“permitted content and its use”, “content ownership/copyrights” and “modifying software”), and the slightly greater, though still minimal, power of “choice” in the “user information/data” area. Although it is not widely agreed that Wikipedia is a “social media” website, Stein nevertheless presented a simple typology for evaluating the levels of participation power that platforms give users. It would also be useful to apply this typology to other policy areas, including fund dissemination and organizational governance, in the near future.

Wikipedia’s coverage of academics

Histogram of h-indexes of scientists from four different disciplines featured in Wikipedia. The solid line shows the average considering all the researchers of the field.

Anna Samoilenko and Taha Yasseri from the Oxford Internet Institute released an arXiv preprint titled “The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics”.[2] The study examines the notability of academics in the English Wikipedia. The ground truth is taken to be the citation records of the scholars under study, and the h-index in particular, although the authors acknowledge that publication and citation counts are not the best proxies for the quality and scientific impact of researchers. Based on the results of the paper, scientists covered in Wikipedia (who are taken from a sample of 400 scientists in four fields: physics, computer science, biology and psychology) …
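For reference, a researcher's h-index is the largest number h such that h of their papers have each received at least h citations. A minimal Python sketch of that definition, with invented citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    # Walk papers from most to least cited; h grows while the
    # paper at rank r still has at least r citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25]))              # 1
print(h_index([]))                # 0
```

The first example shows why the index rewards sustained output: one heavily cited paper alone yields an h-index of 1, however many citations it has.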

Wikimedia Research Newsletter, September 2013


Vol: 3 • Issue: 9 • September 2013

Automatic detection of “infiltrating” Wikipedia admins; Wiki, or ‘pedia?

With contributions by: Brian Keegan, Piotr Konieczny, Aaron Halfaker, Jonathan Morgan and Tilman Bayer

Wiki, or ‘pedia? The genre and values of Wikipedia compared with other encyclopedias

Wikipedia and Encyclopaedism: A Genre Analysis of Epistemological Values[1] is a new master’s thesis that analyzes the values that have influenced how knowledge is presented on Wikipedia, in comparison with other encyclopedias created throughout history. The author uses genre analysis to compare the epistemological values reflected in the kind of knowledge that different encyclopedias present and in the way they present it. The author first conducts a literature review to compare the epistemology of two genres: wikis and encyclopedias. The wiki epistemology is composed of six values: self-identification, collaboration, co-construction, cooperation, trust in the community, and constructionism. By contrast, the values of major current and historical encyclopedias (such as Diderot’s Encyclopedia, Pliny’s Natural History, and the Encyclopædia Britannica) prioritize trust in experts, authority, and consistency.

Despite being based on different, and even somewhat contradictory, value systems, the purpose of Wikipedia and the way it presents knowledge are shown to be similar to other works in the encyclopedia genre. The author analyzes the frequency of common words in section headings of 25 heavily edited English Wikipedia articles that had a corresponding article in Britannica. He compares the evolution of section headings within these Wikipedia articles and multiple editions of Britannica, and shows that the gradual process by which a Wikipedia article becomes more structured through the addition and alteration of headings is similar to the process for Britannica articles, which also tend to become longer and more formally structured over subsequent editions. This thesis presents some interesting parallels between the way articles are developed within Wikipedia and other encyclopedias, despite vastly different timescales and some differing underlying values. It also offers an engaging, in-depth discussion of the concept of genre, the purpose of the encyclopedia genre, and the history of several major historical encyclopedias.
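The word-frequency part of the analysis amounts to counting words across section headings. The Python sketch below is a hypothetical minimal version; the headings and the short stopword list are invented, and the letters-only regular expression suits English headings but would split words containing non-ASCII letters.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would choose its own.
STOPWORDS = frozenset({"and", "of", "the", "in"})


def heading_word_counts(headings, stopwords=STOPWORDS):
    """Count words across section headings, ignoring case and stopwords."""
    counts = Counter()
    for heading in headings:
        # Letters-only tokens; digits and punctuation are dropped.
        for word in re.findall(r"[a-z]+", heading.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


headings = ["History", "Early history", "Geography and climate",
            "History of the region", "Climate"]
print(heading_word_counts(headings).most_common(2))
# [('history', 3), ('climate', 2)]
```

Running the same count over successive revisions of an article, or successive editions of an encyclopedia, gives a simple view of how its structure evolves.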

Identifying trending topics of yesteryear

In a paper titled “Temporal Wikipedia search by edits and linkage”,[2] the authors develop a method to identify Wikipedia articles associated with topics around a given date, based on changes in the length of each article as well as the patterns of the other articles to which it links. This paper builds on prior work in temporal information retrieval and anomaly detection and uses modified versions of the HITS and PageRank algorithms to return a list of the most relevant documents for a topic on a date. This work has implications not only for using Wikipedia data to identify trending topics, but also for identifying trending topics retrospectively. A downloadable Java client allows test searches (for the months of September and October 2011) and displays the resulting page networks.
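The paper's ranking step modifies HITS (and PageRank); those temporal modifications are not reproduced here. For orientation only, the Python sketch below is the basic HITS iteration. A page's authority score sums the hub scores of the pages linking to it, and its hub score sums the authority scores of the pages it links to. The toy graph is hypothetical.

```python
def hits(links, iterations=50):
    """Hub and authority scores from the basic HITS iteration.

    links maps each page to the pages it links to. Both score vectors
    are rescaled to unit Euclidean length after every update.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}

    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {v: 0.0 for v in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {v: a / norm for v, a in auth.items()}

        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {v: sum(auth[d] for d in links.get(v, ())) for v in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return hub, auth


# Hypothetical toy link graph.
graph = {"A": ["C", "D"], "B": ["C", "D"], "E": ["C"]}
hub, auth = hits(graph)
print("top authority:", max(auth, key=auth.get))  # top authority: C
```

A temporal variant would, for example, weight pages or links by activity around the query date before iterating. How the authors do this is not covered by this sketch.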

Automatic detection of “infiltrating” Wikipedia admins

A paper titled “Manipulation Among the Arbiters of Collective Intelligence: How Wikipedia Administrators Mold Public Opinion”[3], to be presented at next month’s ACM Conference on Information and Knowledge Management (CIKM), makes a rather serious claim: “We find a surprisingly large number of editors who change their behavior and begin focusing more on a particular controversial topic once they are promoted to administrator status.”