Wikimedia blog

News from inside the Wikimedia Foundation

Posts Tagged ‘research’

Wikipedia at no data cost is appealing to mobile readers

The mobile web is growing at a phenomenal pace. According to research, it will outpace the desktop web in 2014, when approximately 1.7 billion users will access the net on their mobile phones, many of them from the Global South, compared to 1.65 billion desktop web users. As part of our mission to provide free knowledge to everyone, we are committed to enhancing our mobile platform, and have made several improvements to the reading experience. Most importantly, we recently launched a partnership with Orange to provide Wikipedia at no data cost to mobile readers in Africa and the Middle East.

To understand our current Wikipedia mobile users across different geographies and prioritize product features, we conducted a survey of Wikipedia mobile readers. You can read more about its methodology on Meta wiki.

Looking at the data from the survey, there is a strong case to be made for making Wikipedia accessible without data charges on mobile devices. Over half of Wikipedia mobile readers (52 percent) said that having Wikipedia free on their mobile data plans would increase their Wikipedia usage. Moreover, 28 percent indicated that it would increase their likelihood of buying from that mobile provider. Another 16 percent said that they would be willing to switch mobile providers to have free Wikipedia access.

 

Q: If certain mobile phone service providers provided Wikipedia for free on their data plans, how might that affect your actions? Base: 6700 (those who currently pay for a data plan)

Looking globally, we found that Wikipedia readers in the Global South, specifically in Brazil, Latin America and MENA, indicated that they would use Wikipedia more often if no data costs were accrued, and even suggested this as a key motivating factor for switching to or considering alternative service providers.

Q: If certain mobile phone service providers provided Wikipedia for free on their data plans, how might that affect your actions? Base: 6700 (those who currently pay for a data plan)

 

We found high interest in Wikipedia access without data charges, even though a majority of readers (54 percent) stated that their mobile data plan is not a significant monthly expense for their household. It should be noted, however, that the data is based on current mobile readers and does not cover those who don’t currently have mobile Wikipedia access, some of whom might not have access to the mobile web due to high cost. Only 14 percent of respondents stated that their data plan was either a significant expense, with their household actively managing usage, or too expensive, leading to issues of affordability. In addition, about 32 percent stated that it was a significant expense, but that they were not concerned about it.

Q: Which of the following statements best describes how expensive your data plan is relative to other expenses that you have? Base: 6700 (those who currently pay for a data plan)

If you are interested in more data from the mobile survey, please check out the toplines, read our summary report, or read the key findings.

Mani Pande, Head of Global Development Research

Ayush Khanna, Data Analyst, Global Development

Wikimedia Research Newsletter, January 2012


Vol: 2 • Issue: 1 • January 2012

Language analyses examine power structure and political slant; Wikipedia compared to commercial databases

With contributions by: Tbayer and Piotrus


Admins influence the language of non-admins

An arXiv preprint titled “Echoes of power: Language effects and power differences in social interaction”[1] looks at the language used by Wikipedia editors. The authors examine how conversational language can be used to understand power relationships. The research analyzes how much people adapt their language to the language of others involved in a discussion (the process of language coordination). The findings indicate that the more such adaptation occurs, the more deferential one is. The authors find that editors on Wikipedia tend to coordinate (language-wise) more with administrators than with non-administrators. Furthermore, the study suggests that one’s ability to coordinate language has an impact on one’s chances of becoming an administrator: admin candidates who do more language coordination have a higher chance of becoming an administrator than those who don’t change their language. Once a person is elected an administrator, they tend to coordinate less.

A blog post on the website of Technology Review summarized the results using the headline “Algorithm Measures Human Pecking Order” and highlighted the fact that one of the authors is Jon Kleinberg, known as the inventor of the HITS algorithm (also known as “hubs and authorities”).

Can Wikipedia replace commercial biography databases?

California State University, East Bay: Could it rely on biographical information from Wikipedia and the web alone?

An article[2] by a librarian and professor at California State University compares “biographical content for literary authors writing in English” across Wikipedia, “the web” (i.e. top Google search results) and two commercial databases: the Biography Reference Bank (BRB, now part of EBSCO Industries) and Contemporary Authors Online (CAO). The study was motivated by the decision of the author’s institution to cancel its subscription to CAO during a budget crisis in 2008–2009, a decision which, among other reasons, had been accompanied by “a comment that this information is ‘on the web’”.

The paper starts out with a literature review on the reliability of Wikipedia and then describes how the author compiled a list of 500 authors (mostly from the US and UK) by “examining curricula and textbooks from English literature courses across the USA” and soliciting additional suggestions from peers. These names were then searched on BRB, CAO (as part of the Literature Resource Center), Wikipedia and Google.

(more…)

Readers compare Wikipedia favorably with most major websites

In a previous blog post, we discussed our readers’ perception of article quality. In addition, we asked our readers to compare Wikipedia as a whole to other prominent websites – Facebook, Twitter, New York Times, Google, YouTube, Yahoo and CNN. Of course, there are several key differences between them, but we wanted to understand how Wikipedia stacks up against other high-traffic websites.

Readers from all 16 countries in our sample compared Wikipedia’s interface and ease of navigation to other Internet properties. If we look at the sample as a whole, Wikipedia (8.09 out of 10) was rated a close second to Google (8.44) on these measures. What makes this even more interesting is Wikipedia’s relationship with the search engine, which we mentioned in an earlier blog post. Although ratings varied quite significantly across countries, in most cases there was little deviation in ratings relative to other websites, with some exceptions.

Interface/look and feel

When asked about the Wikipedia interface, readers scored Wikipedia 7.92 out of 10 on average, just behind Google (8.3). About 46 percent of our readers scored the interface 9+ out of 10, compared to 54 percent for Google. We did not find significant deviations across countries or languages, with one exception: Readers in Egypt (and by extension, Arabic speakers) rated Wikipedia lower than YouTube, Facebook and Yahoo. A desire for better right-to-left support is one plausible explanation for the result.

D8a. How appealing do you find the interface or look of the following sites?

Ease of Navigation

Readers scored Wikipedia 8.27 on this metric, slightly lower than Google (8.59). 53 percent of our readers rated the ease of navigation 9+ out of 10, compared to 63 percent for Google. As above, Arabic/Egyptian readers rated Wikipedia below YouTube, Facebook, and Yahoo.

D8b. How easy do you find it to navigate the following sites?

 

Mani Pande, Head of Global Development Research

Ayush Khanna, Data Analyst, Global Development

We recently conducted an online survey of Wikipedia readers, limited to 250 participants each in 16 countries. This is the seventh in a series of blog posts summarizing our findings. If you are interested, you can find out more about the methodology of the survey here.

Wikimedia Research Newsletter, December 2011


Vol: 1 • Issue: 6 • December 2011

Psychiatrists: Wikipedia better than Britannica; spell-checking Wikipedia; Wikipedians smart but fun; structured biological data

With contributions by: Tbayer, DarTar and Jodi.a.schneider


Mental health information on Wikipedia more accurate than Britannica and Kaplan & Sadock psychiatry textbook

Wikipedia articles on schizophrenia and other mental health topics were assessed for accuracy, richness of references and readability.

In an article for Psychological Medicine,[1] ten researchers from the University of Melbourne conclude that “the quality of information on depression and schizophrenia on Wikipedia is generally as good as, or better than, that provided by centrally controlled websites, Encyclopaedia Britannica and a psychiatry textbook.”

The study focused on ten mental health topics (e.g. “antidepressants and suicide in young people” or “side-effects of antipsychotics”), five each in the areas of depression and schizophrenia. “Using the topic terms (or synonyms) as key words for the searches or through manual browsing, content relating to these topics was extracted from [Wikipedia and 13 other websites selected for prominent Google results for depression and schizophrenia] and from the most recent edition of Kaplan & Sadock’s Comprehensive Textbook of Psychiatry … and the online version of Encyclopaedia Britannica” by two reviewers. For both depression and schizophrenia, three psychologists with clinical and research expertise in that area evaluated these extracts on accuracy, up-to-dateness, breadth of coverage, referencing and readability, on a scale from 1 to 5 (“e.g. Accuracy: 1 = many errors of fact or unsubstantiated opinions, 3 = some errors of fact or unsubstantiated opinions, 5 = all information factually accurate”). As in an earlier study of the quality of health information on Wikipedia (Signpost coverage: “Wikipedia’s cancer coverage is reliable and thorough, but not very readable”), readability was also measured using a Flesch–Kincaid readability test, which is calculated from word and sentence lengths.

For both depression and schizophrenia, Wikipedia scored highest in the accuracy, up-to-dateness, and references categories – surpassing all other resources, including WebMD, NIMH, the Mayo Clinic and Britannica online. In breadth of coverage, it was behind Kaplan & Sadock and others for both areas. And “of the online resources, Wikipedia was rated the least readable [by the human reviewers], although some of its topics received an average rating.” Likewise, the Wikipedia content had relatively high Flesch–Kincaid Grade Level indices (around 16 for schizophrenia and 15 for depression – indicating that a tertiary level of education is necessary to understand the content), similar to those of Britannica but higher than those of most other resources examined.
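To make the grade-level figures above more concrete, here is a minimal Python sketch of the standard Flesch–Kincaid Grade Level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The function names and the vowel-group syllable counter are our own assumptions for illustration; the study does not say which implementation it used, and real readability tools count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of consecutive-vowel groups (a heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level from average sentence and word lengths."""
    # Naive splitting: a sentence ends at '.', '!' or '?' (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


simple = "The cat sat. The dog ran."
dense = "Antipsychotic medication occasionally produces extrapyramidal complications."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(dense))
```

Longer sentences and longer words both push the score up, which is why text dense with clinical vocabulary lands at a tertiary-education grade level.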

The authors note that their “findings largely parallel those of other recent studies of the quality of health information on Wikipedia” (citing eight such studies published between 2007 and 2010):

“Despite variability in the methodologies and conclusions of these studies, the overall implication is that Wikipedia articles on health topics typically contain relatively few factual errors, although they may lack breadth of coverage. … Given the number of patients, would-be patients and concerned others using the internet to search for information on health issues, it seems that Wikipedia is an appropriate recommendation as an information source.”

Psychologists gauge impact of Wikipedia’s Rorschach test coverage

(more…)

Launching the Second Annual Wikipedia Editor Survey

On Thursday, December 8th, the Wikimedia Foundation will launch its second semi-annual survey (2011) of Wikipedia editors. In order to capture editor trends, we are using the same methodology as the April 2011 Editor Survey – editors logged in to Wikipedia will receive a notification, as every editor is eligible to participate. To ensure that all editors have an equal probability of participating in the survey, all logged-in users will see the invitation only once. We’ll do a soft launch on Thursday (all Wikipedias, excluding English) and switch it on for the English Wikipedia next week, to accommodate the Harvard/Sciences Po survey that is launching soon on the English Wikipedia. We urge all Wikipedia editors to give us their feedback and participate in the survey. For more information, you can read the FAQ we’ve posted detailing the survey.

The survey is currently available in various languages in addition to English, including: Chinese (traditional, Hong Kong), Chinese (simplified), Arabic, Catalan, German, Spanish, Japanese, Portuguese, Polish, French, Hebrew, Hungarian, Italian, Russian and Serbo-Croatian. The Foundation will conduct the survey in languages for which translations are available, and for the remainder of Wikipedia language projects the survey will be available in English. The survey will take about 15 minutes to complete. Since we are interested in tracking trends in the data, about 90% of the questions are the same as in the April 2011 survey. We have added a few new questions based on findings from the Wikipedia Summer of Research project and other research work that has been conducted at the Foundation.

The current survey covers the following topics:

  • Demographics
  • Brief section on editors’ technology usage
  • Editing activities and contributions
  • Editor interactions
  • Opinions of editors about chapters, the Foundation and participation in board elections.

We’re looking forward to participation from editors all around the world while the survey is active. Please spread the word, and we would like to thank you in advance for taking the time to contribute your views!

Mani Pande, Head of Global Development Research

From Readers to Contributors

In our recently concluded Annual Plan, we identified increasing the number of active contributors as one of our strategic priorities. As of September 2011, there were 79,890 active Wikipedia contributors (active is defined as making five or more edits in a month); we want to increase the number of active editors to approximately 95,000 across all Wikimedia projects by June 2012.

a. Only 6% of our readers have ever made an edit to Wikipedia

b. Most readers are happy to just read, many cite lack of expertise

c. Avid Wikipedia readers, readers with heavy online activity, Twitter users, men, younger readers and online contributors are strong candidates for editors (more…)

Wikimedia Research Newsletter, November 2011


Vol: 1 • Issue: 5 • November 2011

Quantifying quality collaboration patterns, systemic bias, POV pushing, the impact of news events, and editors’ reputation

With contributions by: Tbayer, Hfordsa, DarTar and Romanesco


Collaboration pattern analysis: Editor experience more important than “many eyes”

One of the motifs indicating article quality: One editor (top) having worked on several related articles (bottom)

A paper titled “Characterizing Wikipedia Pages Using Edit Network Motif Profiles”[1] by three researchers from University College Dublin indicates that the quality of a Wikipedia article can be predicted from characteristics of its “edit network” – a graph derived from the collaboration of Wikipedians in that area. Network motifs are small graphs which occur particularly frequently as sub-graphs of networks of a certain kind, and can be regarded as their building blocks in some sense. (The concept is popular in bioinformatics, where it is applied to gene regulatory networks.) In this paper, the authors use graphs with at most five nodes consisting of users and articles, which are connected by an edge if the user has edited the article – giving 17 possible “Wikipedia network motifs”. (Anonymous users are disregarded.) For a Wikipedia article, the researchers form an “ego network” consisting of that article, articles which link to it (and have been edited by at least one of the users who edited the core article), and the users who edited them. For a sample of around 2000 articles from the History and United States categories, the frequencies of the 17 “Wikipedia network motifs” in those articles’ “ego networks” were calculated.

Using machine learning techniques, the researchers are able to discern with some certainty articles of basic quality (defined as having been assessed as Start class by Wikipedians) from those of good quality (defined as Featured or B class), solely based on this set of motif frequencies in the article’s edit network. Looking at the impact of each of the 17 types separately, they found that “all network motifs have some potential to discriminate between good and basic Wikipedia articles” in the sample, but that among the four best predicting motifs, three are “stars with editors at their centre”:

“This is interesting because it shows that many eyes is not really the defining characteristic of quality; instead experience is important – the editors should have worked on many other articles.”
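As a rough illustration of the “stars with editors at their centre” idea, the Python sketch below builds a user–article edit network from a list of edits and counts, for a given article, how many of its editors have also worked on several other articles. This is a simplified stand-in, not the paper’s motif-counting method: the function names, the `min_leaves` threshold and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_edit_network(edits):
    """Map each registered user to the set of articles they have edited."""
    articles_by_user = defaultdict(set)
    for user, article in edits:
        if user is None:  # anonymous edits are disregarded, as in the paper
            continue
        articles_by_user[user].add(article)
    return articles_by_user


def editor_star_count(articles_by_user, article, min_leaves=3):
    """Count editors of `article` who sit at the centre of a star,
    i.e. who have edited at least `min_leaves` articles in total
    (hypothetical threshold chosen for illustration)."""
    return sum(
        1
        for edited in articles_by_user.values()
        if article in edited and len(edited) >= min_leaves
    )


# Hypothetical sample data: (user, article) pairs; None marks an anonymous edit.
edits = [
    ("alice", "A"), ("alice", "B"), ("alice", "C"),
    ("bob", "A"),
    (None, "A"),
    ("carol", "B"), ("carol", "C"), ("carol", "D"),
]
network = build_edit_network(edits)
print(editor_star_count(network, "A"))
print(editor_star_count(network, "B"))
```

In this toy data, article "A" has two registered editors but only one experienced one, which mirrors the distinction the authors draw between “many eyes” and editor experience.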

(more…)

Do It Yourself Analytics with Wikipedia

As you probably know, we regularly publish backups of the different Wikimedia projects, containing their complete editing history. Over time, these backups grow larger and larger and become increasingly hard to analyze. To help the community, researchers and other interested people, we have developed a number of analytic tools to assist in analyzing these large datasets. Today, we want to update you on these new tools, what they do and where you can find them. Please remember that they are all still in development:

  • Wikihadoop
  • DiffDB
  • WikiPride

Wikihadoop

Wikihadoop makes it possible to run MapReduce jobs with Hadoop on the compressed XML dump files. This means that we can parallelize the processing of our XML files embarrassingly easily, so we no longer have to wait days or weeks for a job to finish.

We used Wikihadoop to create the diffs for all edits from the English XML dump that was generated in April of this year.
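For readers new to MapReduce, here is a minimal sketch of the pattern in plain Python, with no Hadoop involved. It is not Wikihadoop’s API: the function names and the shape of the revision records are assumptions made for illustration. Each chunk of the dump is processed independently (the map step), and the partial results are then merged (the reduce step), which is what makes the work easy to spread over many machines.

```python
from collections import Counter
from functools import reduce


def map_chunk(revisions):
    """Map step: count revisions per contributor within one chunk of the dump."""
    return Counter(rev["contributor"] for rev in revisions)


def reduce_counts(left, right):
    """Reduce step: merge two partial per-contributor counts."""
    return left + right


def edits_per_contributor(chunks):
    """Run the map step on every chunk, then fold the partial counts together."""
    return reduce(reduce_counts, map(map_chunk, chunks), Counter())


# Hypothetical chunks, standing in for independent slices of an XML dump.
chunks = [
    [{"contributor": "Alice"}, {"contributor": "Bob"}],
    [{"contributor": "Alice"}],
]
totals = edits_per_contributor(chunks)
print(totals.most_common())
```

Because no chunk depends on any other, the map calls could run on separate workers; only the cheap merge step needs to see all the partial results.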

DiffDB

DiffIndexer and DiffSearcher are the two components of the DiffDB. The DiffIndexer takes as raw input the diffs generated by Wikihadoop and creates a Lucene-based index. The DiffSearcher allows you to query the index so you can answer questions such as:

  • Who has added template X in the last month?
  • Who added more than 2000 characters to user talk pages in 2008?
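To give a feel for the second question, here is a small Python sketch that answers it over an in-memory list of diff records. It does not use DiffSearcher or Lucene, and the record fields (`user`, `namespace`, `year`, `chars_added`) are hypothetical; we also assume the question refers to each user’s total for the year. Namespace 3 is the User talk namespace in MediaWiki.

```python
from collections import defaultdict

USER_TALK_NAMESPACE = 3  # MediaWiki namespace number for "User talk"


def heavy_user_talk_contributors(diffs, year=2008, min_chars=2000):
    """Return users whose total characters added to user talk pages
    in `year` exceed `min_chars` (record layout is an assumption)."""
    totals = defaultdict(int)
    for diff in diffs:
        if diff["namespace"] == USER_TALK_NAMESPACE and diff["year"] == year:
            # Only additions count; removals (negative values) are ignored.
            totals[diff["user"]] += max(0, diff["chars_added"])
    return sorted(user for user, total in totals.items() if total > min_chars)


# Hypothetical diff records for illustration.
diffs = [
    {"user": "Alice", "namespace": 3, "year": 2008, "chars_added": 1500},
    {"user": "Alice", "namespace": 3, "year": 2008, "chars_added": 800},
    {"user": "Bob", "namespace": 3, "year": 2008, "chars_added": 2000},
    {"user": "Carol", "namespace": 0, "year": 2008, "chars_added": 9000},
    {"user": "Dave", "namespace": 3, "year": 2009, "chars_added": 5000},
]
print(heavy_user_talk_contributors(diffs))
```

An index such as the one DiffIndexer builds serves the same purpose at scale: it avoids scanning every diff for each new question.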

WikiPride

Volume of contributions by registered users on the English Wikipedia until December 2010, colored by account age

Finally, WikiPride allows you to visualize the breakdown of a Wikipedia community by age of account and by the volume of contributed content. You need a Toolserver account to run this, but you will be able to generate cool charts.

If you are having trouble getting Wikihadoop to run, then please contact me at dvanliere at wikimedia dot org and I am happy to point you in the right direction! Let the data crunching begin!

Diederik van Liere, Analytics Team

Most people read Wikipedia on desktops, but mobile and tablets present huge potential

When Wikipedia began in 2001, desktop PCs were the dominant device for web access. However, a lot has changed in the last 10 years with the growth of the mobile web and the introduction of a new class of devices like digital music players, smartphones and tablets. As we are ready to step into 2012, we find that readers are consuming Wikipedia across a gamut of devices – desktops, laptops, smartphones, tablets, gaming devices and so on. In this blog post, we share insights about the devices on which readers consume Wikipedia content.

a. Only 21% of our readers have read Wikipedia on their mobile phone

b. Smartphones are a significant opportunity for Wikipedia growth

c. Most of our readers have a positive opinion of mobile Wikipedia

d. Wikipedia Mobile is the most popular smartphone app

e. Desktops remain most widely used device for reading Wikipedia

f. 21% of US Wikipedia readers have read Wikipedia on a tablet

Readers in the US, Russia, Germany and India are the most pleased with Wikipedia Article Quality

In the recently conducted Wikipedia readers study, we asked respondents to rate the quality of Wikipedia articles on several aspects: trustworthiness, comprehensiveness, neutrality, variety, and ease of understanding. Although we already employ the Article Feedback Tool to assess the quality at an article level, we wanted to understand readers’ perception of quality on Wikipedia as a whole.

I. Individual Measures

II. Quality Perception Index

(more…)