Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘research’

Do It Yourself Analytics with Wikipedia

As you probably know, we regularly publish backups of the different Wikimedia projects, containing their complete editing history. Over time these backups grow larger and become increasingly hard to analyze. To help the community, researchers and other interested people, we have developed a number of analytics tools for working with these large datasets. Today we want to update you on these new tools: what they do and where you can find them. Please remember that they are all still in development:

  • Wikihadoop
  • DiffDB
  • WikiPride

Wikihadoop

Wikihadoop makes it possible to run MapReduce jobs with Hadoop on the compressed XML dump files. This means that we can parallelize the processing of our XML files with embarrassing ease, so we no longer have to wait days or weeks for a job to finish.

We used Wikihadoop to create the diffs for all edits from the English XML dump that was generated in April of this year.
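
To give a feel for what a per-edit diff contains, here is a minimal Python sketch that compares two revisions word by word using the standard library's difflib. It is only an illustration: the helper name `added_words` and the sample revisions are assumptions, and this is not how Wikihadoop itself computes diffs.

```python
import difflib


def added_words(old_text: str, new_text: str) -> list:
    """Return the words that appear in new_text but not in old_text.

    Hypothetical helper for illustration only.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce new material on the b side.
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return added


if __name__ == "__main__":
    old_revision = "Wikipedia is an encyclopedia"
    new_revision = "Wikipedia is an online encyclopedia"
    print(added_words(old_revision, new_revision))
```

Because every pair of consecutive revisions can be compared independently, this kind of work splits naturally across many machines, which is what makes a MapReduce approach attractive.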

DiffDB

DiffIndexer and DiffSearcher are the two components of the DiffDB. The DiffIndexer takes as raw input the diffs generated by Wikihadoop and creates a Lucene-based index. The DiffSearcher allows you to query the index so you can answer questions such as:

  • Who has added template X in the last month?
  • Who added more than 2000 characters to user talk pages in 2008?
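
The second question can be sketched in a few lines of Python over a list of diff records. Everything here is an assumption made for illustration: the `DiffRecord` layout is hypothetical and is not DiffDB's actual schema, the function does not use the DiffSearcher query interface, and "more than 2000 characters" is read as a per-editor total for the year. The one fixed point is that MediaWiki numbers the user talk namespace 3.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

USER_TALK_NAMESPACE = 3  # MediaWiki's namespace number for user talk pages


@dataclass
class DiffRecord:
    """Hypothetical record layout, not DiffDB's actual schema."""
    editor: str
    namespace: int
    timestamp: datetime
    chars_added: int


def heavy_user_talk_editors(records, year=2008, threshold=2000):
    """Editors whose total characters added to user talk pages in `year`
    exceed `threshold` (assumed interpretation of the question)."""
    totals = defaultdict(int)
    for record in records:
        if record.namespace == USER_TALK_NAMESPACE and record.timestamp.year == year:
            totals[record.editor] += record.chars_added
    return sorted(editor for editor, total in totals.items() if total > threshold)
```

A real index would answer this without scanning every record, but the filter and the aggregation are the same idea.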

WikiPride

Volume of contributions by registered users on the English Wikipedia until December 2010, colored by account age

Finally, WikiPride allows you to visualize the breakdown of a Wikipedia community by age of account and by the volume of contributed content. You need a Toolserver account to run this, but you will be able to generate cool charts.

If you are having trouble getting Wikihadoop to run, then please contact me at dvanliere at wikimedia dot org and I am happy to point you in the right direction! Let the data crunching begin!

Diederik van Liere, Analytics Team

Most people read Wikipedia on desktops, but mobile and tablets present huge potential

When Wikipedia began in 2001, desktop PCs were the dominant device for web access. However, a lot has changed in the last 10 years with the growth of the mobile web and the introduction of a new class of devices like digital music players, smartphones and tablets. As we are ready to step into 2012, we find that readers are consuming Wikipedia across a gamut of devices – desktops, laptops, smartphones, tablets, gaming devices and so on. In this blog post, we share insights about the devices on which readers consume Wikipedia content.

a. Only 21% of our readers have read Wikipedia on their mobile phone

b. Smartphones are a significant opportunity for Wikipedia growth

c. Most of our readers have a positive opinion of mobile Wikipedia

d. Wikipedia Mobile is the most popular smartphone app

e. Desktops remain the most widely used device for reading Wikipedia

f. 21% of US Wikipedia readers have read Wikipedia on a tablet

Readers in US, Russia, Germany and India are the most pleased with Wikipedia Article Quality

In the recently conducted Wikipedia readers study, we asked respondents to rate the quality of Wikipedia articles on several aspects: trustworthiness, comprehensiveness, neutrality, variety, and ease of understanding. Although we already employ the Article Feedback Tool to assess quality at the article level, we wanted to understand readers’ perception of quality on Wikipedia as a whole.

I. Individual Measures

II. Quality Perception Index

Search, translation tools on top of agenda for readers

Last week, our blog post about the readers study shared insights about how readers use search on Wikipedia, as well as new search functionalities that they are interested in. This week we share findings from our readers on more search improvements and other features that they would like to see on Wikipedia.

a. Improvements to finding information
b. Sharing, downloading and printing
c. Integration with social networking websites

Wikimedia Research Newsletter, October 2011

Vol: 1 • Issue: 4 • October 2011

WikiSym; predicting editor survival; drug information found lacking; RfAs and trust; Wikipedia’s search engine ranking justified

With contributions by: Boghog, Jodi.a.schneider, Drdee, DarTar, Phoebe and Tbayer

Wiki research beyond the English Wikipedia at WikiSym

Panel discussion at WikiSym 2011

WikiSym 2011, the “7th international symposium on wikis and open collaboration”, took place from October 3-5 at the Microsoft Research Campus in Silicon Valley (Mountain View, California). Although the conference’s scope has broadened to include the study of open online collaborations that are not wiki-based, Wikipedia-related research still took up a large part of the schedule. Several of the conference papers have already been reviewed in the September and August issues of this research overview, and the rest of the proceedings have since become available online.

Google drives traffic to Wikipedia, but half of readers look for Wikipedia content

Search and Wikipedia

Search is central to the Wikipedia experience, both as a way of reaching the website and as a way of discovering content on Wikipedia. Several questions on the Readers Survey 2011 were aimed at understanding search experiences across nations, languages, and devices. Here are some of the key insights about search from the study:

Average Wikipedia reader is 36 years old

Every month approximately 400 million unique visitors across the globe read Wikipedia and its sister sites. But very little is known about them. In order to understand our readers and their relationship with Wikipedia, to bring their voice into our product strategy and to enhance their reading user experience, we conducted an online survey of Wikipedia readers across 16 countries (you can find out more about the methodology of the study below).  We’ll be sharing findings from the study in a series of blog posts through the end of this year.  To begin, here is our first blog post that provides demographic information about our readers and their reading habits.

Average Wikipedia reader is 36 years old

Since its founding over 10 years ago, Wikipedia has emerged as a serious knowledge website and repository of online information. The data from the survey shows that the appeal of Wikipedia is spread across ages. Wikipedia readers are at different life stages: students, young professionals, older adults and the elderly, with an age range of 14-92 years (note: we didn’t survey anyone younger than 14 years for the study). The research showed that, contrary to the popular perception that most Wikipedia readers are school students who rely on Wikipedia for schoolwork, the average age of a Wikipedia reader is 36.59 years, and the median is 35 years. As expected, countries with a large youth population (India, Mexico and Egypt) have slightly younger readers, but even in these countries the average reader is in their late 20s or early 30s. Egypt has the youngest readers at an average age of 28.03 years and Japan has the oldest readers at 40.25 years.

Almost half of Wikipedia readers visit the site more than 5 times a month

With 15.8 billion page views in the month of September alone, it is no surprise that Wikipedia readers come back to the site often. On average 65 percent of Wikipedia readers visit the website at least 4 times a month. In fact, almost half of Wikipedia readers (49 percent) are Avid readers—they visit Wikipedia more than five times a month.

Overall, the top 6 countries with the greatest percentage of Avid readers of Wikipedia are Canada, the United States, the United Kingdom, Germany, Australia and Japan. In each of these countries at least 75 percent of readers were Avid readers, and readers there also owned significantly more devices than readers in other countries. In contrast, the lowest percentages of Avid readers were found in South Africa, Egypt, Brazil, India, and Mexico.


Wikipedia has slightly more male readers than female

The Internet started as a male bastion and women have narrowed the Internet gender gap over the years, but even today in some countries there are more male Internet users than female. With reference to Wikipedia, we found that there are more male Wikipedia readers (56 percent) than female (44 percent). While most of the countries have a relatively balanced mix of male and female readers, there were some countries that skewed more male. Australia, Egypt, the United Kingdom and India all had a male ratio higher than 60 percent.

Methodology

The online study was conducted during the summer of 2011. A 15-minute survey was administered to a total sample of n=4000 participants within the following 16 countries (n=250 each):

Australia, Brazil, Canada, Egypt, France, Germany, India, Italy, Japan, Mexico, Poland, Russia, Spain, South Africa, UK, and United States.

Wikipedia readers were divided into two main groups:

  • Those who read Wikipedia articles at least once per month but fewer than 4 times per month on average were considered to be “Casual” readers.
  • Those who read Wikipedia articles 4 or more times per month on average were considered to be “Avid” readers.
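
The two groups above amount to a simple threshold rule, sketched here in Python. The function name `classify_reader` is an assumption, and returning `None` for people who read less than once per month is our own choice, since the methodology does not define a group for them.

```python
from typing import Optional


def classify_reader(visits_per_month: float) -> Optional[str]:
    """Map average monthly reading frequency to the survey's reader groups."""
    if visits_per_month >= 4:
        return "Avid"
    if visits_per_month >= 1:
        return "Casual"
    # Below once per month: no group is defined in the methodology.
    return None
```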

All countries were weighted against reading frequency and editing frequency using ComScore Media Dashboard 2011 data and actual Editor data from the Wikimedia Foundation to ensure the dataset was representative for each territory and region.
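
For readers unfamiliar with survey weighting, the following Python sketch shows one common approach, post-stratification, in which each group's weight is its share of the target population divided by its share of the sample. This is a generic illustration and may differ from the procedure actually used for the study; the function name and the example numbers are assumptions, not survey data.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share.

    sample_counts: respondents per group, e.g. {"Avid": 150, "Casual": 100}
    population_shares: target share per group, summing to 1.0
    """
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    counts = {"Avid": 150, "Casual": 100}
    shares = {"Avid": 0.5, "Casual": 0.5}
    print(poststratification_weights(counts, shares))
```

Groups that are over-represented in the sample get a weight below 1 and under-represented groups get a weight above 1, so the weighted sample matches the target mix while keeping the same total size.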

Please stay tuned to hear more about Wikipedia readers in the coming weeks!

Mani Pande, Head of Global Development Research

Ayush Khanna, Data Analyst

(This is the first in a series of blog posts where we will be sharing insights from the 2011 Readers Survey)

Results from first Wikipedia Ambassador survey

The first generation of Wikipedia Ambassadors participated in a survey when the Public Policy Initiative wrapped up this summer. More than 80 respondents (over half of the 2010-2011 Ambassadors!) provided input about their experiences and how to improve the program. Many Wikimedia Foundation blog followers are probably familiar with the Initiative’s development of the Ambassador Program to open Wikipedia to the academic community. Ambassadors come in two flavors: Campus Ambassadors, who provide a face for Wikipedia on university campuses, and Online Ambassadors, who support the new student editors on wiki as they make their first contributions.

The graphs illustrate the Ambassadors’ role and motivations, based on the survey results.
Ambassador Roles

Ambassador Motivations
While both Campus and Online Ambassadors identified their role as helping newcomers, their motivations diverged. Online Ambassadors were strongly motivated by helping newcomers, and Campus Ambassadors were strongly motivated by increasing Wikipedia credibility and use on university campuses. Both Campus and Online Ambassadors felt responsible first to the students they were working with and second to the Wikipedia community. Ambassadors agreed on the Public Policy Initiative outcomes:

  1. Wikipedia content improved.
  2. Use of Wikipedia as a teaching tool increased.
  3. Ambassadors provided support for college-educated newcomers.
  4. Wikipedia’s credibility in academia increased.

Through the survey, many Ambassadors shared their most memorable experiences in the program. Some of the highlights include:

  • “I showed a student how to check the page view statistics. Hundreds of people had seen his article since he created it. What an immediate impact he had! He was blown away.”
  • “For me it was an honor to have a student participant who was also a US Congressman and to help improve his Wikipedia article.”
  • “My favorite story is of a non-traditional age student telling me that her son’s 8th grade teacher told the class not to use Wikipedia because it can not be trusted. Our student told her son what she had learned about neutral-voice and verifiability and community scholarship. At the end of the semester her son told her that his middle-school teacher now says it’s okay to use Wikipedia as a place to start looking for information… I sure would like to know what that 8th grader told his teacher about his Mom’s academic Wikipedia experience.”

Check out the pages for the Wikipedia Ambassador Program and Global Education Program to find out more about our program.

Amy Roth
Research Analyst, Public Policy Initiative 

Introducing Wikipedia Editor Satisfaction Index

The Wikimedia Foundation is working on new products and global initiatives to increase participation in our projects, specifically Wikipedia. To help inform the development of this work we’ve been researching the trends and patterns of Wikipedia editors, most recently through the Wikipedia Summer of Research initiative and also with data from the 2011 Wikipedia Editors Survey.

While studying editor participation trends, we hypothesized that acrimony and disagreement in the editing community could be a leading cause of declining project participation. To test this hypothesis as part of our analysis of responses to the Editor Survey (report here), we defined the Wikipedia Editor Satisfaction Index (WESI). The WESI is a metric gauging the editing community’s overall satisfaction and its interactions with and assessments of fellow editors. We used responses to two survey questions: how editors described their fellow editors (picking from a set of adjectives), and whether they believed community feedback had helped them personally. These responses were weighted and then normalized to a 0-10 rating.
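
As a rough illustration of "weighted, and then normalized to a 0-10 rating", here is a minimal Python sketch. The actual weights and scoring rules behind the WESI are not given in this post, so the equal weights, the 0-to-1 input scale and the function name `satisfaction_index` are all assumptions.

```python
def satisfaction_index(adjective_score, feedback_score,
                       w_adjectives=0.5, w_feedback=0.5):
    """Combine two survey-derived scores into a 0-10 rating.

    adjective_score: 0.0-1.0, how positively the editor described peers
    feedback_score:  0.0-1.0, how helpful the editor found community feedback
    The weights are hypothetical; the real WESI weights are not published here.
    """
    for value in (adjective_score, feedback_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
    total_weight = w_adjectives + w_feedback
    weighted = (w_adjectives * adjective_score
                + w_feedback * feedback_score) / total_weight
    return round(10 * weighted, 2)
```

Dividing by the total weight keeps the result on the same 0-10 scale whatever weights are chosen.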

The results were encouraging. About 47 percent of editors surveyed scored 10/10. In all, about 77 percent of those surveyed scored 7.5 or higher, indicating that the majority of our editing community is very satisfied with their experience on Wikipedia and has a healthy assessment of fellow editors. This is great news – as Wikipedia continues to focus on improving the editing experience, while also making efforts to foster new participation (especially in the Global South), the community’s support is vital.

Distribution of WESI scores across all surveyed Wikipedia editors

In order to understand what factors determine an editor’s satisfaction with Wikipedia, we performed a multilinear regression [1] on the WESI metric. Some interesting findings:

  1. Help is appreciated: Having others from the community add content or correct grammatical mistakes greatly increases the likelihood of an editor reporting a positive experience.
  2. Peer recognition matters a lot more than any other kind of recognition: Editors highly value the respect and recognition of their peers. Editors who received barnstars or any other form of reward from their peers were much more likely to report a higher score. Interestingly, events like having an article featured or promoted to the front page did not have a very significant effect on editor satisfaction.
  3. Explanations for reverts are key: When an edit is reverted, not explaining why has a strong negative impact on editor satisfaction. Conversely, providing an explanation has a strong positive influence on editor satisfaction.
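
For readers who have not met multilinear regression before, the sketch below fits an ordinary least squares model with several predictors in plain Python by solving the normal equations. It illustrates the general technique only and is not the analysis code used for the survey; the predictor names and the coefficients in the synthetic data are invented and are not the survey's findings.

```python
from itertools import product


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_n].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [rv - factor * cv for rv, cv in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


if __name__ == "__main__":
    # Synthetic data: three yes/no predictors (received help, received a
    # barnstar, reverts were explained) with invented effects on the score.
    rows = list(product([0, 1], repeat=3))
    scores = [5.0 + 2.0 * h + 1.5 * b + 1.0 * e for h, b, e in rows]
    print(fit_linear_regression(rows, scores))
```

Each fitted coefficient estimates how much the score changes when one factor is present while the others are held fixed, which is how statements such as "peer recognition matters more than other kinds of recognition" can be compared.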

A comparison of WESI scores reveals that women are, on average, less satisfied than men, though not by much: about 5 percent. Although transsexual and transgender editors (marked below as Others) together account for only 0.5 percent of our sample, it’s important to note that their satisfaction scores are significantly lower.

WESI score comparison by gender

The Editor Survey Report highlights some more findings, but the emerging theme is simple: be nice to each other, and help out where you can!

As we work towards establishing the WESI metric as a standard for understanding the community’s experiences on Wikipedia, we’ll continue to share more findings (and implications) of the Wikipedia Editor Satisfaction Index.

Mani Pande, Head of Global Development Research

Ayush Khanna, Global Development Intern

(This is the eleventh in a series of blog posts where we previously shared insights from the April 2011 Editors Survey.)

1: http://en.wikipedia.org/wiki/Regression_analysis

Results from the Japanese Editor Survey

We have blogged recently about the results from our semi-annual editor survey. Although the survey was conducted in 22 languages, it didn’t include Japanese, due to the March earthquake and ensuing tsunami in Japan. It is with great pleasure that we share topline results from a survey of editors conducted recently on the Japanese Wikipedia. We fielded it for about a week at the end of July and got 208 complete responses. Like the semi-annual editor survey, the Japanese editor survey was available only to registered users of the Japanese Wikipedia, and every editor saw the invitation to participate in the survey only once. The latter was done to control for bias towards more active editors.

The topline data covers all the questions from the survey: demographics, interactions with community members, technology ecology, and editing behaviors.  We are hoping that the Japanese community (as well as others) will check the data, conduct some analysis and provide feedback to us.

Please also check out the graphs for some key demographics of Japanese editors. The results from the editor survey in the Japanese Wikipedia show that the Japanese editing community is similar to others demographically: predominantly male, highly educated and slightly older than what we imagined our community to be before we conducted the survey.

Mani Pande, Head of Global Development Research

(This is the ninth in a series of blog posts where we previously shared insights from the April 2011 Editors Survey.)