Data analytics

Image by Stephen Laporte, CC BY-SA 3.0.

15 years of Wikipedia in data visualization

Leave it to designers to show us how dynamic, global, and human Wikipedia really is. Here, we look at fifteen of the best data visualizations making use of Wikipedia data.

Graph by Joe Sutherland, in the public domain.

Wikipedia’s very active editor numbers have stabilized—delve into the data with us

Very active editor numbers (more than 100 edits per month) since the English Wikipedia’s launch in 2001. The thick red line shows a five-month moving average.

The English Wikipedia’s population of very active editors, registered contributors with more than 100 edits per month, appears to have…

This Sankey diagram shows how readers reach the English Wikipedia article about London and where they go from there, based on the Wikipedia Clickstream data set. Graph by Ellery Wulczyn and Dario Taraborelli, CC0.

Growing free knowledge through open data

Open data can help us understand how people find and share knowledge online. The Wikimedia Foundation’s Research and Data Team has published 5 open data sets about Wikimedia projects. (…)


What are readers looking for? Wikipedia search data now available

(Update 9/20 17:40 PDT) It appeared that a small percentage of queries contained information unintentionally inserted by users. For example, some users may have pasted unintended information from their clipboards into the search box, causing the information to be displayed in the datasets. This prompted us to withdraw the files…


Improving the accuracy of the active editors metric

We are making a change to our active editor metric to increase accuracy, by eliminating double-counting and including Wikimedia Commons in the total number of active editors. The active editors metric is a core metric for both the Wikimedia Foundation and the Wikimedia communities and is used to measure the…


US Education Program participants add three times as much quality content as regular new users

Wikipedia Education Program participants from the United States added more than three times as much quality content as regular new users, a quantitative analysis shows. In the Wikipedia Education Program, professors assign their students to edit Wikipedia articles as a grade for class, assisted by volunteer Wikipedia Ambassadors. In fall…


Techies learn, make, win at Foundation’s first San Francisco hackathon

In January, 92 participants gathered in San Francisco to learn about Wikimedia technology and to build things in our first Bay Area hackathon.


Do It Yourself Analytics with Wikipedia

As you probably know, we regularly publish backups of the different Wikimedia projects, containing their complete editing history. As time progresses, these backups grow larger and become increasingly hard to analyze. To help the community, researchers and other interested people, we have developed a number…


Data analytics at Wikimedia Foundation

This post is a follow-on to my previous post “What is Platform Engineering?”. In this post, I’ll describe the history of our analytics work, talk about how we derive and distribute our statistics, and ask you to join us in building our platform. Summary: we’re hiring, and we want to…


Announcing the WikiChallenge Winners

Over the past couple of months, the Wikimedia Foundation, Kaggle and ICDM organized a data competition. We asked data scientists around the world to use Wikipedia editor data and develop an algorithm that predicts the number of future edits, and in particular predicts correctly who will stop editing and who…
