Data analytics

What are readers looking for? Wikipedia search data now available

(Update 9/20 17:40 PDT) It appeared that a small percentage of queries contained information unintentionally inserted by users. For example, some users may have pasted unintended information from their clipboards into the search box, causing the information to be displayed in the datasets. This prompted us to withdraw the files….

Improving the accuracy of the active editors metric

We are making a change to our active editor metric to increase accuracy, by eliminating double-counting and including Wikimedia Commons in the total number of active editors. The active editors metric is a core metric for both the Wikimedia Foundation and the Wikimedia communities and is used to measure the…

US Education Program participants add three times as much quality content as regular new users

Wikipedia Education Program participants from the United States added more than three times as much quality content as regular new users, a quantitative analysis shows. In the Wikipedia Education Program, professors assign their students to edit Wikipedia articles as a grade for class, assisted by volunteer Wikipedia Ambassadors. In fall…

Techies learn, make, win at Foundation’s first San Francisco hackathon

In January, 92 participants gathered in San Francisco to learn about Wikimedia technology and to build things in our first Bay Area hackathon.

Do It Yourself Analytics with Wikipedia

As you probably know, we regularly publish backups of the different Wikimedia projects, containing their complete editing history. Over time, these backups grow larger and become increasingly hard to analyze. To help the community, researchers and other interested people, we have developed a number…

Data analytics at Wikimedia Foundation

This post is a follow-on to my previous post “What is Platform Engineering?”. In this post, I’ll describe the history of our analytics work, talk about how we derive and distribute our statistics, and ask you to join us in building our platform. Summary: we’re hiring, and we want to…

Announcing the WikiChallenge Winners

Over the past couple of months, the Wikimedia Foundation, Kaggle and ICDM organized a data competition. We asked data scientists around the world to use Wikipedia editor data and develop an algorithm that predicts the number of future edits, and in particular predicts correctly who will stop editing and who…

Three weeks left in the Wikipedia Participation Challenge

There are still three weeks left in the Wikipedia Participation Challenge (see prior blog post)! So far, the competition has exceeded our expectations. As of this morning, 78 teams (167 total individuals) from across the world have participated in the competition, with a total of 735 entries submitted. Half of…

“Rate this Page” is Coming to the English Wikipedia

Since May, the Article Feedback Tool has been available on 100,000 English Wikipedia articles (see blog post). We have now kicked off full deployment to the English Wikipedia at a rate of about 370,000 articles per day and will continue at this rate until deployment is complete. We wanted to…

Data Competition: Announcing the Wikipedia Participation Challenge

We are pleased to announce the launch of the Wikipedia Participation Challenge, a data modeling competition to develop an algorithm that predicts future editing activity on Wikipedia. The competition is hosted by Kaggle, a platform for data modeling and prediction competitions. The Participation Challenge is open to community members and…
