Wikipedia and Wikimedia projects are among the most visited repositories of human knowledge. They are also a unique source of data for understanding how we collaborate to create that knowledge, access it and share it with others.
We invite you to turn this data into useful insights, applications and visualizations, and help our communities and projects thrive. If you have any questions about these releases, feel free to reach out to the Research and Data team via the Analytics mailing list or our #wikimedia-research channel on IRC.
Senior Research Scientist, Research and Data Team Lead
Open Data Sets
Scholarly citations in Wikipedia
A data set of citations to scholarly articles in the English Wikipedia. It includes all citations with DOIs and PubMed identifiers added to Wikipedia articles as of the most recent content dump.
Halfaker, A., Taraborelli, D. (2015). Scholarly article citations in Wikipedia. figshare.
Wikipedia Clickstream
This data set shows how people get to a Wikipedia article and which links they click on next. The most recent release captures 22 million (referer, resource) pairs, extracted from a total of 3.2 billion requests to the English Wikipedia. We wrote a step-by-step tutorial and IPython notebook to get you started with this data.
Wulczyn, E., Taraborelli, D. (2015). Wikipedia Clickstream. figshare.
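To give a feel for what (referer, resource) pairs make possible, here is a minimal Python sketch. It is not taken from the tutorial or the notebook: the sample rows, the three-field layout (referer, resource, request count) and the function names are all assumptions made for illustration, so check the actual column layout of the release before adapting it.

```python
from collections import Counter

# Hypothetical sample rows, NOT real data: (referer, resource, request count).
SAMPLE_PAIRS = [
    ("search_engine", "Article_A", 500),
    ("Article_B", "Article_A", 120),
    ("Article_A", "Article_C", 80),
    ("Article_A", "Article_B", 60),
    ("no_referer", "Article_A", 300),
]


def top_referers(pairs, resource, k=3):
    """Return the k most common ways readers arrived at `resource`."""
    counts = Counter()
    for referer, target, n in pairs:
        if target == resource:
            counts[referer] += n
    return counts.most_common(k)


def top_next_clicks(pairs, referer, k=3):
    """Return the k most common resources requested from `referer`."""
    counts = Counter()
    for source, target, n in pairs:
        if source == referer:
            counts[target] += n
    return counts.most_common(k)


if __name__ == "__main__":
    print(top_referers(SAMPLE_PAIRS, "Article_A"))
    print(top_next_clicks(SAMPLE_PAIRS, "Article_A"))
```

The same two aggregations answer both questions the data set poses: where readers of an article came from, and where they went next.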
Browser choices of Wikimedia users
This data set provides statistics on the top browsers and platforms used by readers and editors on Wikimedia projects, obtained from the Wikimedia HTTP request logs during a 90-day window. You can also explore this data online via this application.
Keyes, O. (2015). Browser Choices of Wikimedia Readers and Editors. figshare.
Where in the world is Wikipedia?
This data set includes the proportion of traffic to Wikimedia projects originating from a specific country, computed from all HTTP requests collected over the course of 2014. You can also explore this data online via this application.
Keyes, O. (2015). Geographic Distribution of Wikimedia Traffic. figshare.
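As a rough illustration of what "proportion of traffic originating from a specific country" means, the sketch below turns a list of per-request country codes into shares of the total. The input format and the function name are assumptions made for this example; the post does not describe how the published figures were computed, and the real pipeline may differ.

```python
from collections import Counter


def traffic_share(request_countries):
    """Map each country code to its fraction of all requests.

    `request_countries` is a hypothetical iterable with one country
    code per request; it is not the format of the released data set.
    """
    counts = Counter(request_countries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: n / total for country, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample: four requests from three countries.
    shares = traffic_share(["US", "US", "DE", "IN"])
    for country, share in sorted(shares.items()):
        print(f"{country}: {share:.0%}")
```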
Wikipedia Article Feedback corpus
The Article Feedback experiment invited readers to participate on Wikipedia by leaving comments on articles to help editors improve them. This data set includes over 1.5 million messages posted to the English, French and German Wikipedias during the pilot.
Florin, F., Mullie, M., Taraborelli, D. (2014). Wikipedia Article Feedback corpus. figshare.