A map indicating how much you can learn about the world through Wikipedia if English is the only language you speak. There is little to no content available in the many dark areas in the world, especially in Central and South America, Africa, and Asia. Map by Markus Krötzsch, TU Dresden, public domain/CC0.

The French Wikipedia may have more than 20,000 articles on individual asteroids, but if you are one of the 27 million people who speak Hausa as a first language, Wikipedia doesn’t yet have an entry on the universe. The English Wikipedia may have more than 5 million articles on topics as diverse as extreme sports and unusual causes of death, but if English is the only language you speak, there is still little to no content about vast regions of the world, as the map above suggests.

Each day, thousands of volunteer editors fill knowledge gaps by creating new Wikipedia articles, translating existing ones, and identifying poorly covered topics in any given language. However, discovering and deciding what to edit can be a daunting task, both for editors who are new to Wikipedia and for more seasoned ones.

Understanding how to improve and accelerate content creation across languages, and how to provide guidance to volunteers, is what motivated us in Wikimedia Research to team up with computer science researchers from Stanford University. The team set out to design and test a system that would find, rank, and recommend missing articles to be created across different languages.

We designed personalized recommendations that take into account editor interests (extracted from their public contribution history), proficiency across languages, and the projected popularity of an article in the target language if it were to be created. We ran a controlled test of these recommendations on the French-language Wikipedia, comparing personalized and non-personalized recommendations against a baseline. Our results show that recommendations tripled the rate at which editors create articles, while the resulting articles were of the same quality as articles created organically on French Wikipedia. The experimental design, algorithm implementation, and results are described in detail in a study recently presented at the 25th World Wide Web Conference (WWW 2016) in Montréal, Canada.[1]
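The post does not spell out the ranking formula. What follows is only a minimal sketch of one way the three signals above could be combined. It is not the method used in the study.

The sketch assumes a toy setup:
- An editor's interests are a bag of words taken from pages they have edited.
- Language proficiency is a set of language codes.
- Each candidate article carries a projected pageview count.

The names `Candidate` and `recommend` are hypothetical, as is the scoring rule: cosine similarity weighted by log-scaled projected views.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    source_lang: str        # language in which the article already exists
    words: list[str]        # hypothetical: bag of words from the source article
    predicted_views: float  # hypothetical: projected pageviews in the target language


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(history: list[str], languages: set[str], candidates: list[Candidate], k: int = 3) -> list[str]:
    """Rank missing articles for one editor (hypothetical scoring rule)."""
    interests = Counter(history)  # interests extracted from past contributions
    scored = []
    for c in candidates:
        if c.source_lang not in languages:
            continue  # skip articles in a language the editor does not read
        relevance = cosine(interests, Counter(c.words))
        # Weight topical relevance by log-scaled projected popularity.
        scored.append((relevance * math.log1p(c.predicted_views), c.title))
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Drop irrelevant candidates (score 0), then keep the top k.
    return [title for score, title in scored if score > 0][:k]


candidates = [
    Candidate("Ceres (dwarf planet)", "en", ["asteroid", "dwarf", "planet", "belt"], 5000),
    Candidate("Vesta", "en", ["asteroid", "belt", "protoplanet"], 3000),
    Candidate("Kano", "ha", ["city", "nigeria"], 8000),
    Candidate("Surfing", "en", ["sport", "wave", "board"], 9000),
]
history = ["asteroid", "belt", "orbit", "asteroid"]
print(recommend(history, {"en", "fr"}, candidates))
```

In this toy run, "Kano" is filtered out because the editor does not read Hausa. "Surfing" scores zero because it shares no words with the editor's history. The printed list is `['Vesta', 'Ceres (dwarf planet)']`: Vesta's closer topical match outweighs Ceres's higher projected popularity.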

Motivated by the results of the experiment, we teamed up with software developers and designers to create a first prototype of an article recommendation tool that can recommend articles to be created or translated across any of the languages currently supported in Wikipedia. The tool uses a simplified version of the algorithm, based on the pageview, search, and Wikidata APIs, to identify articles that are trending in a given source language but missing in a target language. It also lets you search for recommendations on specific topics you are interested in.
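As a rough illustration of the simplified approach, the sketch below uses toy, in-memory data instead of calling the live APIs:
- A dictionary of source-language pageviews stands in for the pageview API.
- Mappings from titles to Wikidata items, and from items to their sitelinks, stand in for the Wikidata API.
- An optional set of titles stands in for search results on a topic.

The function name, the `min_views` threshold, and the item IDs are hypothetical placeholders, not real Wikidata IDs. The `enwiki`/`hawiki` keys follow the site-ID convention that Wikidata uses for sitelinks.

```python
from typing import Dict, List, Optional, Set


def missing_trending(
    pageviews: Dict[str, int],                # stand-in for source-language pageview counts
    title_to_item: Dict[str, str],            # stand-in for title -> Wikidata item lookups
    sitelinks: Dict[str, Set[str]],           # stand-in for each item's sitelink site IDs
    target: str,                              # target language code, e.g. "ha"
    topic_titles: Optional[Set[str]] = None,  # stand-in for topic search results
    min_views: int = 1000,                    # hypothetical "trending" threshold
    k: int = 10,
) -> List[str]:
    """Return popular source-language articles that have no article in the target language."""
    target_site = f"{target}wiki"
    found = []
    for title, views in pageviews.items():
        if views < min_views:
            continue  # not popular enough to count as trending
        if topic_titles is not None and title not in topic_titles:
            continue  # outside the topic the user searched for
        item = title_to_item.get(title)
        if item is None:
            continue  # no Wikidata item, so other languages cannot be checked
        if target_site in sitelinks.get(item, set()):
            continue  # the article already exists in the target language
        found.append((views, title))
    found.sort(key=lambda pair: (-pair[0], pair[1]))  # most viewed first
    return [title for _, title in found[:k]]


# Toy data, invented for illustration only.
pageviews = {"Universe": 12000, "Surfing": 9000, "Ceres (dwarf planet)": 4000,
             "Vesta": 2500, "Obscure stub": 40}
title_to_item = {"Universe": "Q101", "Surfing": "Q102", "Ceres (dwarf planet)": "Q103",
                 "Vesta": "Q104", "Obscure stub": "Q105"}
sitelinks = {"Q101": {"enwiki", "frwiki"}, "Q102": {"enwiki", "hawiki"},
             "Q103": {"enwiki", "frwiki"}, "Q104": {"enwiki"}, "Q105": {"enwiki"}}

print(missing_trending(pageviews, title_to_item, sitelinks, target="ha"))
print(missing_trending(pageviews, title_to_item, sitelinks, target="ha",
                       topic_titles={"Ceres (dwarf planet)", "Vesta", "Surfing"}))
```

With this toy data, the first call prints `['Universe', 'Ceres (dwarf planet)', 'Vesta']`. "Surfing" already has a `hawiki` sitelink, and "Obscure stub" falls below the threshold. The topic-restricted call prints `['Ceres (dwarf planet)', 'Vesta']`.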

Screenshot of Wikipedia GapFinder, the article recommendation tool. Screenshot by Dario Taraborelli, public domain/CC0.

The tool also comes with an API, currently integrated into the Content Translation tool, a product designed by the Wikimedia Language team for creating new articles by translating them from one language into another. Specifically, the API powers the tool’s Suggestions feature, which recommends articles to volunteers based on the articles they have previously translated. Tool developers have also started integrating the API into third-party applications, such as Dexbot’s tools. Both the article recommendation tool and its API are open source: anyone can access, use, and build on this technology to design new applications or improve existing ones.

Over the coming months, we will be monitoring the tool closely to learn more about how editors use it and how it can be further improved. If you try out the article recommendation tool, please share your feedback on our discussion page. We are particularly interested in seeing how larger groups participating in edit-a-thons, meetups, or other outreach events can use the tool as a handy way to generate lists of missing articles. If you would like a demonstration of the tool for your local edit-a-thon, let us know!

Leila Zia, Research Scientist
Dario Taraborelli, Director, Head of Research
Wikimedia Foundation

Notes

[1] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia Across Languages via Recommendation. In Proceedings of the 25th International Conference on World Wide Web (WWW ’16). Geneva, Switzerland, 975–985. DOI:10.1145/2872427.2883077 arXiv:1604.03235

This study was nominated for best paper at WWW ’16. You can read more about it in a Stanford University press release.