Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Wikidata

The Wikidata revolution is here: enabling structured data on Wikipedia

A year after its announcement as the first new Wikimedia project since 2006, Wikidata has now begun to serve the over 280 language versions of Wikipedia as a common source of structured data that can be used in more than 25 million articles of the free encyclopedia.

By providing Wikipedia editors with a central venue for their efforts to collect and vet such data, Wikidata leads to a higher level of consistency and quality in Wikipedia articles across the many language editions of the encyclopedia. Beyond Wikipedia, Wikidata’s universal, machine-readable knowledge database will be freely reusable by anyone, enabling numerous external applications.

“Wikidata is a powerful tool for keeping information in Wikipedia current across all language versions,” said Wikimedia Foundation Executive Director Sue Gardner. “Before Wikidata, Wikipedians needed to manually update hundreds of Wikipedia language versions every time a famous person died or a country’s leader changed. With Wikidata, such new information, entered once, can automatically appear across all Wikipedia language versions. That makes life easier for editors and makes it easier for Wikipedia to stay current.”

The Wikidata entry on Johann Sebastian Bach (as displayed in the “Reasonator” tool), containing among other data the composer’s places of birth and death, family relations, entries in various bibliographic authority control databases, a list of compositions, and public monuments depicting him

The dream of a wiki-based, collaboratively edited repository of structured data that could be reused in Wikipedia infoboxes goes back to at least 2004, when Wikimedian Erik Möller (now the deputy director of the Wikimedia Foundation) posted a detailed proposal for such a project. The following years saw work on related efforts like the Semantic MediaWiki extension, and discussions of how to implement a central data repository for Wikimedia intensified in 2010 and 2011.

The development of Wikidata began in March 2012, led by Wikimedia Deutschland, the German chapter of the Wikimedia movement. Since Wikidata.org went live on 30 October 2012, a growing community of around 3,000 active contributors has been building its database of ‘items’ (e.g. things, people or concepts), first by collecting topics that are already the subject of Wikipedia articles in several languages. An item’s central page on Wikidata replaces the complex web of language links that previously connected these articles about the same topic in different Wikipedia versions.

Wikidata’s collection of these items now numbers over 10 million. The community also began to enrich Wikidata’s database with factual statements about these topics (data like the mayor of a city, the ISBN of a book, the languages spoken in a country, etc.). This information has now become available for use on Wikipedia itself, and Wikipedians on many language Wikipedias have already started to add it to articles, or discuss how to make best use of it.
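To make the “entered once, shown everywhere” idea concrete, here is a minimal sketch in Python. It is not Wikidata’s actual data model or API: the `Item` class, the `infobox_value` helper, the identifier `example-city` and all values are hypothetical, chosen only to illustrate one central record that several language editions read from.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Wikidata item."""
    item_id: str
    # language code -> title of the Wikipedia article about this topic
    sitelinks: dict[str, str] = field(default_factory=dict)
    # property name -> value, e.g. "mayor" -> "A. Example"
    statements: dict[str, str] = field(default_factory=dict)


def infobox_value(item: Item, prop: str, default: str = "") -> str:
    """Return the centrally stored value for `prop`, whatever the language edition."""
    return item.statements.get(prop, default)


# Illustrative data only; none of these names come from Wikidata.
city = Item(
    item_id="example-city",
    sitelinks={"en": "Example City", "de": "Beispielstadt"},
    statements={"mayor": "A. Example"},
)

# A single edit to the central item...
city.statements["mayor"] = "B. Example"

# ...is what every linked language edition reads afterwards.
for lang in city.sitelinks:
    print(lang, infobox_value(city, "mayor"))
```

The point of the sketch is the design choice: the fact lives in one place, and each language edition looks it up instead of keeping its own copy.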

“It is the goal of Wikidata to collect the world’s complex knowledge in a structured manner so that anybody can benefit from it,” said Wikidata project director Denny Vrandečić. “Whether that’s readers of Wikipedia who are able to be up to date about certain facts or engineers who can use this data to create new products that improve the way we access knowledge.”

The next phase of Wikidata will allow for the automatic creation of lists and charts based on the data in Wikidata. Wikimedia Deutschland will continue to support the project with an engineering team that is dedicated to Wikidata’s second year of development and maintenance.

Wikidata is operated by the Wikimedia Foundation and its fact database is published under a Creative Commons 0 public domain dedication. Funding of Wikidata’s initial development was provided by the Allen Institute for Artificial Intelligence [AI]², the Gordon and Betty Moore Foundation and Google, Inc.

Tilman Bayer, Senior Operations Analyst, Wikimedia Foundation

Some of the first applications demonstrating the potential of Wikidata:

  • http://simia.net/treeoflife/ – a (still very incomplete) “tree of life” drawn from relations among biological species in Wikidata’s database
  • “GeneaWiki” generates a graph showing a person’s family relations as recorded in Wikidata, example: Bach family

Translate Wikidata’s user interface and open it to the world

Wikidata is one of the most important and exciting innovations in the world around Wikipedia. To make it accessible to a wide range of users, it needs its user interface to be translated into as many languages as possible, and you can help.

At the first stage, already partly enabled, Wikidata stores “interwiki links”, i.e. page metadata that connect articles about the same topic on different language versions of Wikipedia. Historically, these interwiki links have been duplicated and stored in each of the pages they linked together. With Wikidata, the list of pages about the same topic is centralized.
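A rough back-of-the-envelope calculation shows why centralizing matters. The sketch below assumes, purely for illustration, a topic that has an article in every one of 280 language versions; the function names are made up for this example.

```python
def total_links_duplicated(num_languages: int) -> int:
    """Old model: each of the N pages stores a link to the other N - 1 pages."""
    return num_languages * (num_languages - 1)


def total_links_centralized(num_languages: int) -> int:
    """Central model: one shared list with a single entry per language version."""
    return num_languages


if __name__ == "__main__":
    n = 280  # assumed upper bound: the topic exists in every language version
    print(total_links_duplicated(n))   # 78120
    print(total_links_centralized(n))  # 280
```

Under that assumption, a single topic goes from tens of thousands of duplicated links to a few hundred entries in one list, and adding a new language version means one new entry rather than an edit to every existing page.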

The next goal of Wikidata is to store not only page metadata like interwiki links, but also common data that is repeated in all languages, such as census data for cities and dates of birth and death of famous authors.

Practically all the projects that are related to Wikipedia are massively multilingual, but Wikidata is especially so: it stores common data with the goal of displaying it efficiently in all languages.

The very useful and famous CIA World Factbook site has tables of data about all countries in the world, but the labels are only written in English. Now imagine a site with such tables, but with the ability to display the labels in any language and not just English: that’s what Wikidata aims to become.
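As a small illustration of that idea, the following Python sketch keeps the data once and the labels per language. The `LABELS` and `DATA` tables, the `render_table` function and all values in them are hypothetical. Falling back to the internal key when a translation is missing is an assumption of this sketch, not something Wikidata is stated to do.

```python
# Labels are stored per language; the data itself is stored only once.
LABELS = {
    "population": {"en": "Population", "de": "Einwohnerzahl"},
    "capital": {"en": "Capital", "de": "Hauptstadt"},
}

# Illustrative values only.
DATA = {"population": "1,000,000", "capital": "Example City"}


def render_table(data: dict, labels: dict, lang: str) -> list:
    """Render each row with its label in `lang`, falling back to the raw key."""
    return [
        f"{labels.get(key, {}).get(lang, key)}: {value}"
        for key, value in data.items()
    ]


for row in render_table(DATA, LABELS, "de"):
    print(row)
```

Translating a label once in such a table makes it available wherever that table is displayed in that language.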

In the near future, the translation of such table labels will be done on the Wikidata website itself. In the meantime, you can help by translating the user interface displayed by the software running Wikidata.

Translation of the Wikidata software is done on translatewiki.net, the same translation platform used to translate Wikipedia’s interface. Wikidata relies on three main components that need translating: Wikibase – Repo, Wikibase – Client and Wikibase – Lib.

Wikipedia made encyclopedic articles open and accessible; Wikidata is about to do the same to statistics and other structured information. To ensure that people speaking your language can benefit from the immense potential of Wikidata, and contribute to its success, please join us today and help us translate it.

Thank you!

Amir Aharoni
Software Engineer (Internationalization)

Wikidata Summit kicks off in Berlin

The 2-day event is focusing on Wikidata and RENDER, technologies to integrate structured data with Wikipedia and its sister sites.

The Wikidata & RENDER summit, a 2-day technical event focusing on the integration of structured data with Wikipedia, started today in Berlin, Germany, as a prologue to the Wikimedia Hackathon.

The event, organized by Wikimedia Deutschland, consists of workshops, presentations and coding, split into two tracks: one on Wikidata, and the second on RENDER.

The Wikidata project was announced earlier this year; its goal is to build the software infrastructure to support a common source of structured data that can be used in all Wikipedia articles, regardless of their language.

It would work in the same way that images and other multimedia content from Wikimedia Commons can be embedded into any page on a Wikimedia site.

Wikidata is expected to lead to a higher consistency and quality within Wikipedia articles, increased availability of information in the smaller language editions, and decreased maintenance effort for Wikipedia volunteers.

RENDER, the other focus of this summit, is an EU-funded project aimed at developing methods, techniques, software and data sets for scholars and readers (such as Wikipedia users) to understand, describe, process and make use of the diversity of knowledge and information.

About fifty people were invited to attend: they are Wikimedia Deutschland engineers, Wikimedia Foundation engineers, and volunteer MediaWiki developers, with expertise in structured data, MediaWiki and Wikimedia projects.

About 50 engineers and volunteer developers have gathered in Berlin for this prelude to the Wikimedia hackathon.

Sessions will be held today and tomorrow at Station-berlin – Hall 6, the same venue where the Berlin Hackathon 2012 (a.k.a. “Wikimedia Dev days”) will take place, starting tomorrow evening.

Follow and participate

We don’t have live video streaming of the event, but you can follow what’s happening on site through a variety of channels:

  • participants are taking live collaborative notes that will be posted on wiki when sessions are over;
  • they’re also posting information snippets on Twitter and Identi.ca; join the discussion with the #wikidata and #RENDER hashtags;
  • last, you can join us on IRC in the #wikimedia-wikidata and #mediawiki channels on Freenode.

Let us know on IRC or in the comments below if we can do anything else to let you participate remotely.

Guillaume Paumier
Technical communications manager

The Wikipedia data revolution

The second phase of Wikidata will aim to augment the infoboxes which are currently widely used on Wikipedia to display structured data

Wikimedia Deutschland, the German chapter of the Wikimedia movement, and the Wikimedia Foundation are proud to announce Wikidata, a collaboratively edited database of the world’s knowledge and the first new Wikimedia project since 2006.

Wikidata will support the more than 280 language editions of Wikipedia with one common source of structured data that can be used in all articles of the free encyclopedia. Wikidata is expected to lead to a higher consistency and quality within Wikipedia articles, as well as increased availability of information in the smaller language editions. At the same time, Wikidata will decrease the maintenance effort for the 90,000 volunteers editing Wikipedia.

“Wikidata is ground-breaking. It is the largest technical project ever undertaken by one of the 40 international Wikimedia chapters,” said Pavel Richter, CEO of Wikimedia Deutschland. “Wikimedia Deutschland is thrilled and dedicated to significantly improving the data management of the world’s largest encyclopedia with this project.”

In addition to the Wikimedia projects, the data is expected to be beneficial for numerous external applications, especially for annotating and connecting data in the sciences, in government, and for applications using data in very different ways. The data will be published under a free Creative Commons license.

The initial development of Wikidata is being funded with a donation of 1.3 million Euros, half of which comes from the Allen Institute for Artificial Intelligence [ai]². The Institute supports long-range research activities that have the potential to accelerate progress in artificial intelligence. It was established in 2010 by Microsoft co-founder Paul G. Allen, whose contributions to philanthropy and the advancement of science and technology span more than 25 years.

“Wikidata is a simple and smart idea, and an ingenious next step in the evolution of Wikipedia,” said Dr. Mark Greaves, Vice President of the Allen Institute for Artificial Intelligence. “It will transform the way that encyclopedia data is published, made available, and used by a global audience. Wikidata will build on semantic technology that we have long supported, will accelerate the pace of scientific discovery, and will create an extraordinary new data resource for the world.”

One quarter of Wikidata’s initial funding has been donated by the Gordon and Betty Moore Foundation through its Science Program. “It is important for science,” said Chris Mentzel, Gordon and Betty Moore Foundation science program officer. “Wikidata will both provide an important data service on top of Wikipedia, and also be an easy-to-use, downloadable software tool for researchers, to help them manage and gain value from the increasing volume and complexity of scientific data.”

Google, Inc. has provided another quarter of Wikidata’s funding. “Google’s mission is to make the world’s information universally accessible and useful,” said Chris DiBona, Director, Open Source at Google. “We’re therefore pleased to participate in the Wikidata project which we hope will make significant amounts of structured data available to all.”

Wikidata will be developed in three phases. The first phase is expected to be finished by August 2012. It will centralize links between the different language versions of Wikipedia. In the second phase, editors will be able to add and use data in Wikidata. The results of the second phase are scheduled to be released in December 2012. The third and final phase will allow for the automatic creation of lists and charts based on the data in Wikidata. This will close the initial development process for Wikidata.

The team of eight developers is being led by Dr. Denny Vrandečić. Formerly of the Karlsruhe Institute of Technology, he works with Wikimedia Deutschland and is, together with Dr. Markus Krötzsch of the University of Oxford, co-founder of the Semantic MediaWiki project, which has pursued the goals of Wikidata for the last few years. The proposal for Wikidata was developed with financial support from the EU project RENDER, which also involves Wikimedia Deutschland as a use-case partner.

Wikimedia Deutschland will perform the initial development, and plans to hand over operation and maintenance of the project to the Wikimedia Foundation by March 2013.

Matthew Roth
Global Communications Manager