Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘Creative Commons’

The First ever Creative Commons event in Telugu: Ten Telugu Books Re-released Under CC

Event flyer, User:రహ్మానుద్దీన్, CC-BY-SA 3.0

Telugu is one of the 22 scheduled languages of the Republic of India (Bhārat Gaṇarājya) and is the official language of the Indian states of Andhra Pradesh, Telangana and the Union Territory district of Yanam. In India alone Telugu is spoken by 100 million people and is estimated to have 180 million speakers around the world. The government of India declared Telugu a Classical language in 2008.

Telugu Wikipedia has been in existence for more than 10 years and has 57,000 articles. Telugu Wikisource, one of its sister projects, has more than 9,400 pages. Several Telugu books are being typed and proofread using the Proofread extension. Since Telugu is one of the complex Indic scripts, computing in Telugu came much later. Many books that were published (or are being published) are not in Unicode. Telugu Wikisource has now emerged as the largest searchable online book repository in Telugu. Telugu Wikisourcerers, despite being a small community, have done a great job of digitizing many prominent Telugu literary works. Attempts have been made to convince contemporary writers to re-release their books under the CC-BY-SA 3.0 license. One such effort, a year ago, brought in a Telugu translation of the Quran. Recently, 10 Telugu books by a single author were re-released under the Creative Commons license (CC-BY-SA 3.0) on June 22, 2014 at The Golden Threshold, an off-campus annex of the University of Hyderabad. CIS-A2K played an instrumental role in getting this content donated. This is one of the first instances in an Indian language where a single author has re-released such a large collection of books under the CC license. These books are being uploaded to Telugu Wikisource using Unicode converters.


Creative Commons releases version 4.0: Congratulations!

Yamashita Yohei – CC on Orange

Today, Creative Commons announced the release of version 4.0 of their license suite. CC drafted version 4.0 in order to create a more international, adoptable, and long-lasting license. As a long-time supporter and user of Creative Commons, Wikimedia congratulates CC on the release. We think it gives the open culture movement improved legal tools, and hope that it will increase sharing and remixing.

This new license is the result of a two-year process that began at CC’s Global Summit in 2011 and was first announced online late that year. Because the Wikimedia projects use version 3.0 of the license, many Wikimedians, including the Wikimedia Foundation’s legal team and various Wikimedia chapters, have participated in the drafting process. Many other creators of open cultural works have also participated, so we expect a healthy uptake of the new license in the months and years to come, building on the widespread adoption of previous versions of the license.

From the perspective of the Wikimedia projects, the biggest changes to version 4.0 include:

  • Easier to understand: 4.0 has twenty percent fewer words than 3.0, and is more clear and readable in a variety of ways. Perhaps most importantly, the language of the license is better organized, making it more clear what conditions apply when reproducing and sharing licensed content.
  • Clearer attribution: Attribution requirements in the license are easier to understand. It also makes explicit that Wikimedia’s existing attribution practices are compatible with the license.
  • Global operation: A key goal for 4.0 was to help the licenses “operate globally, ensuring they are robust, enforceable and easily adopted worldwide.” This includes drafting the licenses so that they can be translated without requiring legal changes for every jurisdiction. This will hopefully allow CC BY-SA 4.0 to be one license, available and enforceable in many languages, rather than a family of similar licenses with changes for different languages and jurisdictions.
  • Database rights: Perhaps the biggest substantive change in CC 4.0 is the extension of the license to create obligations related to the so-called “database rights” created by the European Union and some other jurisdictions. Understanding and evaluating the impact of these clauses will be extremely important, given the increasing importance of databases to Wikimedia through projects like Wikidata.

The creation and publication of the new licenses does not change the default licensing of the Wikimedia projects, nor what licenses are acceptable for contributions to the Wikimedia projects. Those changes will not happen until the Wikimedia community has evaluated and publicly discussed the new license, and (in the case of a change of default license) the Board of Trustees has formally approved a change to the Terms of Use. The Wikimedia Foundation’s legal team looks forward to participating in the community’s discussion about these issues and the licenses.

Luis Villa
Deputy General Counsel, Wikimedia Foundation

What are readers looking for? Wikipedia search data now available

(Update 9/20 17:40 PDT)  It appeared that a small percentage of queries contained information unintentionally inserted by users. For example, some users may have pasted unintended information from their clipboards into the search box, causing the information to be displayed in the datasets. This prompted us to withdraw the files.

We are looking into the feasibility of publishing search logs at an aggregated level, but we do not plan to publish this data until further notice.

Diederik van Liere, Product Manager Analytics

I am very happy to announce the availability of anonymous search log files for Wikipedia and its sister projects, as of today. Collecting data about search queries is important for at least three reasons:

  1. it provides valuable feedback to our editor community, who can use it to detect topics of interest that are currently insufficiently covered.
  2. we can improve our search index by benchmarking improvements against real queries.
  3. we give outside researchers the opportunity to discover gems in the data.

Peter Youngmeister (Ops team) and Andrew Otto (Analytics team) have worked diligently over the past few weeks to start collecting search queries. Every day from today, we will publish the search queries for the previous day at: (we expect to have a 3 month rolling window of search data available).

Each line in the log files is tab-separated and contains the following fields:

  1. Server hostname
  2. Timestamp (UTC)
  3. Wikimedia project
  4. URL encoded search query
  5. Total number of results
  6. Lucene score of best match
  7. Interwiki result
  8. Namespace (coded as integer)
  9. Namespace (human-readable)
  10. Title of best matching article
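
As a rough illustration of the ten-field layout above, a line could be split into named fields along the following lines. This is only a sketch: the class name `SearchLogEntry`, the function `parse_line` and the sample line are made up for the example, and it assumes that the query is percent-encoded with `+` for spaces and that the result count is a plain integer, neither of which is spelled out in this post.

```python
from dataclasses import dataclass
from urllib.parse import unquote_plus

FIELD_COUNT = 10  # the ten tab-separated fields listed above


@dataclass
class SearchLogEntry:
    """One search log line; field names are illustrative, not official."""
    hostname: str
    timestamp: str          # UTC, kept as text because the exact format is not given
    project: str
    query: str              # decoded search query
    total_results: int      # assumption: a plain integer
    best_score: str         # Lucene score, kept as text
    interwiki_result: str
    namespace_id: str       # namespace coded as integer, kept as text
    namespace_name: str
    best_title: str


def parse_line(line: str) -> SearchLogEntry:
    """Split one tab-separated log line into its ten fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(parts)}")
    return SearchLogEntry(
        hostname=parts[0],
        timestamp=parts[1],
        project=parts[2],
        query=unquote_plus(parts[3]),  # assumption: '+' stands for a space
        total_results=int(parts[4]),
        best_score=parts[5],
        interwiki_result=parts[6],
        namespace_id=parts[7],
        namespace_name=parts[8],
        best_title=parts[9],
    )


# A made-up sample line, not taken from the real logs.
sample = "search1\t2012-09-19 00:00:01\tenwiki\tcreative+commons%20license\t42\t3.5\t\t0\tMain\tCreative Commons license\n"
entry = parse_line(sample)
```

Lines with a different number of fields raise a `ValueError` rather than being silently misaligned, which seems the safer default for unsampled logs.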

The log files contain queries for all Wikimedia projects and all languages and are unsampled and anonymous. You can download a sample file. We collect data both from the search box on a wiki page after the visitor submits the query, and from queries submitted from Special:Search pages. The search log data does not contain queries from the autocomplete search functionality, because this generates too much data.

Anonymous means that there is nothing in the data that allows you to map a query to an individual user: there are no IP addresses, no editor names, and not even anonymous tokens in the dataset. We also discard queries that contain email addresses, credit card numbers and social security numbers.
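
This post does not describe how those queries are detected, so the following is only a sketch of one possible pattern-based filter, not the filter actually used. The regular expressions, the Luhn checksum test for card numbers and the function names are all assumptions made for illustration, and a filter of this kind cannot recognize arbitrary personal text typed or pasted into a search box.

```python
import re

# Hypothetical patterns; the real filtering rules are not described in the post.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_sensitive(query: str) -> bool:
    """Heuristically flag queries containing an email, SSN or card number."""
    if EMAIL_RE.search(query) or SSN_RE.search(query):
        return True
    for match in CARD_RE.finditer(query):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return True
    return False


def keep_safe_queries(queries):
    """Drop every query that looks sensitive."""
    return [q for q in queries if not looks_sensitive(q)]
```

For example, `keep_safe_queries(["telugu literature", "someone@example.com"])` would keep only the first query.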

It’s our hope that people will use this data to build innovative applications that highlight topics that Wikipedia is currently not covering, improve our Lucene parser or uncover other hidden gems within the data. We know that most people use external search engines to search Wikipedia because our own search functionality does not always give the same accuracy, and the new data could help to give it a little bit of much-needed TLC. If you’ve got search chops then have a look at our Lucene external contractor position.
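
One simple way to look for topics that are not yet covered would be to count the queries that returned no results. The sketch below assumes the ten-field, tab-separated layout described earlier, with the URL-encoded query in the fourth field and the total number of results in the fifth. The function name `top_zero_result_queries` and the sample lines are invented for the example.

```python
from collections import Counter
from urllib.parse import unquote_plus


def top_zero_result_queries(lines, limit=10):
    """Count queries whose total number of results is zero."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 10:
            continue  # skip malformed lines
        if parts[4].strip() == "0":
            # Normalize lightly so trivially different spellings are merged.
            query = unquote_plus(parts[3]).strip().lower()
            counts[query] += 1
    return counts.most_common(limit)


# Made-up sample lines, not taken from the real logs.
sample_lines = [
    "s1\tt1\ttewiki\tgolden+threshold\t0\t\t\t0\tMain\t",
    "s1\tt2\ttewiki\tGolden+Threshold\t0\t\t\t0\tMain\t",
    "s1\tt3\tenwiki\tcreative+commons\t120\t4.2\t\t0\tMain\tCreative Commons",
    "s1\tt4\tenwiki\tlicense+migration+faq\t0\t\t\t0\tMain\t",
]
missing = top_zero_result_queries(sample_lines)
```

Here `missing` lists "golden threshold" twice-counted ahead of "license migration faq", while the query that did return results is ignored.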

We are making this data available under a CC0 license: this means that you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. But we do appreciate it if you cite us when you use this data source for your research, experimentation or product development.

Finally, please consider joining the Analytics mailing list or #wikimedia-analytics on Freenode (IRC). And of course you’re also very welcome to send me email directly.

Diederik van Liere, Product Manager Analytics

(Update 9/19 20:20 PDT) We’ve temporarily taken down this data to make additional improvements to the anonymization protocol related to the search queries.

Wikimedia community approves license migration

Today we announced some fantastic news. The proposal for Wikimedia’s content to adopt a new dual license system has been voted on and approved by the Wikimedia community. With the full approval of our Board of Trustees, this now means that the Wikimedia Foundation will proceed with the implementation of a CC-BY-SA/GFDL dual license system on all of our projects’ content. The new dual license will begin to come into effect in June.

A Q&A about the announcement has been posted on the Foundation wiki.  You can also find considerably more information, discussion, and details about the license change and the work of the license update committee on their meta page.

A huge thanks to the committee, to the folks at Creative Commons (who have also blogged on the topic), to Richard Stallman and the Free Software Foundation, and to thousands of Wikimedia volunteers from around the world who both authored the content and voted to help make the proposal a reality.

Jay Walsh, Head of Communications

Vote on Wikimedia licensing update underway

One of the core principles under which Wikipedia and all other Wikimedia Foundation projects operate is that the knowledge contributed by hundreds of thousands of volunteers shouldn’t be locked into our servers. People should be able to re-use and re-purpose it in countless useful ways, commercial or non-commercial, to ensure that our work reaches the largest possible number of people. And from online mirrors to DVD editions to printed books to mobile versions, this basic principle has allowed knowledge to flow freely across all media.

When authors don’t make an explicit licensing choice, this isn’t possible: as an author, copyright law gives you maximal “protection”, unless you grant usage rights to others. Because the Wikimedia projects are an open collaboration, this grant of rights is requested from all contributors: When you make an edit to Wikipedia or most of our other projects, you’re asked to release it under a license that gives others, essentially, the right to use it for any purpose, as long as they provide credit to the authors and make any improvements freely available.

There are standard licensing documents that enumerate the rights and obligations of re-users. When Wikipedia started in January 2001, the project chose the GNU Free Documentation License (GFDL), developed for freely usable software documentation. The idea of freely giving away information other than software in this fashion was still relatively novel at the time, and so it made sense to adopt a license that had been developed by the free software community, which could already look back on a long tradition of sharing cultural works freely.

However, because it was developed specifically for (typically printed) documentation, the GFDL contains many passages that aren’t relevant to an online work like Wikipedia, and it also contains obligations that, when taken literally, are quite onerous. For example, it requires that the full text of the license accompany every copy of the work, and it also requires that the section entitled “history” be included with each copy. (For Wikipedia, a massively edited work, this history of changes is often much larger than the work itself.) While Wikipedia has developed a long practice of interpreting this language to facilitate easy re-use, the literal text of the license has baffled many re-users and confused them about what they can and cannot do.

In 2002, a newly formed non-profit organization called Creative Commons released a set of standardized licensing agreements to flexibly grant rights to re-users (the right to make copies, the right to commercial use, the right to distribute modified versions of a document, etc.). These licensing agreements have found rapid adoption by a growing community of authors. For example, the popular photo-sharing site Flickr integrated the option to choose one of the Creative Commons licenses directly into its uploading interface, and thousands of users have granted more permissive rights to re-users than standard copyright would give. Last month, Flickr celebrated that more than 100 million photos had been uploaded under one of the CC licenses.

Importantly, some of the CC licenses are significantly more restrictive than what Wikimedia permits: unlike Wikimedia, they restrict commercial re-use, or limit the creation of derivatives. (In the case of a photo, that would include embedding the photo into a video sequence, for example.) One license, however, is very similar to the GNU Free Documentation License in its fundamental spirit and intent: the Creative Commons Attribution/Share-Alike License.

Unlike the GFDL, CC-BY-SA allows simply referencing the license text instead of including it with each copy, and it does not require copying an entire history of changes with each document. And, it’s not a license written for software documentation, but for any kind of work. Moreover, it’s been specifically adapted to many international jurisdictions, and there are official translations in many languages. A more detailed comparison is available.

Because many people consider it more suitable for works other than software documentation than the GFDL, it’s also been widely adopted. Projects like WikiEducator, Citizendium, the Encyclopedia of Earth, the Encyclopedia of Life, and many others use CC-BY-SA as a content license. While GFDL and CC-BY-SA are very similar, text under one license cannot be integrated into text under another. This incompatibility barrier has presented a growing problem: As other communities have started to share knowledge freely, Wikimedia has lacked interoperability to be able to take from them, and give to them.

As early as 2004, first discussions began about harmonizing the Wikimedia license. Last year, the Free Software Foundation released a new version of the GFDL, 1.3, which specifically allows massively collaborative websites like the Wikimedia projects to also license content under CC-BY-SA. This option was developed by the Free Software Foundation in answer to a request by the Wikimedia Foundation. The request included a commitment by the Wikimedia Foundation to consult its community of volunteers before actually implementing any change.

After months of open discussion and development of the specific licensing terms under which Wikimedia content will be available, the Wikimedia community is now encouraged to vote on a proposal for updating the Wikimedia Foundation licensing terms on projects which currently use the GFDL. Rather than eliminating the GFDL entirely, the proposal will retain it where possible, while also making content available under CC-BY-SA and allowing it to be imported. If the proposal is implemented, licensing terms on all projects in all languages will be standardized where the GFDL is currently used. This standardization will also create clear and understandable terms and conditions for re-users who want to remix information from our projects.

In order to vote, users who have made more than 25 edits prior to March 15, 2009 on any Wikimedia project can visit a special page which will transfer them to a third party server (the page is linked from a notice on top of all pages for logged in users).  The server is administered by Software in the Public Interest, Inc. (SPI) to guarantee the integrity of the vote.  The vote will be tallied by a licensing committee made of Wikimedia volunteers. It will be concluded by May 3, 2009. After the vote result is published, the Board of Trustees of the Wikimedia Foundation will consult regarding the outcome of the vote and next steps.

The Wikimedia Foundation Board of Trustees has published a clear position statement: “The Board has evaluated possible licensing options for Wikimedia material, and believes that this proposal is the best available path towards achieving our collective goal to collect, develop and disseminate educational material, and make it available to people everywhere, free of charge, in perpetuity.”

Erik Moeller
Deputy Director, Wikimedia Foundation

Other coverage: Creative Commons weblog