Wikimedia blog

News from inside the Wikimedia Foundation.org

Posts Tagged ‘PediaPress’

Update on Offline Wikipedia Projects

The last week was a big week for expanding offline Wikipedia work.

Right now, offline refers to supporting read access to Wikimedia content without an Internet connection.  This increases the reach of the Wikipedia movement by providing more opportunities for people all over the world to access the materials.  Some of the recent initiatives surrounding this project were documented in Wikimedia’s tech blog about a month ago (for more detail regarding the purpose for offline work, see the offline strategy page).

In support of our offline readership work, we’re thrilled to announce the launch of a new feature on Wikipedia developed with our partners from PediaPress.  Last week we enabled export to ZIM (the main file format in which offline materials are stored) for the existing PediaPress collections extension on English Wikipedia and numerous other wikis.  This means that individuals can now use the existing PediaPress Create a book tool to assemble a collection and download it in a format which can be read offline (via an offline reader, such as Kiwix).  This is important because it opens new avenues for the creation of offline materials, for example, an openZIM library hosting different offline “book” options.

Also, the English offline collection Wikipedia 0.8 was made officially available, after much hard work by the Wikipedia 1.0 Editorial Team.  This collection is an iteration in the process of developing a vetted collection of offline articles selected based on their quality and topical importance.  The main constraint on an offline product is data size: the entirety of Wikipedia must somehow be condensed so that it fits on a CD, DVD, or USB stick.  Wikipedia 1.0 aims at creating the highest quality and most valuable subset of Wikipedia to meet those size requirements, and v0.8 is a precursor.  Wikipedia 0.8 is a general collection of just under 50K articles. It is available for Mac, PC, or Linux with a Kiwix or Okawix reader; some mobile phone versions will be available later this month as well.

More updates are sure to come on this offline front: Wikimedians around the world are actively assisting in the development of offline collections as well as distribution.  We are excited to support and document the momentum going forward.

Jessie Wild, Global Development

Encyclopedia of Life curates Wikipedia’s species articles

There are more than 1.9 million animals, plants, and other forms of life on Earth. In May 2007, some of the world’s leading scientists announced the development of the Encyclopedia of Life (EOL) to document them all. Inspired by biologist E. O. Wilson’s TED Wish and supported by more than $25 million in funding, the project aggregates and makes accessible information about species from sources ranging from 19th century journals to modern online databases.

See the page about Solanum lycopersicum, the garden tomato, as an example. Much of the information comes from Solanaceae Source, a specialized source of name lists, species descriptions, specimen collections and publication lists for the genus Solanum. The Biodiversity Heritage Library provides historical public domain texts about the species from various published journals. Many other specialized and general resources contribute to the overall species page.

A Wikipedia article included in an Encyclopedia of Life species page. The yellow background indicates that no curator has reviewed the content yet.

You’ll also find a “Wikipedia” entry in the table of contents. It reveals a copy of the Wikipedia article about tomatoes. As of this writing, the article text has a yellow background.

This means that an Encyclopedia of Life curator has not yet reviewed the content for inclusion in EOL. An EOL species page can have one or more curators who select and validate information added to EOL pages. Wikipedia articles, where they exist, are included by default.

Once the article has been validated by a curator, the yellow background is removed. The information for curators and curation standards pages on EOL give some additional background on the curation process, which applies to all content objects in EOL. Specific guidelines have been written for curation of content from Wikipedia and Wikimedia Commons. We’re particularly pleased that EOL encourages its curators to improve Wikipedia directly if errors or omissions are found.

So far, more than 200 Wikipedia articles have been reviewed through this process. Reviewers classify the information as follows:

  • ‘trusted’ – reviewed by curator and not deemed to contain substantially incorrect information
  • ‘untrusted’ – reviewed by curator and deemed to include incorrect or unverifiable information
  • ‘inappropriate’ – reviewed by curator and deemed to not be eligible for inclusion in EOL for other reasons (e.g. too short to add value)

EOL makes the entirety of all review information (who reviewed what when, with what outcome) available through an Atom feed. This means that Wikipedians, and others, can use this information easily in the development of new applications.
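As a rough illustration of how such a feed could be consumed, the sketch below parses a small Atom document with Python's standard library and collects each entry's title and review outcome. The sample feed is made up for this example: the actual layout of EOL's feed is not described here, and the idea that the outcome is carried in a `category` element is an assumption.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample data. The real EOL feed may use different
# elements to carry the reviewer, date, and outcome.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example curation feed</title>
  <entry>
    <title>Tomato</title>
    <category term="trusted"/>
  </entry>
  <entry>
    <title>Example article</title>
    <category term="untrusted"/>
  </entry>
</feed>
"""


def read_reviews(feed_xml):
    """Return a list of (title, outcome) pairs, one per feed entry."""
    root = ET.fromstring(feed_xml.strip())
    reviews = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        category = entry.find("atom:category", ATOM_NS)
        # Assumption: the review outcome sits in the category's "term".
        outcome = category.get("term") if category is not None else None
        reviews.append((title, outcome))
    return reviews


def trusted_titles(reviews):
    """Keep only the titles whose outcome is 'trusted'."""
    return [title for title, outcome in reviews if outcome == "trusted"]


if __name__ == "__main__":
    for title in trusted_titles(read_reviews(SAMPLE_FEED)):
        print(title)
```

An application built on the real feed would replace `SAMPLE_FEED` with the downloaded feed contents and adjust the element names to whatever the feed actually provides.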

The book creator tool makes it possible to order a printed and bound book from any Wikipedia article selection. A custom cover can be chosen. Nautilus photograph by Lee Berger, Creative Commons Attribution/Share-Alike License.

A proof-of-concept for expert reviews

Magnus Manske is a biochemist and programmer at the Sanger Institute in the United Kingdom. He is also a long-time Wikimedia volunteer, and wrote the first version of the PHP software used by Wikipedia, which later became MediaWiki. As a scientist, Magnus has advocated for the scientific community to use and improve Wikipedia, most recently as co-author of the paper Ten Simple Rules for Editing Wikipedia.

I informed Magnus about the new EOL review information, and suggested that we might want to explore using this information to generate printed books or PDF collections of reviewed articles. The software for exporting Wikipedia articles into books already exists, so it was just a matter of putting two and two together.

So, Magnus used the available data feed to create an automated tool that creates a list of all EOL-reviewed article versions in a form that can be used by Wikipedia’s book tool.

This makes it possible to download a PDF file or order a printed book that only contains EOL-reviewed versions of Wikipedia species articles.
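The key idea is that a review applies to one specific version of an article, not to whatever the article says today. The sketch below is not Magnus’ script; it only illustrates the principle by turning a list of reviewed versions into permanent links using MediaWiki's `oldid` URL parameter. The revision numbers and the second article title are invented for the example.

```python
from urllib.parse import urlencode

# English Wikipedia's script entry point.
WIKI_INDEX = "https://en.wikipedia.org/w/index.php"


def permalink(title, revision_id, base=WIKI_INDEX):
    """Build a link to one fixed revision of an article."""
    query = urlencode({"title": title.replace(" ", "_"), "oldid": revision_id})
    return base + "?" + query


def reviewed_links(reviews):
    """Return permalinks for every review marked 'trusted'.

    Each review is a (title, revision_id, outcome) tuple.
    """
    return [
        permalink(title, revision_id)
        for title, revision_id, outcome in reviews
        if outcome == "trusted"
    ]


if __name__ == "__main__":
    # Hypothetical review records; the revision IDs are placeholders.
    sample = [
        ("Tomato", 123456, "trusted"),
        ("Example article", 234567, "untrusted"),
    ]
    for link in reviewed_links(sample):
        print(link)
```

A book built from such a list stays tied to the versions the curators actually read, even if the live articles change afterwards.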

To try it out, visit the page for Magnus’ example book. Click “Download PDF” to generate the (very large) PDF file that contains all the species articles, or “order printed book” to preview or order a printed book from PediaPress (which, as of this month, also offers books in color and hardcover format). If you want to remix or play with the book further, you can click “Open book creator”.

We’re very pleased with this first proof-of-concept, and are grateful to the Encyclopedia of Life team for engaging its community in the curation of Wikipedia articles. Both parties benefit: The Encyclopedia of Life enriches its species pages using the often well-developed Wikipedia content. Wikipedia benefits because EOL’s trusted reviewers add their stamp of approval to Wikipedia articles, which helps Wikipedia readers and editors alike. Where EOL reviewers do not approve, they are encouraged to edit the Wikipedia article.

I asked Bob Corrigan, EOL Product Manager and Acting Deputy Director, to give his take on this project. He writes: “This is definitely a win-win partnership. EOL is focused on providing very deep, structured access to trusted biodiversity information from our network of content partners and curators, and vetted Wikipedia articles can be a terrific gateway to this information. We see a closer relationship with Wikimedia as an important way to expand access to global knowledge about life on Earth.”

Hardcover book made from curated Wikipedia articles. Photo credit: Guillaume Paumier; Nautilus photograph by Lee Berger. Creative Commons Attribution/Share-Alike License 3.0

Example page from the book. Photo credit: Guillaume Paumier; Nautilus photograph by Lee Berger. Creative Commons Attribution/Share-Alike License 3.0

A replicable model

Magnus’ implementation was already created with an eye to future extensibility. If you’re inclined to take a closer technical look, check out Magnus’ “Sifter-Books” script which generates the book data, and can potentially support multiple partner institutions/organizations providing article reviews. As of the time of this writing, Magnus has already added two additional groups who review Wikipedia articles, Rfam and Pfam, databases of RNA and protein families.

Moreover, Magnus has written a small proof-of-concept script which makes the existence of reviews visible on Wikipedia itself. You need to create a user account on the English Wikipedia and follow the installation instructions to use the script. Once installed, a “Reviews” tab will indicate available article reviews.

We look forward to exploring similar partnerships with subject-matter experts in institutions (like universities and libraries), scientific associations, and specialized knowledge communities. If you’re interested in this model, drop me a note (erik at wikimedia dot org).

Erik Moeller
Deputy Director, Wikimedia Foundation
Representative of Wikimedia in the Encyclopedia of Life Institutional Council

Wikipedia hard-cover editions now available

This week our friends over at PediaPress announced that custom-printable books containing Wikipedia articles are now also being offered in attractive hardcover bound editions, and in color. Previously customers could order softcover editions of books containing a customizable list of Wikipedia articles in any configuration. The new hardcover editions even contain a silk bookmark and stitched bindings.

The PediaPress MediaWiki extension on Wikipedia allows users to collect any number of articles or categories into a single PDF file or OpenOffice text file, which can then be downloaded for offline viewing or local printing. Alternatively, through PediaPress’ on-demand printing technologies, the document can be turned into a bound book and shipped right to you. To start creating a book, look for the Create a book link under Print/Export on the left-hand Wikipedia menu. Some unique and inspired Wikipedia books have been created since PediaPress kicked off.

Now is your chance to get your very favorite lists of Wikipedia articles bound in a bookshelf-friendly format. Offline versions of Wikipedia are an important part of the Wikimedia Foundation’s mission to spread free knowledge to everyone on the planet, so we’re happy to see the options and quality of this format expand.

Jay Walsh, Communications

More Ways to Share

Notice a new feature on the left-hand sidebar today? Now, you can “create a book” and take English Wikipedia with you wherever you go, thanks to the good work of our partner, PediaPress. First launched last year for German language Wikipedia, the feature has been extended to a number of languages, now including English. Initially, this feature was available only to logged-in users due to scalability issues, but today, everyone using English Wikipedia can assemble any articles of their choosing into a printed book, a PDF file, or an OpenDocument file for word processing.

To create your book, you can start by clicking on the “create a book” button found on the left-hand sidebar under the “print/export” section. From there, you can add any articles you like while browsing through millions of Wikipedia articles. When you’ve completed your selection, you can further customize your book by creating chapters and a title, choosing a photo for the cover and including an author or editor’s name.

Making Wikipedia available to as many people as possible and providing ways for our volunteer community to enjoy the work that they’ve done is central to our mission here at the Foundation. This is an exciting way to share more.

Moka Pantages, Communications

Wiki-to-print feature activated in six more Wikipedia languages

Yesterday we activated the wiki-to-print feature (see our recent blog post) in six additional Wikipedia language editions: French, Polish, Dutch, Portuguese, Spanish, and Simple English. In these language editions, it’s now possible to make collections of Wikipedia articles, share them, download them as PDF and OpenDocument files, or order them as printed books. We specifically activated it in the Simple English edition (which is a version of Wikipedia written in simple terms for children and adults learning English) so that English language users can get a first good feel for the functionality in a Wikipedia environment (it’s been active in English Wikibooks for a while). We’re hoping for a roll-out in additional languages including English very soon; our main concern is scalability of the feature under the massive load of the English Wikipedia.

The feature has been quickly embraced where it has been activated. In the German Wikipedia, since our deployment on January 27, more than 1,000 custom selections have been created and saved. Our technology partner, PediaPress, has been highly responsive to the rapidly accumulating feedback, and many small and larger output issues have been fixed in the last two weeks. For the new deployments, there’s a central feedback page on Meta.

It will be interesting to see how this feature affects writing on Wikipedia. When people start to think about their contributions in the context of a book, having a consistent structure and style is even more important than when viewing separate Wikipedia articles in a browser. Beyond increasing the quality and reach of our content, we also hope that this technology will be valued by our existing volunteer community as a way to turn their contributions into something that can be touched, held, given away — and by new writers as a motivation to participate.

Erik Moeller
Deputy Director, Wikimedia Foundation

(UPDATE 2/27: We’ve enabled it in the English Wikipedia for signed in users and are observing server load and user feedback. If you’re logged in, see the help page for more information on how to use the tool. As always, the PediaPress team is amazingly responsive to issues that people encounter, and we expect continued improvements to the PDF and print quality over the coming weeks and months. If all goes well, we plan to deploy it on all relevant projects for all users in March. Language support in Chinese, Japanese, Korean, Arabic, Hebrew and some other languages still needs to improve and we won’t enable it in languages that the tool can’t handle appropriately yet – code contributions are welcome!)

Wiki-to-print feature now available in the German Wikipedia

A printed book ordered through PediaPress.com

A few weeks ago, we rolled out a feature to allow users to generate PDF files, OpenDocument word processor files, and on-demand printed books in one of our smaller sister projects, Wikibooks. This same technology has now also been experimentally enabled on the German Wikipedia (thanks to Frank Schulenburg for creating a beautiful help page). Essentially, you can compile a wiki-book from any number of Wikipedia articles, download a PDF or OpenDocument version, or order a printed version from our technology partner, PediaPress. And if you like your book remixes, you can save them for others to use and share.

If you want to take your favorite Wikipedia articles with you on the go, or if you want to have a nicely formatted PDF version, or you want to edit them further in a word processor, this technology is for you. The reason this is being tested on the German Wikipedia, in case you were wondering, is that PediaPress is a German company, and they will be able to respond quickly to feedback directly from the German Wikipedia community. With more than 1.4 billion pageviews a month, the German Wikipedia is also the second most viewed language edition, right after English with 5.2 billion pageviews. We’ve dedicated some hardware to this feature, and testing it on the German Wikipedia will give us a good idea of how it behaves under high-traffic conditions.

It should go without saying that all the code developed through this partnership is open source. In other words, if you want to set up your own wiki with PDF support, OpenDocument support, or connectivity to the PediaPress on-demand printing service, you can install the Collection Extension and enable it on your wiki. When we say free, we mean it.

If all goes well, this feature will become available in all Wikimedia projects where it makes sense. This technology has been developed with the generous support of the Commonwealth of Learning and the Open Society Institute.

Erik Moeller
Deputy Director, Wikimedia Foundation

PS: In unrelated tech news, our CTO Brion Vibber has blogged about the AbuseFilter extension, an important tool whose development we’re supporting, which will help Wikipedians to deal more effectively with spam, vandalism, and other destructive user behavior. And if you haven’t seen it, also note his recent post about the Drafts feature that’s being tested, and which should help against accidental loss of edits.