Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Deployments

What Lua scripting means for Wikimedia and open source

Yesterday we flipped a switch: editors can now use Lua, an innovative programming language, to generate sections of wiki pages on all our sites. We’d like to talk about what this means for the open source community at large, for Wikimedians, and for our future.

Why we did this

In the old wikitext templating system, this is part of Template:Citation/core. Any Wikipedia article citing a source will cause our CPUs to run through this instruction set. With Lua, we’ll be able to replace this.

When we started digging into the causes of slow pageload times a few years ago, we saw that our CPUs ate a lot of time interpreting templates — useful bits of markup that programmatically told MediaWiki to reuse little bits of text. Templates are everywhere on our sites. Good Wikipedia articles heavily use the citation templates, for instance, and you’ve seen the ubiquitous infoboxes on every biography. In fact, editors can write code to generate substantial portions of wiki pages. Hit “View source” sometime to see how.

But, because we’d never planned for wikitext to become a programming language, these templates were inefficient and hacky (they didn’t even have recursion or loops) and terrible for performance. When you edit a complex article like Tulsi Gabbard, with scores of citations, it can take up to 30 seconds to parse and display the page. Even as we worked to improve performance via caching, query profiling, new hardware, and other common means, we sometimes had to advise our community to remove functionality from a particular template so pages would render faster.

This wouldn’t do. It was a terrible experience for our users and especially hard for our editors, who had to wait for a multi-second roundtrip after every “how would this page look?” preview.

So our staffers and volunteers worked on Scribunto (from the Latin for “they shall write”), a MediaWiki extension to allow editors to embed Lua scripts instead of wikitext for templating. And volunteers and Foundation staffers have already started identifying pages that are slow to render and converting the most inefficient templates. We have 488,731 templates on English Wikipedia alone right now. The process of turning many of those into Lua scripts is going to affect everyone who reads our sites — and the Scribunto project has already started giving back to the Lua community.

Us and Lua

For instance, our engineer Brad Jorsch wrote mw.ustring.lua, a Unicode module reusable by other Lua developers. This library is good news for people who write templates in non-Latin characters, and for anyone who wants a version of Lua’s standard String library where the methods operate on characters in UTF-8 encoded strings rather than bytes.
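To make the bytes-versus-characters distinction concrete, here is a minimal sketch. It is written in Python purely for illustration and does not use or reproduce mw.ustring; the helper names byte_length, char_length and first_chars are hypothetical.

```python
def byte_length(text):
    """Length as a byte-oriented string library would report it."""
    return len(text.encode("utf-8"))


def char_length(text):
    """Length in Unicode code points, the view a UTF-8-aware library offers."""
    return len(text)


def first_chars(text, count):
    """Take the first `count` characters without splitting a multi-byte sequence."""
    return text[:count]


word = "Википедия"  # "Wikipedia" in Russian: 9 letters, 2 bytes each in UTF-8
print(byte_length(word), char_length(word))  # prints: 18 9
print(first_chars(word, 3))                  # prints: Вик
```

A byte-oriented slice of the same word can cut a letter in half, which is exactly the kind of surprise a character-aware string library avoids for template authors working in non-Latin scripts.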

And with Scribunto, we empower those frustrated Wikimedians who have been spending years breaking their knuckles making amazing things in wikitext; as they learn how much easier it is to script in Lua, we hope they’ll be able to use those skills in their hobbies, schools, and workplaces. They’ll join forces with the graduates of Codecademy, World of Warcraft, and the other communities that teach anyone to program. New programmers with basic knowledge of computer science who want to do something real with their new skills will find that Lua scripting on Wikimedia sites is a logical next step for them. Our implementation only differs slightly from standard Lua.

And since Scribunto is an extension that any MediaWiki administrator can install, we hope the MediaWiki administrators out there will enjoy using Lua to more easily customize their wikis for their users.

Structured data and new ways to display it

Scribunto lays the foundations for exciting work to come when the Wikidata structured data project comes further online (the Wikidata interface is still in development and being deployed in phases). We know that Lua will be an attractive way to integrate Wikidata information into pages, and we hope a lot of (currently) unstructured data will get structured, helping new applications emerge.

As Lua and Wikidata mature, we can look forward to enabling more functionality and plugging in more libraries. And as we continue deploying Wikidata, people will make interesting improvements that we currently can’t predict. For instance, right now, each citation is hard to programmatically dissect; the Cite template takes many unstructured parameters (“author1,” “author2,” etc.). We structure these arguments by convention, but the data’s not structured as CS folks would have it, and can’t be queried via APIs, remixed, and so on.
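As a rough illustration of the gap between parameters that are "structured by convention" and data that is actually structured, the sketch below gathers numbered author parameters into an ordered list. It is a toy in Python, not how the Cite template, Scribunto or Wikidata work; the function name structure_authors and the sample values are hypothetical.

```python
import re


def structure_authors(params):
    """Collect 'author1', 'author2', ... into one ordered 'authors' list.

    `params` is a flat mapping of template parameter names to values.
    All other parameters are passed through unchanged.
    """
    numbered = []
    structured = {}
    for key, value in params.items():
        match = re.fullmatch(r"author(\d+)", key)
        if match:
            numbered.append((int(match.group(1)), value))
        else:
            structured[key] = value
    # Sort by the numeric suffix so author10 comes after author2.
    structured["authors"] = [name for _, name in sorted(numbered)]
    return structured


flat = {"author2": "Second Author", "title": "Example Title", "author1": "First Author"}
print(structure_authors(flat))
```

Once the authors live in a real list, code can count them, reorder them or reformat them without guessing at parameter names.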

Excerpt of Coordinates module

A screenshot of part of the new Coordinates module, written in Lua by User:Dragons flight. Note that, with Lua, we can actually use proper conditionals.

But in the future, we could have citations stored in Wikidata and then put together onto article pages using Lua, or even assembled into various other reasonable forms (automatically generated bibliographies?) using Lua, and they will be easier for Zotero users to discover. That’s just one example; on all our sites over the next few years, things will change from the status quo in a user-visible way. The old math and geography templates were inefficient and hard to hack; once rewritten, they’ll run faster and perhaps editors will use them more. We might see galleries, automatic data analyses, better annotated maps, and various other interesting processes and queries embedded in Wikimedia pages.

Open for change

Wikimedians have been writing wikitext templates for years, and doing hard, astounding, unexpected things with them for readers to enjoy. But the steep learning curve drove contributors away. With Lua, a genuine programming language, people now have a deeper and more useful foundation to build upon. And for years, power users on our sites have customized their experiences with JavaScript/CSS Gadgets and user scripts, but those are basically one level above skin preferences; other people won’t stumble upon your hacks in the process of reading an article.

So, now is the first time that the Wikimedia site maintainers have enabled real coding that affects all readers. We’re letting people program Wikipedia unsupervised. Anyone can write a chunk of code to be included in an article that will be seen by millions of people, often without much review. We are taking our “anyone can edit” maxim one big step forward.

If someone doesn’t like the load time of a webpage, they can now actually improve it themselves. Just as we crowdsourced building Wikipedia, now we’re crowdsourcing bits of infrastructure improvement. And this kind of massively multiplayer, crowdsourced performance improvement is uniquely us.

Wikitext templates could do a lot of things, but Lua does them better and faster, and now mere mortals can do it. We’re aiming to help our users learn to program, to empower themselves, and to help each other and help our readers.

We hope you’ll join us.

Sumana Harihareswara, Engineering Community Manager

New Lua templates bring faster, more flexible pages to your wiki

Starting Wednesday, March 13th, you’ll be able to make wiki pages even more useful, no matter what language you speak: we’re adding Lua as a templating language. This will make it easier for you to create and change infoboxes, tables, and other useful MediaWiki templates. We’ve already started to deploy Scribunto (the MediaWiki extension that enables this); it’s on several of the sites, including English Wikipedia, right now.

You’ll find this useful for performing more complex tasks for which templates are too complex or slow; common examples include numeric computations, string manipulation and parsing, and decision trees. Even if you don’t write templates, you’ll enjoy seeing pages load faster and with more interesting ways to present information.

Background

The text of English Wikipedia’s string length measurement template, simplified.

MediaWiki developers introduced templates and parser functions years ago to allow end-users of MediaWiki to replicate content easily and build tools using basic logic. Along the way, we found that we were turning wikitext into a limited programming language. Complex templates have caused performance issues and bottlenecks, and it’s difficult for users to write and understand templates. Therefore, the Lua scripting project aims to make it possible for MediaWiki end-users to use a proper scripting language that will be more powerful and efficient than ad-hoc, parser functions-based logic. The example of Lua’s use in World of Warcraft is promising; even novices with no programming experience have been able to make large changes to their graphical experiences by quickly learning some Lua.

Lua on your wiki

As of March 13th, you’ll be able to use Lua on your home wiki (if it’s not already enabled). Lua code can be embedded into wiki templates by employing the {{#invoke:}} parser function provided by the Scribunto MediaWiki extension. The Lua source code is stored in pages called modules (e.g., Module:Bananas). These individual modules are then invoked on template pages. For example, Template:Lua hello world uses the code {{#invoke:Bananas|hello}} to print the text “Hello, world!”. So, if you start seeing edits in the Module namespace, that’s what’s going on.
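The relationship between a template, a module and a function can be pictured with a toy model. The sketch below is Python, not Lua, and is not how Scribunto is implemented: MODULES is a hypothetical stand-in for the Module namespace, and expand is a hypothetical helper that only understands the simplest {{#invoke:Module|function}} form with no arguments.

```python
import re

# Hypothetical stand-in for the Module namespace: module name -> its functions.
MODULES = {
    "Bananas": {"hello": lambda: "Hello, world!"},
}

# Matches only the bare form {{#invoke:Module|function}} (no extra arguments).
INVOKE = re.compile(r"\{\{#invoke:([^|{}]+)\|([^|{}]+)\}\}")


def expand(wikitext):
    """Replace each simple #invoke call with the output of the named function."""
    def run(match):
        module = match.group(1).strip()
        function = match.group(2).strip()
        return MODULES[module][function]()
    return INVOKE.sub(run, wikitext)


print(expand("{{#invoke:Bananas|hello}}"))  # prints: Hello, world!
```

The point of the model is the lookup order: the first part names the module page, the second names a function exported by that module, and the function's return value replaces the call in the rendered page.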

Getting started

The strlen template as converted to Lua.

Check out the basic “hello, world!” instructions, then look at Brad Jorsch’s short presentation for a basic example of how to convert a wikitext template into a Lua module. After that, try Tim Starling’s tutorial.

To help you preview and test a converted template, try Special:TemplateSandbox on your wiki. With it, you can preview a page using sandboxed versions of templates and modules, allowing for easy testing before you make the sandbox code live.

Where to start? If you use pywikipedia, try parsercountfunction.py by Bináris, which helps you find wikitext templates that currently parse slowly and thus would be worth converting to Lua. Try fulfilling open requests for conversion on English Wikipedia, possibly using Anomie’s Greasemonkey script to help you see the performance gains. On English Wikipedia, some of the templates have already been converted; feel free to reuse them on your wiki.

The Lua hub on mediawiki.org has more information; please add to it. And enjoy your faster, more flexible templates!

Sumana Harihareswara, Engineering Community Manager

Try out the alpha version of the VisualEditor

Yesterday we launched an alpha, opt-in version of the VisualEditor to the English Wikipedia. This will let editors create and modify real articles visually, using a new system where the articles they edit look the same as when they read them, and their changes show up as they enter them, like writing a document in a word processor.

Why launch now?

We want our community of existing editors to get an idea of what the VisualEditor will look like in the “real world” and start to give us feedback about how well it integrates with how they edit right now. We also want to get their thoughts on what aspects should be priorities in the coming months.

The editor is at an early stage and is still missing significant functions, which we will address in the coming months. For now, we are mostly looking for feedback from experienced editors, because the alpha VisualEditor is not yet complete enough to give new editors a proper experience of editing. We don’t want to promise an easier editing experience to new editors before it is ready.

As we develop improvements, they will be pushed every two weeks to the wikis, allowing you to give us feedback as we go, and tell us what you want us to work on next.

How can I try it out?

The VisualEditor is now available to all logged-in accounts on the English Wikipedia as a new preference, switched off by default. If you go to your “Preferences” screen and click into the “Editing” section, it will have an option labelled “Enable VisualEditor.”

Once enabled, for each article you can edit, you will get a second editor tab labelled “VisualEditor” next to the “Edit” tab. If you click this, after a little pause you will enter the VisualEditor. From here, you can play around, edit and save real articles and get an idea of what it will be like when complete.

At this early stage in our development, we recommend that after saving any edits, you check whether they broke anything. All edits made with the VisualEditor will show up in articles’ history tabs with a “VisualEditor” tag next to them, so you can track what is happening.

We would love your feedback on what we have done so far — whether it’s a problem you discovered, an aspect that you find confusing, what area you think we should work on next, or anything else, please do let us know.

James Forrester, Product Manager, VisualEditor and Parsoid

Introducing Wikipedia’s new HTML5 video player

A new video player has been enabled on Wikipedia and its sister sites, and it comes with the promise of bringing free educational videos to more people, on more devices, in more languages.

The player is the same HTML5 player used in the Kaltura open-source video platform. It has been integrated with MediaWiki (the software that runs Wikimedia sites like Wikipedia) through an extension called TimedMediaHandler. It replaces an older Ogg-only player that has been in use since 2007.

The new player supports closed captions in multiple languages.

Based on HTML5, the new player plays audio and video files on wiki pages. It brings many new features, like advanced support for closed captions and other timed text. By allowing contributors to transcribe videos, the new player is a significant step towards accessibility for hearing-impaired Wikipedia readers. Captions can easily be translated into many languages, thus expanding their potential audience.

TimedMediaHandler also comes with other useful features, like support for the royalty-free WebM video format. Support for WebM makes it possible to seamlessly import videos encoded to that format, such as freely-licensed content from YouTube’s massive library.

Even further behind the scenes, TimedMediaHandler adds support for server-side transcoding, i.e. the ability to convert from one video format to another, in order to deliver the appropriate video stream to the user depending on their bandwidth and the size of the player. For example, support for mobile formats is available, although it is not currently enabled.
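One way to picture "delivering the appropriate video stream" is a small selection rule over a set of pre-transcoded versions. The sketch below is a hedged illustration in Python, not TimedMediaHandler's actual logic: the Derivative class, the pick function, and every width and bitrate in the table are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivative:
    name: str
    width: int         # pixels
    bitrate_kbps: int  # approximate stream bitrate


# Hypothetical transcode targets; real formats, sizes and bitrates may differ.
DERIVATIVES = [
    Derivative("160p", 284, 150),
    Derivative("360p", 640, 500),
    Derivative("480p", 854, 1000),
    Derivative("720p", 1280, 2500),
]


def pick(player_width, bandwidth_kbps):
    """Choose the best derivative that fits both the player and the connection."""
    fitting = [
        d for d in DERIVATIVES
        if d.width <= player_width and d.bitrate_kbps <= bandwidth_kbps
    ]
    if not fitting:
        return DERIVATIVES[0]  # fall back to the smallest stream
    return max(fitting, key=lambda d: d.bitrate_kbps)


print(pick(640, 800).name)    # prints: 360p
print(pick(1920, 5000).name)  # prints: 720p
```

The rule here never sends a stream wider than the player or heavier than the connection; other policies (for example, allowing a slightly larger video to be scaled down) are equally plausible.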

The player’s “Share” feature provides a short snippet of code to directly embed videos from Wikimedia Commons in web pages and blog posts, as is the case here.

Sponsored by Kaltura and Google, developers Michael Dale and Jan Gerber are the main architects of the successful launch of the new player. With the support of the Wikimedia Foundation’s engineering team and Kaltura, they have gone through numerous cycles of development, review and testing to finally release the fruits of years of work.

Efforts to better integrate video content to Wikipedia and its sister sites date back to early 2008, when Kaltura and the Wikimedia Foundation announced their first collaborative video experiment. Since then, incremental improvements have been released, but the deployment of TimedMediaHandler is the most significant achievement to date.

Wikimedia wikis reveal their interwiki map

Starting today, every Wikimedia wiki is revealing its interwiki map to the world, with the deployment of the Interwiki extension.

Interwiki links are links, generated by specific wiki markup, pointing to a website registered in advance. For example, from any MediaWiki-powered website, you can link to the Sunflower article on http://en.wikipedia.org by typing [[wikipedia:Sunflower]]; “wikipedia” is the so-called “interwiki prefix”.
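The lookup behind an interwiki link can be sketched in a few lines. This is an illustrative Python model, not MediaWiki's code: the INTERWIKI_MAP table, the resolve function and the exact URL pattern (with $1 standing for the page title) are assumptions for the example.

```python
import re
from urllib.parse import quote

# Hypothetical interwiki table: prefix -> URL pattern, with $1 for the page title.
INTERWIKI_MAP = {
    "wikipedia": "http://en.wikipedia.org/wiki/$1",
}

# Matches the simple form [[prefix:Title]] with no link label.
LINK = re.compile(r"\[\[([^:\[\]|]+):([^\[\]|]+)\]\]")


def resolve(link):
    """Return the target URL for an interwiki link, or None if it is not one."""
    match = LINK.fullmatch(link)
    if not match:
        return None
    prefix = match.group(1).lower()
    title = match.group(2)
    pattern = INTERWIKI_MAP.get(prefix)
    if pattern is None:
        return None  # unregistered prefix: not an interwiki link
    return pattern.replace("$1", quote(title.replace(" ", "_")))


print(resolve("[[wikipedia:Sunflower]]"))
# prints: http://en.wikipedia.org/wiki/Sunflower
```

Because the prefix must be registered in advance, an unknown prefix simply fails the lookup, which is why a public list of registered prefixes is useful.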

Until now, the list of interwiki prefixes available on a Wikimedia wiki was only visible in manually maintained lists. The Interwiki extension changes that by offering a special page of the wiki, available to all, and listing all available prefixes and their target.

Interwiki links are often intermixed with “interlanguage links”, the small navigation links that show up in the sidebar of most Wikipedia skins and connect a page with its counterparts in other languages. Indeed, the new Interwiki special page shows both interwiki and interlanguage prefixes.

A screenshot of the page Special:Interwiki

First of many MediaWiki 1.20 deployments has begun

The logo of MediaWiki (a yellow sunflower surrounded by two pairs of blue square brackets) with gradients symbolizing its coming of age for the next version

Wikimedia sites will gradually be upgraded to version 1.20 of MediaWiki in April 2012.

Wikimedia engineers have finished up the latest version of MediaWiki, the software that powers Wikipedia and its sister sites. We have begun deploying this version, labeled “1.20wmf1,” to all Wikimedia sites in stages. We started on April 10th and will continue until April 25th.

Yes, we only deployed MediaWiki 1.19 a few weeks ago. This new update is part of our effort to get you fixes and improvements much more regularly (a reason we recently switched from Subversion to Git).

We plan to deploy the latest software every two weeks. Rather than calling each version of the deployed software 1.20, 1.21, 1.22, etc. every two weeks, we’ll be using a variation of the “1.20” moniker for the next few months.

We’re decoupling our deployment process (to Wikimedia sites like Wikipedia) and our release process of standalone MediaWiki installer for use on third-party sites. We plan to have MediaWiki 1.20wmf1 and 1.20wmf2 in April, 1.20wmf3 and 1.20wmf4 in May, etc., until we actually release a new MediaWiki 1.20 installer this fall (probably October).

Only after this point will we start referring to deployments as “1.21” deployments. The cycle will repeat approximately every six months, with Wikimedia deployments every two weeks, and installer releases every six months.
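The naming scheme is easy to see when the labels are generated mechanically. The Python sketch below is only an illustration: the function deployment_labels is hypothetical, and the assumption that each deployment starts exactly 14 days after the previous one is ours, since the actual schedule lives on the roadmap and may shift.

```python
from datetime import date, timedelta


def deployment_labels(release, first_start, count, interval_days=14):
    """Pair each 'wmf' label with an assumed start date, one every two weeks."""
    return [
        (f"{release}wmf{n}", first_start + timedelta(days=interval_days * (n - 1)))
        for n in range(1, count + 1)
    ]


for label, start in deployment_labels("1.20", date(2012, 4, 10), 4):
    print(label, start.isoformat())
# prints:
# 1.20wmf1 2012-04-10
# 1.20wmf2 2012-04-24
# 1.20wmf3 2012-05-08
# 1.20wmf4 2012-05-22
```

Under that assumption the labels fall two per month, matching the plan of 1.20wmf1 and 1.20wmf2 in April and 1.20wmf3 and 1.20wmf4 in May.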

We’ve already tried out the 1.20wmf1 version on a test wiki and on mediawiki.org, and things are looking good. But the schedule may change based on unexpected issues, so you should refer to the MediaWiki 1.20 roadmap for an up-to-date schedule of when your wiki will be affected.

What’s new

This is a fairly small set of changes, compared to the March deployment of MediaWiki 1.19. This is intended to minimize disruption and possible issues, and make it easier to identify the cause of problems, since the possibly problematic code will be much more recent.

The biggest thing you’ll notice is the new diff style (example on mediawiki.org), designed to improve the experience of color-blind and partially sighted visitors.

More polish you’ll notice: There is a new option on Special:Prefixindex and Special:Allpages to hide redirects (addressing bug 30963). New edit emails for watched pages always provide a link to the edit which triggered the mail (fixing bug 32210). And “Creating” is now given in the page title instead of “Editing” when you are creating a page (fixing bug 22870).

And, of course, developers have improved the software “under the hood” in many ways. A list of all changes is available in the draft release notes.

Snags and glitches?

If, despite our efforts, you encounter issues due to the upgrade, we’ll try and fix them as soon as we can. Get an account and report issues in our bug tracker, which is where we look for reports of problems. And the faster you tell us about problems, the faster we can address them.

Thanks!

Sumana Harihareswara, Volunteer development coordinator
Rob Lanphier, Director of Platform Engineering
Images contained in this blog post are available under CC-BY-SA

Wikipedia Mobile gets a face lift

A growing number of visitors access the mobile site of Wikipedia and it is an area the engineering team is keen to improve. To do this, we are offering a more functional and polished experience adapted for mobile users, who operate in a much more confined world compared to those on the desktop.

This week we pushed several new and updated design changes to our beta. We hope these changes will provide a more professional look and a better experience for you. These include changes to the footer, a cleaner design for revealing and hiding sections, and a revamped full-screen search experience. The mechanism for toggling between desktop and mobile has also moved from the footer to the top navigation menu to the left of search to allow users to switch more effortlessly.

References can now be read in place

Full screen search

In addition to this we have also pushed an experimental feature which makes it easier to refer to references on articles without having to plunge to the bottom of the page. Now clicking on a reference will load an overlay which readers can consult without losing their place in the article.

We are keen to gather feedback to stabilise these additions and make these changes available by default to a much larger audience. In particular and as always, we are interested in any device-specific issues being brought to our attention as well as feedback on the new design. Let us know how you find the experience: the good, the bad, and the quirks that you discover.

We are also experimenting with animations when revealing references and would appreciate thoughts from the community on which is felt to work best. By default, references are revealed by a fade in/out effect but we would appreciate thoughts on whether a slide animation or no animation would be preferable.

Opt in to our beta and try them out today. We look forward to your feedback which can be provided either here or by your involvement in the design process.

– Jon Robson, Software Developer Mobile

MediaWiki 1.19 deployment to Wikimedia sites: Test it before it breaks

The logo of MediaWiki (a yellow sunflower surrounded by two pairs of blue square brackets) with gradients symbolizing its coming of age for the next version

Wikimedia sites will gradually be upgraded to version 1.19 of MediaWiki over the second half of February 2012.

This article is available in other languages on mediawiki.org.


Wikimedia engineers are putting the final touches to the latest version of MediaWiki, the software that powers Wikipedia and its sister sites. This version, labeled “1.19wmf1”, will be deployed to Wikimedia sites in stages, starting next week.

We’ve recently set up a Beta cluster, replicating a selection of Wikimedia wikis, where Wikimedians have tested the new version and checked that it worked reasonably well with their local wiki’s specific customizations.

Things are looking good, and the current plan is to run the deployment in five stages between February 15th and March 1st, 2012. The schedule may change based on unexpected issues, so you should refer to the MediaWiki 1.19 roadmap for an up-to-date schedule of when your wiki will be affected.

Scaling media storage at Wikimedia with Swift

Wikipedia is huge. Almost four million articles in English alone — but as they say, a picture is worth a thousand words (actually, it’s usually closer to several million). In terms of raw bits on disk, the largest project is clearly the Wikimedia Commons, the free media repository integrated with all of the Wikimedia projects. In addition, many projects allow their own local media uploads. As a result, across all wikis, Wikimedia stores millions of images, sounds, and other media files.

We’ve been able to manage the load for quite a while by using two servers with lots of local storage (10 and 30TB), but we’re pushing against that limit and we would like a more fault-tolerant option. So, for the last few months, we have been working on replacing the infrastructure that holds all that data.

Our goal is to have a storage system that will allow us to scale more easily, and accept large collections of media from projects like Wiki Loves Monuments, and the U.S. National Archives’ donation of their collection of photographs by Ansel Adams.

After evaluating a number of options, we chose to pursue OpenStack Swift. Swift is a distributed object storage system with automatic replication, so that if one host has problems the requested file is retrieved from another server with no interruption of service. Aside from meeting our needs around performance, reliability, and scalability, it is a good fit considering we are also using OpenStack products for Wikimedia Labs.

We have just completed the first milestone along the road to replacing our existing storage systems with Swift: all image thumbnails (scaled images such as a 320px version of a picture) are now stored on Swift. Our current production Swift cluster is made up of 4 back-end storage nodes with 22TB each and 2 front-end proxy nodes that handle user web requests. This new architecture provides us the scalability and reliability we need going forward.
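The idea of retrieving a file from another server when one host has problems can be sketched with a toy model. This Python example is not Swift's implementation: Swift places data using a ring of partitions, whereas this sketch substitutes a much simpler hash ordering, and the StorageNode class, the put and get helpers, the node names and the replica count of 3 are all assumptions for illustration.

```python
import hashlib


class StorageNode:
    """A toy back-end node holding objects in memory."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.objects = {}

    def get(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.objects[key]


def replica_nodes(nodes, key, replicas=3):
    """Pick which nodes hold a key, using a simple deterministic hash ordering."""
    def rank(node):
        return hashlib.md5(f"{key}:{node.name}".encode("utf-8")).hexdigest()
    return sorted(nodes, key=rank)[:replicas]


def put(nodes, key, data, replicas=3):
    """Write the object to every replica node."""
    for node in replica_nodes(nodes, key, replicas):
        node.objects[key] = data


def get(nodes, key, replicas=3):
    """Read from the first replica that answers, skipping unavailable hosts."""
    for node in replica_nodes(nodes, key, replicas):
        try:
            return node.get(key)
        except ConnectionError:
            continue
    raise LookupError(f"no replica of {key!r} is reachable")


cluster = [StorageNode(f"storage{i}") for i in range(1, 5)]  # 4 back-end nodes
put(cluster, "thumb/320px-Example.jpg", b"jpeg-bytes")
replica_nodes(cluster, "thumb/320px-Example.jpg")[0].online = False
print(get(cluster, "thumb/320px-Example.jpg"))  # still served by another replica
```

The proxy role in this model is played by get: the caller asks for a key and never needs to know which back-end node actually answered.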

Over the next few months we will build a second Swift cluster in our Virginia data center, then work on migrating all of the original media over to Swift as well. For more detail on the implementation and plan for Swift, you can read up on the documentation on Wikitech, ask questions in the comments below, or come and visit us in #wikimedia-tech on Freenode in IRC.

Ben Hartshorne
Operations
Wikimedia Foundation

Beta cluster allows Wikimedians to test upcoming software on Labs before deployment

Over the last few weeks, we’ve set up a test environment on Wikimedia Labs to replicate our production cluster and test new software before it’s deployed to Wikimedia sites. This will notably allow us to identify issues with the upcoming version of MediaWiki (1.19) before its deployment — but we need your help.

In case you haven’t heard yet, Wikimedia Labs is a platform aimed at making it easier for developers and system administrators to try out improvements to Wikimedia infrastructure, including MediaWiki, and to do analytics and bot work.

In the past, we’ve used prototype wikis to set up testing environments for upcoming releases of MediaWiki or to test new features. This has been helpful, but has suffered from lack of ongoing maintenance.

Over the holidays, I had the idea — with the upcoming 1.19 release, and the Labs servers newly online and available for non-WMF staff — of using Wikimedia Labs to duplicate the production cluster’s configuration in the Labs environment, and work with volunteers to help maintain this environment.

I particularly want to thank the following people for their work on this project:

  • Petr Bena has been driving this almost all the way. He started setting up this environment, the servers, and the Apache configuration, and has been helping to keep it going on a pretty consistent basis.
  • John du Hart came along after Petr had already begun and lent his experience with setting up wiki farms. With his help, we put together a really great configuration that more closely duplicates what is in production.
  • Oren Bochman has stepped in to get search working on our micro-cluster. On Wikimedia sites, search has always relied on the help of volunteers. While we don’t yet have search working, Oren has helped us document the search back-end — which will help others set up search like we have on the cluster — and has already started to help us build the next generation of search.

Join in now to identify issues before they reach your wiki

We’ve recently opened this up for real testing, so now is the time to jump in. Please look at the cluster’s SiteMatrix and find wikis to test. Try reading, editing, using your favorite gadgets, and so on as you normally would; treat it as a giant sandbox. If you find a problem, please report it on the problem reports page.

With your help, we can make the upcoming upgrade smoother.

Mark A. Hershberger, Bugmeister