Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘performance’

What Lua scripting means for Wikimedia and open source

Yesterday we flipped a switch: editors can now use Lua, an innovative programming language, to generate sections of wiki pages on all our sites. We’d like to talk about what this means for the open source community at large, for Wikimedians, and for our future.

Why we did this

In the old wikitext templating system, this is part of Template:Citation/core. Any Wikipedia article citing a source will cause our CPUs to run through this instructionset. With Lua, we’ll be able to replace this.

When we started digging into the causes of slow pageload times a few years ago, we saw that our CPUs ate a lot of time interpreting templates — useful bits of markup that programmatically told MediaWiki to reuse little bits of text. Templates are everywhere on our sites. Good Wikipedia articles heavily use the citation templates, for instance, and you’ve seen the ubiquitous infoboxes on every biography. In fact, editors can write code to generate substantial portions of wiki pages. Hit “View source” sometime to see how.

But, because we’d never planned for wikitext to become a programming language, these templates were terribly inefficient and hacky — they didn’t even have recursion or loops — and were terrible for performance. When you edit a complex article like Tulsi Gabbard, with scores of citations, it can take up to 30 seconds to parse and display the page. Even as we worked to improve performance via caching, query profiling, new hardware, and other common means, we sometimes had to advise our community to remove functionality from a particular template so pages would render faster.

This wouldn’t do. It was a terrible experience for our users and especially hard for our editors, who had to wait for a multi-second roundtrip after every “how would this page look?” preview.

So our staffers and volunteers worked on Scribunto (from the Latin for “they shall write”), a MediaWiki extension to allow editors to embed Lua scripts instead of wikitext for templating. And volunteers and Foundation staffers have already started identifying pages that are slow to render and converting the most inefficient templates. We have 488,731 templates on English Wikipedia alone right now. The process of turning many of those into Lua scripts is going to affect everyone who reads our sites — and the Scribunto project has already started giving back to the Lua community.

Us and Lua

For instance, our engineer Brad Jorsch wrote mw.ustring.lua, a Unicode module reusable by other Lua developers. This library is good news for people who write templates in non-Latin characters, and for anyone who wants a version of Lua’s standard String library where the methods operate on characters in UTF-8 encoded strings rather than bytes.

And with Scribunto, we empower those frustrated Wikimedians who have been spending years breaking their knuckles making amazing things in wikitext; as they learn how much easier it is to script in Lua, we hope they’ll be able to use those skills in their hobbies, schools, and workplaces. They’ll join forces with the graduates of Codecademy, World of Warcraft, and the other communities that teach anyone to program. New programmers with basic knowledge of computer science who want to do something real with their new skills will find that Lua scripting on Wikimedia sites is a logical next step for them. Our implementation only differs slightly from standard Lua.

And since Scribunto is an extension that any MediaWiki administrator can install, we hope the MediaWiki administrators out there will enjoy using Lua to more easily customize their wikis for their users.

Structured data and new ways to display it

Scribunto lays the foundations for exciting work to come when the Wikidata structured data project comes further online (the Wikidata interface is still in development and being deployed in phases). We know that Lua will be an attractive way to integrate Wikidata information into pages, and we hope a lot of (currently) unstructured data will get structured, helping new applications emerge.

Now that Lua and Wikidata are more mature, we can look forward to enabling more functionality and plugging in more libraries. And as we continue deploying Wikidata, people will make interesting improvements that we currently can’t predict. For instance, right now, each citation is hard to programmatically dissect; the Cite template takes many unstructured parameters (“author1,” “author2,” etc.) We structure these arguments by convention, but the data’s not structured as CS folks would have it, and can’t be queried via APIs, remixed, and so on.

Excerpt of Coordinates module

A screenshot of part of the new Coordinates module, written in Lua by User:Dragons flight. Note that, with Lua, we can actually use proper conditionals.

But in the future, we could have citations stored in Wikidata and then put together onto article pages using Lua, or even assembled into other various reasonable forms (automatically generated bibliographies?) using Lua, and it will be more easy for Zotero users to discover. That’s just one example; on all our sites over the next few years, things will change from the status quo in a user-visible way. The old math and geography templates were inefficient and hard to hack; once rewritten, they’ll run faster and perhaps editors will use them more. We might see galleries, automatic data analyses, better annotated maps, and various other interesting processes and queries embedded in Wikimedia pages.

Open for change

Wikimedians have been writing wikitext templates for years, and doing hard, astounding, unexpected things with them for readers to enjoy. But the steep learning curve drove contributors away. With Lua, a genuine programming language, people now have a deeper and more useful foundation to build upon. And for years, power users on our sites have customized their experiences with JavaScript/CSS Gadgets and user scripts, but those are basically one level above skins preferences; other people won’t stumble upon your hacks in the process of reading an article.

So, now is the first time that the Wikimedia site maintainers have enabled real coding that affects all readers. We’re letting people program Wikipedia unsupervised. Anyone can write a chunk of code to be included in an article that will be seen by millions of people, often without much review. We are taking our “anyone can edit” maxim one big step forward.

If someone doesn’t like the load time of a webpage, they can now actually improve it themselves. Just as we crowdsourced building Wikipedia, now we’re crowdsourcing bits of infrastructure improvement. And this kind of massively multiplayer, crowdsourced performance improvement is uniquely us.

Wikitext templates could do a lot of things, but Lua does them better and faster, and now mere mortals can do it. We’re aiming to help our users learn to program, to empower themselves, and to help each other and help our readers.

We hope you’ll join us.

Sumana Harihareswara, Engineering Community Manager

New Lua templates bring faster, more flexible pages to your wiki

Starting Wednesday, March 13th, you’ll be able to make wiki pages even more useful, no matter what language you speak: we’re adding Lua as a templating language. This will make it easier for you to create and change infoboxes, tables, and other useful MediaWiki templates. We’ve already started to deploy Scribunto (the MediaWiki extension that enables this); it’s on several of the sites, including English Wikipedia, right now.

You’ll find this useful for performing more complex tasks for which templates are too complex or slow common examples include numeric computations, string manipulation and parsing, and decision trees. Even if you don’t write templates, you’ll enjoy seeing pages load faster and with more interesting ways to present information.

Background

The text of English Wikipedia’s string length measurement template, simplified.

MediaWiki developers introduced templates and parser functions years ago to allow end-users of MediaWiki to replicate content easily and build tools using basic logic. Along the way, we found that we were turning wikitext into a limited programming language. Complex templates have caused performance issues and bottlenecks, and it’s difficult for users to write and understand templates. Therefore, the Lua scripting project aims to make it possible for MediaWiki end-users to use a proper scripting language that will be more powerful and efficient than ad-hoc, parser functions-based logic. The example of Lua’s use in World of Warcraft is promising; even novices with no programming experience have been able to make large changes to their graphical experiences by quickly learning some Lua.

Lua on your wiki

As of March 13th, you’ll be able to use Lua on your home wiki (if it’s not already enabled). Lua code can be embedded into wiki templates by employing the {{#invoke:}} parser function provided by the Scribunto MediaWiki extension. The Lua source code is stored in pages called modules (e.g., Module:Bananas). These individual modules are then invoked on template pages. The example: Template:Lua hello world uses the code {{#invoke:Bananas|hello}} to print the text “Hello, world!”. So, if you start seeing edits in the Module namespace, that’s what’s going on.

Getting started

The strlen template as converted to Lua.

Check out the basic “hello, world!” instructions, then look at Brad Jorsch’s short presentation for a basic example of how to convert a wikitext template into a Lua module. After that, try Tim Starling’s tutorial.

To help you preview and test a converted template, try Special:TemplateSandbox on your wiki. With it, you can preview a page using sandboxed versions of templates and modules, allowing for easy testing before you make the sandbox code live.

Where to start? If you use pywikipedia, try parsercountfunction.py by Bináris, which helps you find wikitext templates that currently parse slowly and thus would be worth converting to Lua. Try fulfilling open requests for conversion on English Wikipedia, possibly using Anomie’s Greasemonkey script to help you see the performance gains. On English Wikipedia, some of the templates have already been converted  feel free to reuse them on your wiki.

The Lua hub on mediawiki.org has more information; please add to it. And enjoy your faster, more flexible templates!

Sumana Harihareswara, Engineering Community Manager

Analyzing Mobile Browser Energy Consumption

Recently, technology reporter Jacob Aron wrote a blog post on newscientist.com that talks about how bloated website code drains your smartphone’s battery.

He mentions how Stanford computer scientist Narendran Thiagarajan and colleagues used an Android phone hooked up to a multimeter to measure the energy used in downloading and rendering popular websites. Using their experimental setup they measured the energy needed to render popular web sites as well as the energy needed to render individual web elements such as images, Javascript, and Cascading Style Sheets (CSS). They claim that complex Javascript and CSS can be as expensive to render as images. Moreover, dynamic Javascript requests (in the form of XMLHttpRequest) can greatly increase the cost of rendering the page, since it prevents the page contents from being cached. Finally, they show that on the Android browser, rendering JPEG images is considerably cheaper than other formats, such as GIF and PNG for comparably sized images.

One example that is cited is that simply loading the mobile version of Wikipedia over a 3G connection consumed just over 1 per cent of the phone’s battery, while browsing apple.com, which does not have a mobile version, used 1.4 per cent.
Yet, in the summary of the paper they find that the results from this study are not meaningful except for the initial loading of just a single page resource. It would be interesting to extend these results in a meaningful way, and study the energy signature of an entire browsing session at a site such as Wikipedia, where a user typically moves from page to page. So, during that session, downloaded web elements such as Javascript, CSS and images would mostly be cached locally. Therefore, we really can’t estimate the energy cost of a total session by simply summing the energy usage of pages visited during that session. Measuring an entire typical session may help optimize the power signature of the entire site. Custom CSS that is applicable to every page of a site would easily outweigh the cost of the apparently excessive CSS download for the render of just the first page.
So, one of the ways that we are looking to improve our mobile browser energy consumption is by implementing the MediaWiki ResourceLoader in order to improve the load times for JavaScript and CSS. ResourceLoader is the delivery system in MediaWiki for the optimized loading and managing of modules. Its purpose is to improve MediaWiki’s front-end performance and the experience by making use of strong caching while still allowing near-instant deployment of new code that all clients start using within 5 minutes. Modules are built of JavaScript, CSS and interface messages; it was first released in MediaWiki 1.17.
On Wikimedia wikis, every page view includes hundreds of kilobytes of JavaScript. In many cases, some or all of this code goes unused due to browser support or because users do not make use of the features on the page. In these cases, bandwidth and loading time spent on downloading, parsing and executing JavaScript code are wasted. This is especially true when users visit MediaWiki sites using older browsers, like Internet Explorer 6, where almost all features are unsupported, and parsing and executing JavaScript is extremely slow.
ResourceLoader solves this problem by loading resources on demand and only for browsers that can run them. Although there is too much to summarize in a simple list, the major improvements for client-side performance are gained by:
  • Minifying and concatenating
  • → which reduces the code’s size and parsing/download time
  • JavaScript files, CSS files and interface messages are loaded in a single special formatted “ResourceLoader Implement” server response.
  • Batch loading
  • → which reduces the number of requests made
  • The server response for module loading supports loading multiple modules so that a single response contains multiple ResourceLoader Implements, which in itself contain the minified and concatenated result of multiple javascript/css files.
  • Data URIs embedding
  • → which further reduces the number of requests, response time and bandwidth
  • Optionally images referenced in stylesheets can be embedded as data URIs. Together with the gzippping of the server response, those embedded images, together, function as a “super sprite”.

Patrick Reilly, Senior Software Developer, Mobile