Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts by Sumana Harihareswara

Wikimedia engineering July 2013 report

Major news in July includes:

  • Giving more editors an easy-to-use editing interface (the VisualEditor) on several Wikipedias
  • Improving language support on our sites via summer interns’ projects and easier configuration options, and asking for help translating the VisualEditor interface
  • Enabling users to edit our sites from mobile devices, like phones and tablets, and announcing a future user experience bootcamp focusing on mobile editing
  • Finishing our transition from keeping source code in Subversion to storing it in Git
  • Launching a Wikipedia Zero partnership with Aircel, giving mobile subscribers in India the potential to access Wikipedia at no data cost
  • Updating the Wikimedia movement on how we intend to protect our users’ privacy with HTTPS
  • Signing a contract with longtime MediaWiki contributors to manage MediaWiki releases for the open source community
  • Explaining how we find and gather software problems and deliver the fixes to users

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.


Volunteers and staffers teach, learn, create at Amsterdam hackathon

149 participants from 31 countries came to Amsterdam in late May to teach each other and improve Wikimedia technology.


Developers work near sticky-notes representing topics and ideas at the Amsterdam hackathon in May 2013.

Technologists taught and attended sessions on writing and running bots, using the new Lua templating language, moving from Toolserver to the new Wikimedia Labs, design, Wikidata, security, and the basics of Git and Gerrit. Check out the workshops page for slides, tutorials, and other reference material; video recordings of sessions are due to be uploaded to Wikimedia Commons soon.

Wikimedia Netherlands, Wikimedia Germany, and the Wikimedia Foundation subsidized travel and accommodation for dozens of participants, enabling the highest participation in this event’s history. As one subsidized participant wrote, “One of the wonderful things about the Wikipedia world is the support given to the volunteers from the different chapters and the parent Wikimedia Foundation to promote community growth and building awesome stuff that the whole world can use….It’s such surprises that makes one love contributing to open source.” Organizers also put together a social events program that included a boat cruise of Amsterdam’s canals.

Participants are still listing what they accomplished or learned during the event, but here’s a sample:

  • The Wikimaps project aims to present historical maps on Wikimedia sites, and to work together with OpenStreetMap Historic “to find a common way to model historical geodata” (more details). Maps aficionados discussed the project and made plans in Amsterdam. One volunteer, Arun Ganesh, wrote a prototype wiki atlas: an interactive SVG file that comes with automatic labelling (details).
  • Moritz Schubotz, a volunteer, worked on improving search and math functionality in MediaWiki.
  • The Foundation testing and quality assurance team improved test coverage and the test environment, and taught other participants how to do QA for Wikimedia.
  • Pau Giner, a designer at the Foundation, wrote code to use an SVG for the collapsible section arrow in MediaWiki’s Vector skin. This will make the image less fuzzy-looking.
  • User:Ruud Koot wrote a Wikivoyage listing editor that will make it easier to improve the specific parts of a travel suggestion without having to load the whole page.

    A WMF staffer holds a microphone to amplify a volunteer’s voice during the closing demo session at the Amsterdam hackathon.

  • Several volunteers worked on the account creation tool and process for English Wikipedia, to help the ACC team deal with prospective editors who have not been able to create an account via the web interface. The improved tool (code) streamlines the workflow, helping volunteers do their work faster.
  • A group of staffers and volunteers interested in statistical data improved the User Metrics API’s reliability and security. Another participant wrote a proof-of-concept MediaWiki extension enabling editors to embed Limn graphs in wiki pages via wikitext.

So far, 90 participants have submitted the post-event survey and results are largely positive, with (of course) several suggestions for improvements in the future. For instance, next year, organizers should help trainers prepare more, and help participants with common interests find and work with each other more easily.  We don’t yet know where or when next year’s developer meeting will be, but it’ll happen; subscribe to the low-traffic wikitech-announce mailing list to hear when it’s settled.

You may also wish to read the Wikipedia Signpost report on the event.

Thanks are due to staffers at the Wikimedia Foundation, Wikimedia Netherlands, and Wikimedia Germany who made the event possible, and to the volunteers who ran the event, especially lead organizer Maarten Dammers. And thanks to all the participants who gave up their weekend to make our sites better.

Sumana Harihareswara
Engineering Community Manager, Wikimedia Foundation

What Lua scripting means for Wikimedia and open source

Yesterday we flipped a switch: editors can now use Lua, an innovative programming language, to generate sections of wiki pages on all our sites. We’d like to talk about what this means for the open source community at large, for Wikimedians, and for our future.

Why we did this

In the old wikitext templating system, this is part of Template:Citation/core. Any Wikipedia article citing a source will cause our CPUs to run through this instruction set. With Lua, we’ll be able to replace this.

When we started digging into the causes of slow pageload times a few years ago, we saw that our CPUs ate a lot of time interpreting templates — useful bits of markup that programmatically tell MediaWiki to reuse snippets of text. Templates are everywhere on our sites. Good Wikipedia articles heavily use the citation templates, for instance, and you’ve seen the ubiquitous infoboxes on every biography. In fact, editors can write code to generate substantial portions of wiki pages. Hit “View source” sometime to see how.

But, because we’d never planned for wikitext to become a programming language, these templates were inefficient and hacky — the template language didn’t even have recursion or loops — and they were terrible for performance. When you edit a complex article like Tulsi Gabbard, with scores of citations, it can take up to 30 seconds to parse and display the page. Even as we worked to improve performance via caching, query profiling, new hardware, and other common means, we sometimes had to advise our community to remove functionality from a particular template so pages would render faster.

This wouldn’t do. It was a terrible experience for our users and especially hard for our editors, who had to wait for a multi-second roundtrip after every “how would this page look?” preview.

So our staffers and volunteers worked on Scribunto (from the Latin for “they shall write”), a MediaWiki extension to allow editors to embed Lua scripts instead of wikitext for templating. And volunteers and Foundation staffers have already started identifying pages that are slow to render and converting the most inefficient templates. We have 488,731 templates on English Wikipedia alone right now. The process of turning many of those into Lua scripts is going to affect everyone who reads our sites — and the Scribunto project has already started giving back to the Lua community.
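To make that concrete, here is a minimal sketch of the kind of Scribunto module editors can now write (the module and function names are hypothetical, not an actual Wikimedia module). An ordinary Lua loop builds a bulleted list, something the old template language, with neither loops nor recursion, could only approximate with nested template tricks:

    -- Hypothetical Module:SimpleList (illustrative only): builds a wikitext
    -- bulleted list from the unnamed parameters passed via {{#invoke:}}.
    local p = {}

    function p.list(frame)
        local lines = {}
        -- frame.args holds the parameters given to {{#invoke:SimpleList|list|...}}
        for _, value in ipairs(frame.args) do
            lines[#lines + 1] = '* ' .. value
        end
        return table.concat(lines, '\n')
    end

    return p

A template could then call {{#invoke:SimpleList|list|apples|pears|plums}} to emit a three-item list, instead of chaining recursive template calls with hard-coded depth limits.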

Us and Lua

For instance, our engineer Brad Jorsch wrote mw.ustring.lua, a Unicode module reusable by other Lua developers. This library is good news for people who write templates in non-Latin scripts, and for anyone who wants a version of Lua’s standard string library where the methods operate on characters in UTF-8 encoded strings rather than bytes.
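Here is a rough sketch of the difference as it looks from inside a Scribunto module, where the mw global is available (the example string is illustrative):

    -- mw.ustring mirrors Lua's standard string library, but its functions count
    -- and manipulate Unicode characters instead of raw bytes.
    local s = 'Привет'                    -- six Cyrillic characters, twelve UTF-8 bytes
    local byte_length = string.len(s)     -- 12: the standard library counts bytes
    local char_length = mw.ustring.len(s) -- 6: mw.ustring counts characters
    local shouted = mw.ustring.upper(s)   -- 'ПРИВЕТ': case conversion is Unicode-aware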

And with Scribunto, we empower those frustrated Wikimedians who have been spending years breaking their knuckles making amazing things in wikitext; as they learn how much easier it is to script in Lua, we hope they’ll be able to use those skills in their hobbies, schools, and workplaces. They’ll join forces with the graduates of Codecademy, World of Warcraft, and the other communities that teach anyone to program. New programmers with basic knowledge of computer science who want to do something real with their new skills will find that Lua scripting on Wikimedia sites is a logical next step for them. Our implementation only differs slightly from standard Lua.

And since Scribunto is an extension that any MediaWiki administrator can install, we hope the MediaWiki administrators out there will enjoy using Lua to more easily customize their wikis for their users.

Structured data and new ways to display it

Scribunto lays the foundations for exciting work to come when the Wikidata structured data project comes further online (the Wikidata interface is still in development and being deployed in phases). We know that Lua will be an attractive way to integrate Wikidata information into pages, and we hope a lot of (currently) unstructured data will get structured, helping new applications emerge.

Now that Lua and Wikidata are more mature, we can look forward to enabling more functionality and plugging in more libraries. And as we continue deploying Wikidata, people will make interesting improvements that we currently can’t predict. For instance, right now, each citation is hard to programmatically dissect; the Cite template takes many unstructured parameters (“author1,” “author2,” etc.). We structure these arguments by convention, but the data’s not structured as CS folks would have it, and can’t be queried via APIs, remixed, and so on.


A screenshot of part of the new Coordinates module, written in Lua by User:Dragons flight. Note that, with Lua, we can actually use proper conditionals.

But in the future, we could have citations stored in Wikidata and then assembled onto article pages, or into various other forms (automatically generated bibliographies?), using Lua, and it will be easier for Zotero users to discover them. That’s just one example; on all our sites over the next few years, things will change from the status quo in a user-visible way. The old math and geography templates were inefficient and hard to hack; once rewritten, they’ll run faster and perhaps editors will use them more. We might see galleries, automatic data analyses, better annotated maps, and various other interesting processes and queries embedded in Wikimedia pages.

Open for change

Wikimedians have been writing wikitext templates for years, and doing hard, astounding, unexpected things with them for readers to enjoy. But the steep learning curve drove contributors away. With Lua, a genuine programming language, people now have a deeper and more useful foundation to build upon. And for years, power users on our sites have customized their experiences with JavaScript/CSS Gadgets and user scripts, but those are basically one level above skin preferences; other people won’t stumble upon your hacks in the process of reading an article.

So, now is the first time that the Wikimedia site maintainers have enabled real coding that affects all readers. We’re letting people program Wikipedia unsupervised. Anyone can write a chunk of code to be included in an article that will be seen by millions of people, often without much review. We are taking our “anyone can edit” maxim one big step forward.

If someone doesn’t like the load time of a webpage, they can now actually improve it themselves. Just as we crowdsourced building Wikipedia, now we’re crowdsourcing bits of infrastructure improvement. And this kind of massively multiplayer, crowdsourced performance improvement is uniquely us.

Wikitext templates could do a lot of things, but Lua does them better and faster, and now mere mortals can do them too. We’re aiming to help our users learn to program, to empower themselves, and to help each other and help our readers.

We hope you’ll join us.

Sumana Harihareswara, Engineering Community Manager

New Lua templates bring faster, more flexible pages to your wiki

Starting Wednesday, March 13th, you’ll be able to make wiki pages even more useful, no matter what language you speak: we’re adding Lua as a templating language. This will make it easier for you to create and change infoboxes, tables, and other useful MediaWiki templates. We’ve already started to deploy Scribunto (the MediaWiki extension that enables this); it’s on several of the sites, including English Wikipedia, right now.

You’ll find this useful for performing more complex tasks for which templates are too complex or slow: common examples include numeric computations, string manipulation and parsing, and decision trees. Even if you don’t write templates, you’ll enjoy seeing pages load faster and present information in more interesting ways.
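As a taste of what that means in practice, here is a hedged sketch in plain Lua (the names are illustrative, not an existing module) of the kind of branching and arithmetic that quickly becomes unwieldy as nested {{#if:}} and {{#expr:}} parser-function calls:

    -- Decide which maintenance label a page should get, based on two numbers.
    local function quality_bucket(word_count, reference_count)
        if word_count < 150 then
            return 'Stub'
        elseif reference_count == 0 then
            return 'Unreferenced'
        elseif reference_count / word_count < 0.002 then
            return 'Needs more references'
        else
            return 'OK'
        end
    end

    print(quality_bucket(2000, 1))  -- prints "Needs more references"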

Background

The text of English Wikipedia’s string length measurement template, simplified.

MediaWiki developers introduced templates and parser functions years ago to allow end-users of MediaWiki to replicate content easily and build tools using basic logic. Along the way, we found that we were turning wikitext into a limited programming language. Complex templates have caused performance issues and bottlenecks, and it’s difficult for users to write and understand templates. Therefore, the Lua scripting project aims to make it possible for MediaWiki end-users to use a proper scripting language that will be more powerful and efficient than ad-hoc, parser-function-based logic. The example of Lua’s use in World of Warcraft is promising; even novices with no programming experience have been able to make large changes to their graphical experiences by quickly learning some Lua.

Lua on your wiki

As of March 13th, you’ll be able to use Lua on your home wiki (if it isn’t enabled there already). Lua code can be embedded into wiki templates by employing the {{#invoke:}} parser function provided by the Scribunto MediaWiki extension. The Lua source code is stored in pages called modules (e.g., Module:Bananas). These individual modules are then invoked on template pages. For example, Template:Lua hello world uses the code {{#invoke:Bananas|hello}} to print the text “Hello, world!”. So, if you start seeing edits in the Module namespace, that’s what’s going on.
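For reference, the module behind that example is only a few lines long. A version along the lines described above looks roughly like this:

    -- Module:Bananas (sketched from the description above): p is the table of
    -- functions that {{#invoke:}} can call; each function receives a frame
    -- object carrying the template's arguments.
    local p = {}

    function p.hello(frame)
        return 'Hello, world!'
    end

    return p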

Getting started

The strlen template as converted to Lua.

Check out the basic “hello, world!” instructions, then look at Brad Jorsch’s short presentation for a basic example of how to convert a wikitext template into a Lua module. After that, try Tim Starling’s tutorial.
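To give a flavor of such a conversion, here is a minimal sketch of a string-length module (the module name is illustrative, and English Wikipedia’s real module handles more edge cases):

    -- Hypothetical Module:StringLength: returns the length of its first argument.
    local p = {}

    function p.len(frame)
        local s = frame.args[1] or ''
        -- mw.ustring.len counts Unicode characters rather than bytes, so
        -- non-Latin text is measured correctly.
        return mw.ustring.len(s)
    end

    return p

A wrapper template could then call {{#invoke:StringLength|len|some text}} in place of the old chain of parser-function calls.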

To help you preview and test a converted template, try Special:TemplateSandbox on your wiki. With it, you can preview a page using sandboxed versions of templates and modules, allowing for easy testing before you make the sandbox code live.

Where to start? If you use pywikipedia, try parsercountfunction.py by Bináris, which helps you find wikitext templates that currently parse slowly and thus would be worth converting to Lua. Try fulfilling open requests for conversion on English Wikipedia, possibly using Anomie’s Greasemonkey script to help you see the performance gains. On English Wikipedia, some templates have already been converted; feel free to reuse them on your wiki.

The Lua hub on mediawiki.org has more information; please add to it. And enjoy your faster, more flexible templates!

Sumana Harihareswara, Engineering Community Manager

How the Technical Operations team stops problems in their tracks

Last week, you read about how the Wikimedia Foundation’s Technical Operations team (“Ops”) spent hundreds or thousands of staff hours refactoring and automating all the services it provides, to prepare for the January data center migration. One reward from that work: our sites were not down as often, and when they were, downtime was for better reasons.

“Another thing that illustrates our growth and maturity is our downtime,” says Operations engineer Peter Youngmeister. “Something that’s less visible to people outside of Ops is the kind of downtime we have. For example, we no longer have much downtime of the variety of ‘Oops, bumped that cable’ or ‘That one box died,’ because things are much more robust now, much more redundant. A lot of that is a product of the massive automation push we’ve been going through, which lets us create redundancy far more easily, and lets us spend our time not fighting fires.”

Wikimedia Foundation engineer Roan Kattouw adds: “Or, ‘the master DB server has a full disk’ — that one happened a few times a few years ago, and doesn’t happen any more now.”

To fix crises fast, we need monitoring: tools that automatically check for problems and alert our engineers when something is broken. In the very early days of our sites, we simply trusted that there would usually be a sysadmin online and available in case someone noticed a problem and complained on IRC. Several years ago, we began to use Nagios for monitoring and assigned a “pager duty” rotation to decide who might be woken up by a crisis.

Nagios runs coarse automated tests on the behavior of our site (such as “Does port 80 return an HTTP 301?”) and checks certain key numbers to make sure they’re within the desired range (for instance, to test whether we’re running out of memory). If a test fails, Nagios sends out email, IRC, and SMS alarms.

Monitoring helps us address the crisis faster, but it often doesn’t help with the actual problem-solving.

“Nagios is great for telling you when things are broken, and crap for telling you why,” Peter explains. “The work that Asher Feldman has done creating profiling data is more useful.”

Monitoring our servers (here in Ashburn, Virginia) helps to minimize outages and service disruptions.

As Roan puts it: “Profiling is the act of generating data on ‘How much time does large task X spend doing small subtask Y?’ The reason for that is that 1) one of those small Ys might actually be not so small, and be a problem, and 2) per the 80-20 rule, for some Ys, optimization will have a larger impact, so you wanna find those.” Profiling generates knowledge about the behavior of our systems, so that engineers can better understand how the cluster should be operating, and offers data points for troubleshooting.
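The idea can be sketched in a few lines of plain Lua (purely illustrative; Wikimedia’s actual profiling lives in MediaWiki and the tools described below):

    -- Toy example: measure how much of a large task's time goes to one subtask Y.
    local function subtask_y(n)
        local s = ''
        for i = 1, n do s = s .. i end   -- deliberately naive string building
        return s
    end

    local y_time, start_total = 0, os.clock()
    for run = 1, 200 do
        local start_y = os.clock()
        subtask_y(500)
        y_time = y_time + (os.clock() - start_y)
        -- ...the rest of the large task would run here...
    end
    local total = os.clock() - start_total
    print(string.format('subtask Y took %.3fs of %.3fs total', y_time, total))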

We use two profiling systems to get time-series performance data: Ganglia at the “host” level, and Graphite at the “application” level (get a Labs login to see Graphite). In the past two years, we’ve configured Ganglia to cover much more data, and in 2012 began to use Graphite. The better data makes these tools more useful for troubleshooting, and Director of Operations CT Woo regularly checks the dashboards to look out for upcoming problems and alert his team. This reduces downtime.

For example, on one Ganglia page, we previously only had access to host data: free disk, load, etc. We have recently added Apache-specific data, such as requests per second and number of idle threads. This additional information aids sysadmins in troubleshooting. “One can look at it and make better deductions than just ‘Yup, server’s under a lot of load…’,” explains Peter.

Like puppetization, improvements in profiling were an investment by the Ops team. “There’s a plug-in for Ganglia that does Apache performance stats. It took me a couple of hours to set it all up. But, again, that’s being forward-thinking, debt that we had to work off instead of just cursing ourselves when it wasn’t there when we needed it. It’s a massive undertaking to decide to do things The Right Way, set up a platform, instead of doing a million one-offs.”

While puppetizing and improving monitoring and profiling to prepare for the data center migration, the Operations team had to defer other non-urgent work. “Ops was less able to give support to many teams,” says Peter. “For example, Fundraising just had a couple of boxes and could do whatever they wanted on them, as opposed to now where [Operations Engineer] Jeff Green is working on making an awesome, PCI-compliant system with them full time. Or, Analytics was very independent/unsupported, because there were so little human-hours to give to supporting things that weren’t just keeping the site up… I think that the EQIAD [Virginia Data center] build-out is very demonstrative of the amount of [technical] debt that Ops was in.”

Now, Peter is looking forward to seeing Wikimedia “spin up more data centers dramatically more quickly.” The Operations team is making preparations for an additional data center on North America’s west coast. Site Architect Asher Feldman sees a “continuing arc of refinement” in the team’s future, rather than “challenges that end, to be replaced by new ones.” “The challenges of making MediaWiki scale aren’t going to go away any time soon; nor will the need for incremental architecture modernization at multiple levels.” For instance, Ops needs to continue puppetizing certain services; some modules also need their Puppet manifests tweaked so that they work not just on the main site, but also in Wikimedia Labs.

You can check out the Operations team’s 2012–2013 goals to find out more about what’s next (including improvements in search and security).

Sumana Harihareswara, Engineering Community Manager

From duct tape to puppets: How a new data center became an opportunity to do things right

Last week, the Wikimedia Foundation flipped a historic switch: we transitioned our main technical services to a shiny new data center in Ashburn, Virginia. For the first time since 2004, Wikimedia sites are no longer primarily hosted in Tampa, Florida.

Peter Youngmeister works in the Wikimedia Foundation’s Technical Operations team.

To help understand this grueling journey (and why it’s crucial), look through the eyes of Wikimedia Foundation engineer Peter Youngmeister. Peter joined the Wikimedia Foundation’s Technical Operations team (“Ops”) about two years ago, in March 2011. At the time, “the team” meant “about six engineers supporting the fifth-most visited site on the Web,” said Peter. The Foundation has now increased its Ops team to 14, and has several job openings.

“This also meant that out of the fast/cheap/well triangle, we’d gone with fast and cheap,” Peter recalled. We made quick-and-dirty solutions because problems had to be solved immediately. “With so few Ops engineers, you’re always playing catchup; long-term is hard.” He said that the digital infrastructure when he arrived was “kinda like many many layers of really artfully applied duct tape.”

And the biggest, most pressing flaw: Wikimedia only had one fully functional primary data center, in Tampa, Florida. If something catastrophic happened to Tampa, all the sites would go down until new servers could be brought online and data recovered from backup. So the Ops team chose a new data center location, in Ashburn, Virginia, and started preparing to integrate it into our infrastructure. But the preparation of EQIAD, which began in 2011, turned out to require much more work than the Operations and Platform engineering teams had foreseen.

We had never set up a data center of this complexity from scratch before. The systems in Tampa were “layers of duct tape that had been built up over years… Our first problem was that, for example, very little was in Puppet,” Peter said. To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code (Puppet “manifests”) that manages all of our servers like a single large application (and more easily track, troubleshoot, and revert changes).

Since the new data center would exactly mirror the old one, leveraging the power of Puppet to keep our configurations in sync would be crucial. But since our infrastructure included dozens of services that weren’t in Puppet yet, we had to examine each of their configurations to “puppetize” them. And in early 2011, Peter noted, “our whole search infrastructure existed outside of Puppet control. Our Puppet manifests for our databases were a file that just had a comment that said ‘domas is a slacker.’”

In short, Wikimedia needed not only to replicate the functionality that had been incrementally added over ten years, but to refactor it into an automatable form so that the third, fourth, etc. replications would be far easier. So, in addition to the Ops team’s day-to-day responsibilities for site maintenance and crisis management, the Ops and Platform teams needed to find hundreds or thousands of staff-hours to refactor, automate and add monitoring to all the services they provided. We aren’t done yet with our “mass puppetization” investment, which we’ve been working on for at least two years.

The core application (MediaWiki) is only one of the myriad moving parts that needed attention; over the past two years, we’ve puppetized and strengthened databases, search, fundraising code, logging and analytics tools, caches, the Nagios monitoring software and dozens of other services. Take search as an example: several years ago, the Wikimedia Foundation used one search server to cover nearly all the wikis other than English Wikipedia — a dangerous single point of failure. Peter arrived at the Foundation and found that none of the search infrastructure was puppetized. After he worked significantly on search, as of November 2012, he noted we had “two fully independent search setups, one in each data center. Fail-over takes a couple of minutes at most.”

Puppet Tutorial: Video from the Wikimedia Foundation tech days, September 11, 2012, explaining Puppet configuration management in the context of Wikimedia’s site/services infrastructure. Speaker/slides: Ryan Lane.

Puppetizing the configuration files and using Gerrit to manage code review and approval also gave us better transparency and helped staff and volunteers collaborate better on improvements, maintenance and troubleshooting. Anyone can see how our servers are configured, read the Puppet configuration “manifests,” propose new changes and view and comment on pending proposals.

In contrast, “when I got here, everything was done on a local Subversion repository or our puppetmaster, and then pushed out from there, which kinda works if you have 6 or fewer people,” Peter said. (The Puppetmaster is the master repository that instructs all the other boxes in the cluster to update their manifests, and thus updates their packages and configurations.) To keep track of configuration changes, people simply used an IRC bot to log summaries of their actions to the server admin log, which made it hard to revert changes or help train new teammates. “But also, when the Ops team is only 6 people, and everyone has been around for years, everyone just knows all the parts,” he explained.

As they created the 700+ hostclasses currently defined in Puppet, Operations engineers moved towards treating our infrastructure as a codebase, and thus from pure systems administration towards a DevOps approach. As of November 2012, “we’re very nearly at a point where we can manage our whole infrastructure without needing to log into hosts, which is the whole goal,” Peter said with a smile. Logging into hosts is a bad thing “because it means that you’re doing things by hand and/or that what you’re doing isn’t going through code review. Moving to Gerrit for our Puppet repos is awesome: It means I can really easily see what my coworkers are doing. I can ask for review when needed. It’s a huge sign of maturation of our department.”

Their years of work led to a nearly painless data center migration, and the investment also began paying off immediately with reduced downtime. You’ll read more about that in the second part of this story next week.

Sumana Harihareswara, Engineering Community Manager

Lead our development process as a product adviser or manager

Would you like to decide how Wikimedia sites work? You can be a product adviser or a product manager, as a volunteer, and guide the work of Wikimedia Foundation developers.

What is a product manager? As Howie Fung, the head of WMF’s product team, recently explained, when we create things on our websites or mobile applications that readers or editors would use,

there are a basic set of things that need to happen when building a product….
  1. Decide what to build
  2. Design it
  3. Build it
  4. Measure how it’s used (if you want to improve the product)
Roughly speaking, that’s how we organize our teams when it comes to building features. Product Managers decide what features to build, Designers design the feature, Developers build the feature, and Analysts measure how the features perform.

So, a product manager works with the designers, developers, and analysts to identify and solve user problems, while representing the users’ point of view. As Fung put it,

there should be someone responsible for ensuring that the various ideas come together into a coherent whole, one that addresses the problem at hand. That responsibility lies with the Product Manager.

Why do you need volunteers? While the Wikimedia Foundation has hired full-time product managers for the most pressing features our engineers are developing, that leaves us with several ongoing projects that don’t get enough product management. The WMF needs your help to: track the progress of these improvements; comment on tasks or proposals; reach out to the Wikimedia reader and contributor communities to ask for feedback via wikis, mailing lists, and IRC; help developers see what users’ needs are; and set priorities on bugs and features, thus deciding what developers ought to work on next. Here are a few of those activities:

  • File storage, especially regarding Wikimedia Commons. Engineers have been trying to improve our storage system using the Swift distributed filestore but need your help to make sure we do it right.
  • Prioritizing shell requests. When Wikimedians request configuration changes to the wikis, systems administrators can use help understanding which of them are urgent and which of them don’t actually have the necessary consensus.
  • Operations requests from the community. It’s not just shell requests. Right now we have 93 open bugs requesting attention from our systems administrators, and those requests could use prioritization and organization.
  • Data dumps. Wikimedia offers many ways to download Wikimedia data at dumps.wikimedia.org. Your help would improve tools for importing the dumps, or for converting them to SQL for import, making it easier for others to use these datasets.
  • Wikimedia Labs. The sandboxes in Wikimedia Labs will host bots, tools, and test and development environments; can you organize the advice on the roadmap and what those communities will need?
  • Admin tools development: WMF engineer Chris Steipp works on tools to help fight vandalism and spam, including major bugfixes and minor feature development to make lives of stewards and local sysops a little easier. What’s most urgent on his TODO list?

Volunteer product manager Jack Phoenix put together a detailed roadmap that was incredibly useful to guide the work of Wikimedia engineers on features like anti-spam tools.

Has anyone tried this? The first Wikimedia volunteer product manager was User:Jack Phoenix, who created the admin tools roadmap this summer, detailing a rationale for what should be done when. Jack originally signed up because:

this is just something that I know pretty well and hence why I want to be a part of this project and the team….
I want editors to be able to focus on editing — content creation, tweaking, fine-tuning… — instead of having to play whack-a-mole against spambots and vandals all the time. I have plenty of experience in playing whack-a-spambot, and I’m hoping to use that experience to improve WMF sites and also third-party sites…

It’s perfectly fine for the role of volunteer product manager to be a time-limited engagement. For example, Jack did amazing work for three months creating the roadmap. In retrospect, Jack Phoenix has estimated that to manage a product as broad as the admin tools suite, and to do it well, would take at least an hour per day if not two or three; due to time constraints, Jack has now stepped down from the role and is seeking a successor. Thanks for laying the groundwork, Jack! While we’re sad to see Jack go, we’re thankful for the roadmap and we continue to benefit from it.

If that kind of commitment sounds too burdensome, consider becoming a volunteer product adviser first. You’d do some of the same tasks as a product manager, to help check that the feature we’re building actually meets Wikimedians’ needs, and give your own opinion as well. But there wouldn’t be ownership or leadership attached, and the time commitment wouldn’t be as heavy.

What next? The goal of the Engineering Community Team is to have at least two Wikimedia volunteers engaged in product management work by the end of December. Talk with us and check out whether this is something you’d like to try!

To get involved, contact Sumana Harihareswara or Guillaume Paumier.

Sumana Harihareswara
Engineering Community Manager

Google Summer of Code students reach project milestones

This year, the MediaWiki community again participated in Google Summer of Code, in which we selected nine students to work on new features or specific improvements to the software. They were sponsored by Google and mentored by experienced developers, who helped them become part of the development community and guided their code development.

Congratulations to the eight students who have made it through the summer of 2012 (our seventh year participating in GSoC)! They all accomplished a great deal, and many of them are working to improve their projects to benefit the Wikimedia community even more.


Eight students passed MediaWiki’s GSoC program in 2012.

  • Ankur Anand worked on integrating Flickr upload and geolocation into UploadWizard. WMF engineer Ryan Kaldari mentored Ankur as they made it easier for Wikimedia contributors to contribute media files and metadata. Read his wrapup and anticipate the merge of his code into the main UploadWizard codebase.
  • Harry Burt worked on TranslateSvg (“Bringing the translation revolution to Wikimedia Commons”). When his work is complete and deployed, we will more easily be able to use a single picture or animation in different language wikis. See this image of the anatomy of a human kidney, for example; it has a description in eight languages, so it benefits multiple language Wikipedias (e.g., Spanish and Russian). Harry aims to allow contributors to localize the text embedded within vector files (SVGs), and you can watch a demo video, try out the test site, or just read Harry’s wrapup post. WMF engineer Max Semenik mentored this project.
  • Akshay Chugh worked on a convention/conference extension for MediaWiki. Wikimedia conferences like Wikimania often use MediaWiki to help organize their conferences, but it takes a lot of custom programming. Under the mentorship of volunteer developer Jure Kajzer, Akshay created the beta of an extension that a webmaster could install to provide conference-related features automatically. See his wrapup post.
  • Ashish Dubey worked on real-time collaboration in the upcoming VisualEditor (you may have seen “real-time collaborative editing” in tools like Etherpad and Google Docs). Ashish (with WMF engineer Trevor Parscal as mentor) has implemented a collaboration server and other features (see his wrapup post) and has achieved real-time “spectation,” in which readers can see an editor’s changes in real time. Wikimedia Foundation engineers plan to integrate Ashish’s work into VisualEditor around April to June 2013.

    Ashish is working on the architecture that will support real-time collaboration.

  • Nischay Nahata optimized the performance of the Semantic MediaWiki extension. In wikis with unusually large amounts of content, Semantic MediaWiki experiences performance degradation. With the mentorship of head Semantic MediaWiki developer Markus Krötzsch (a volunteer) and Wikidata developer Jeroen De Dauw, Nischay found and fixed many of these issues.  This also reduces SMW’s energy consumption, making it greener. Nischay’s work will be in Semantic MediaWiki 1.8.0, which is currently in beta and due to be released soon. Wikimedia Labs uses Semantic MediaWiki and will benefit from the performance improvements.
  • Aaron Pramana worked on watchlist grouping and workflow improvements. Aaron wants to make it easier for wiki editors and readers to use watchlists, and to create and use groups of watched items to focus on or share. Aaron worked with volunteer developer Alex Emsenhuber. The back end of the system is done, but Aaron wants your input about the user interface. Folks on the English Wikipedia’s Village Pump have started discussing it.

    Arun Ganesh illustrated the watchlist redesign proposal with this mockup.

  • Robin Pepermans worked on Incubator improvements and language support, mentored by WMF engineer Niklas Laxström. If you’ve ever thought of using Wikimedia’s Incubator for new projects, it’s now easier to get started. Read Robin’s wrapup post for more.
  • Platonides worked on a desktop application for mass-uploading files to Wikimedia Commons. The application will eventually make it much easier for participants in upload campaigns like Wiki Loves Monuments to upload their photos (and it’ll work on Windows, Linux, and Mac OS). I mentored Platonides, who delivered a beta version.

As further progress happens, we’ll update our page about past GSoC students. Congratulations again to the students and their mentors. And thanks to volunteer Greg Varnum, who helped me administer this year’s GSoC, and to all the staffers and volunteers who helped students learn our ways.

Sumana Harihareswara, Engineering Community Manager

Wikimedia engineering June 2012 report

Major news in June includes:

Events

Recent events

Berlin hackathon (1–3 June 2012, Berlin, Germany)

Approximately 104 participants from 30 countries came to Berlin, including MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The community also learned more about the Wikidata and RENDER projects. More updates, links to videos, and followups are on the talk page.

Upcoming events

Pre-Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)

Open source teaching nonprofit OpenHatch will help organize and run this two-day event, along with Katie Filbert, Gregory Varnum, and Sumana Harihareswara. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes. The event is free to attend even for those not attending Wikimania itself.


Wikimedia Foundation selects nine students for summer software projects

We received 63 proposals for this year’s Google Summer of Code, and several mentors put many hours into evaluating project ideas, discussing them with applicants and making the tough decisions. We’re happy to announce our final choices, the Google Summer of Code students for 2012:


All nine of these students are working on MediaWiki, the software that powers Wikimedia sites.

Congratulations to this year’s students, and thanks to all the applicants, as well as MediaWiki’s many mentors, developers who evaluated applications, and Google’s Open Source Programs Office. The accepted students now have a month to ramp up on MediaWiki’s processes and get to know their mentors (the Community Bonding Period) and will start coding their summer projects on or before May 21st. As the organizational administrator for MediaWiki’s GSoC participation, I’ll be keeping an eye on all nine students and helping them out.

Good luck!

Sumana Harihareswara, Volunteer Development Coordinator
