Wikimedia engineering August 2012 report

Major news in August include:

Note: Following a reader’s advice, we’re trying out slightly colored backgrounds to help readers skim through sections. Let us know in the comments how that works for you, and how to improve it for the next report.

Events

Recent events

Wikipedia Engineering Meetup (15 August 2012, San Francisco, USA)

Approximately 100 people attended the first Wikipedia Engineering Meetup in San Francisco, in a series meant to showcase Wikimedia’s interesting engineering problems and products to the local developer community. Tentatively, the meetup will happen every two months at the Wikimedia offices in San Francisco, and will consist of three 15-minute engineering presentations, followed by a question & answer period bracketed by mingling. The inaugural meetup featured talks about Mobile engineering, Analytics and the VisualEditor.

Upcoming events

Wikimedia’s internationalization and mobile teams are tentatively planning a volunteer outreach event in Bangalore, India, November 9–11. More information will come in September.

Engineering metrics in August:

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

Technical Operations

Site infrastructure

Continuing from his earlier MySQL work, Asher Feldman built additional MySQL servers for each of the clusters in Ashburn, all in preparation for the primary data center migration in the coming quarter. In the Tampa datacenter, he added a new server to the English Wikipedia (en.wp) cluster and replaced the en.wp master with newer hardware. A database tree chart provides the latest information on our database clusters.
Thanks to Varnish Software support, we have a new build of Varnish that comes with persistent cache and the video streaming bug fix. Mark Bergsma tested the build on one of the mobile Varnish servers, and so far it has been stable. In the coming days, Mark will update the ‘upload’ Varnish cluster at Ashburn (Eqiad) and move traffic through it.
Mark has also successfully updated and deployed the NetApp storage servers and enabled replication from Tampa to Ashburn. He started migrating some of the systems that mount nfs1 to this new server. With this, Mark has resolved another critical path item on the migration to the new primary data center. In addition, Jeff Green started using nas1-a to archive the Fundraising banner logs.

Network Infrastructure

The usual traffic surge due to the new school year caused an increase in packet loss on our Tampa internal network. With Chris Johnson’s help, Mark upgraded the links between the racks. Earlier this month, Leslie Carr and Chris installed a new passive optics (CWDM) system between the two floors of the Tampa datacenter hosting our servers, giving us effectively a 4X capacity increase.

Fundraising Infrastructure

Jeff continues to make progress in the Fundraising infrastructure buildup at Ashburn (EQIAD). With Leslie’s help, the new firewall was set up, and Jeff deployed a build host, a logging host and the application cluster, and built the pxeboot, preseed and puppet configurations. He has also enabled nagios-nsca monitoring for those new hosts.

Object Store/Swift

‘Originals’ were successfully copied over to the Swift cluster from ms7 (an NFS filer for images). In addition to serving thumbnails (which was completed last month), Swift is now also the primary object store for images and multimedia content. In the current setup, MediaWiki reads from Swift only, but writes to both the Swift cluster and the legacy NFS servers (ms5 & ms7). In the coming months, we will disable ms5 & ms7 and run solely on Swift.
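
The read/write split during this transition can be pictured with a small sketch. This is only an illustration, not MediaWiki's actual file backend code: the class names `InMemoryStore` and `DualWriteStore` are hypothetical, and in-memory dictionaries stand in for the Swift cluster and the NFS filers.

```python
class InMemoryStore:
    """Hypothetical stand-in for a storage backend (Swift or an NFS filer)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class DualWriteStore:
    """Reads from the primary only; writes go to the primary and all legacy stores."""

    def __init__(self, primary, legacy):
        self.primary = primary
        self.legacy = list(legacy)

    def put(self, key, data):
        # Write to the primary first so that reads see the new object.
        self.primary.put(key, data)
        for store in self.legacy:
            store.put(key, data)

    def get(self, key):
        # Legacy stores are never consulted on reads.
        return self.primary.get(key)

    def retire_legacy(self):
        # After the legacy stores are disabled, writes go to the primary alone.
        self.legacy = []


swift = InMemoryStore("swift")
ms7 = InMemoryStore("ms7")
store = DualWriteStore(primary=swift, legacy=[ms7])
store.put("Example.jpg", b"original bytes")
```

Keeping the legacy copies written but unread means the old servers stay current as a fallback until they are switched off, at which point `retire_legacy()` models the move to Swift alone.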

Wikimedia Labs

This month was mostly spent on upgrading all of the Labs infrastructure. OpenStack nova and glance were upgraded to the essex release. The keystone service was added and now handles all authentication for Labs-related OpenStack services. OpenStackManager was upgraded to support keystone, use the OpenStack API rather than the EC2 API, and to have multi-region support, in anticipation of the new region we’ll be bringing up in Eqiad. Testing of ceph as a replacement of gluster for project storage continued during this month; more testing is required. A lot of puppet work has been done to start moving our spaghetti code-style repository into modules.

Data Dumps

We’ve been focusing on the media infrastructure, working on the migration to Swift, and also taking a hard look at scaled media usage and storage. Since scaled media (thumbnails) could be regenerated at will from the original, we are going to evaluate treating thumb storage as a medium-to-long term cache rather than permanent disk storage as we have been doing. Running the numbers on existing thumbs turned up some interesting results. We’re still bringing mirrors online; we’ve gotten all the hardware and network issues worked out with WANSecurity and have started copying over the data. They’ll have files most mirrors don’t host: page view files, archives, and more, as well as a full copy of our media files.
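
The idea of treating thumbnail storage as a cache can be sketched as below. This is a rough illustration under stated assumptions: the report does not say which eviction policy is being evaluated, so least-recently-used eviction is assumed here, and `render_thumbnail` and `ThumbnailCache` are hypothetical names rather than existing MediaWiki code.

```python
from collections import OrderedDict


def render_thumbnail(original, width):
    """Hypothetical scaler; a real one would resize the image data."""
    return f"{original}@{width}px"


class ThumbnailCache:
    """Bounded cache that evicts the least recently used thumbnail (assumed policy)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.renders = 0  # counts how often a thumbnail had to be regenerated

    def get(self, original, width):
        key = (original, width)
        if key in self.entries:
            # Cache hit: mark the entry as recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: regenerate from the original, which is always kept.
        thumb = render_thumbnail(original, width)
        self.renders += 1
        self.entries[key] = thumb
        if len(self.entries) > self.max_entries:
            # Drop the least recently used entry to stay within the bound.
            self.entries.popitem(last=False)
        return thumb
```

Because any evicted thumbnail can be rendered again from its original, losing a cache entry costs only rendering time, which is what makes cache-style storage a candidate at all.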

Features Engineering

Editing tools

VisualEditor

In August, the team focused on overhauling the code design of VisualEditor so that it is more modular and easier to extend. This involves creating and documenting a number of formal APIs at each point in the architecture, which means a developer does not have to understand the entire code base to be able to add new features. The early version of the VisualEditor on mediawiki.org was updated twice (wmf9 and wmf10), fixing a number of bugs, as well as adding a much-improved link inspector to help users build links, and a save dialog that better guides users on what to do.

Parsoid

The Parsoid team reached a major milestone in August by implementing a template output encapsulation algorithm, and started to use it to support expanded template round-tripping. In parallel with this and the usual smaller tweaks, work on a C++ port of the parser was started. The port is expected to allow an efficient integration with PHP and Lua, improve performance and allow the parallelization of the parser in the longer term.

Editor engagement

Article feedback

This month, we developed a range of new features for Article Feedback, which is now deployed on 10% of the English Wikipedia. Improvements include the ability for users to view feedback from their watched pages, hide their own posts and give feedback on help pages, as well as features that let editors clear all flags and let administrators protect articles to limit feedback on controversial pages. These and other features can be tested on this sample article feedback page or on the central feedback page (please report any bugs on Bugzilla). For more information about this tool, check our project overview. We are now in our productization phase (support for more platforms, scalability, code re-factoring, localization, metrics, mobile) and are aiming for a full release to 100% of English Wikipedia by the end of October 2012, with other wiki projects starting later this year. This new tool was created in collaboration with the Wikipedia community and developed by Fabrice Florin, Matthias Mullie, Pau Giner, Ryan Kaldari, Roan Kattouw, Oliver Keyes, Chris McMahon, Benny Situ, Dario Taraborelli, Howie Fung and Terry Chay, in association with OmniTI.

Page Curation

This month, we deployed a ‘pre-release’ version of Page Curation on the English Wikipedia. This new product includes two main features: 1) the New Pages Feed, a dynamic list of new pages for review by community patrollers; and 2) the Curation Toolbar, an optional panel on article pages, which enables editors to quickly review these pages. The Curation Toolbar provides a variety of tools that let users get page info, mark a page as reviewed, tag it, mark it for deletion, send WikiLove to page creators — or jump to the next page on the list. This month, we completed development on final features, such as the ability to send a personal note to page creators, as well as special logs, links and templates, as outlined in this help page and these project slides. We are now preparing for a full release of Page Curation on the English Wikipedia at the end of September 2012. Check out the current beta version on the English Wikipedia, as well as the latest version on Wikimedia Labs (confirmed editors can click “Review” to curate any article on the New Pages Feed). Please report any bugs on Bugzilla. Formerly called ‘Page Triage’, this new tool was designed in close collaboration with the Wikipedia community and developed by Ryan Kaldari, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba, Terry Chay and Howie Fung.

MicroDesign

This month saw the creation of the Micro Design Improvements team, an ad-hoc group of staffers who look at small but useful design improvements to make to MediaWiki. Vibha Bamba, Oliver Keyes and Munaf Assaf (with assistance from Howie Fung) worked on the design for our first feature, which simplifies the “edit” window. The team is very grateful to Terry Chay for securing technical assistance in the form of Rob Moen, who has agreed to donate his 20% time to working on this project. In the coming month, we plan to talk to the community about this feature, deploy it, and work on more of the items on our to-do list; if you have any thoughts about our current work or ideas for future projects, please leave us a note on the project talkpage.

Editor engagement experiments

We deployed and ran the first iteration of post-edit feedback, testing whether various types of positive feedback after submission of an edit increase the productivity and retention of Wikipedia editors. (The results will be publicized soon.) We are currently working on the next iteration of post-edit feedback and on a new experiment which centers around the account creation process. We’ve also deployed click-tracking to the English Wikipedia community portal, account creation page, and the article edit form, and devised a tool for generating reports from the raw log data. Working with Asher Feldman, we’ve also architected an alternative data pipeline for event tracking, and begun its deployment.

Multimedia Tools

TimedMediaHandler

In August we concentrated on testing on the testwiki, and found some issues that need addressing. The project is on hold for now, but we expect to resume in September.

MediaWiki infrastructure

ResourceLoader

After the sprint in July, there was no notable progress as the team were busy with other urgent projects. There was the start of a community discussion about where global gadgets will be hosted for access across the Wikimedia cluster, and about their licensing (as they have generally been caught by the content license, which is less suitable for code).

Notifications

Andrew Garrett deployed Echo to MediaWiki.org, but it was temporarily turned off pending a bug that has recently been fixed. Vibha Bamba is working on some of the UI backlog.

Messaging

Work on Flow will officially start in January. In the meantime, preparatory work will focus on Database sharding.

Feature support

2012 Wikimedia fundraiser

The fundraising team completed 3 very successful sprints, completing more work in each sprint than some of the previous sprints combined (Sprint 7: Auditing and Reconciliation; Sprint 8: Amazon, and a bunch of other random stuff; and Sprint 9: Adyen, Amazon wrap-up, and Listeners). During the sprints, the team integrated with Amazon Payments, added features to CiviCRM to enable the settlement of donations in multiple currencies, added features (including the beginning of an API) and made bugfixes to CentralNotice, discovered and dealt with an issue in the global credit card processing system, and began integration on a new payment processor that will give the fundraising team access to additional payment methods around the world.

Internationalization and Localization

Internationalization and localization tools

The team continued to work on the Universal Language Selector (ULS): the display settings dialog was completed and is now able to show and set WebFonts, similarly to the WebFonts extension which will be phased out once the ULS is deployed. The lists of languages were tweaked to emphasize those likely to be chosen by the user, based on their location and past selections. Translation memory was deployed on Meta and CLDR plurals support was merged into the core master. User experience testing of the Translate extension is in progress. Initial analysis for i18n metrics was also completed and published. The team conducted its monthly office hours, a bug triage and development showcase.

Milkshake

Development on Project Milkshake continued at a lower priority due to the focus on the Universal Language Selector this month. We are getting some basic blocks together in our GitHub repositories.

Mobile

Contributors

Wiki Loves Monuments App

The mobile team released three new betas for the WLM app and published the last one on Google Play. We finalized many new features, like saving for later and showing the current location, and cleaned up data issues. The contest started on September 1st.

Readers

Wikipedia Zero

Partner data configuration is now more flexible, and various additional partners are now in testing mode. A list of launches will follow.

J2ME App

Open Path delivered numerous new builds of the Wikipedia J2ME app this month. Patrick Reilly and the Global Development team did internal testing to validate that they were performing as expected. We’re now feature-complete and spending cycles on making the app perform better on low memory devices. We expect to complete this project in a few weeks.

Wikipedia over SMS & USSD

Production hardware is now in place and running the latest builds via puppet configuration.

Offline

Kiwix

Our work mostly focused on the 0.9 RC2 (see CHANGELOG) which should be released soon after we port kiwix-serve to MS/Windows. Kiwix UI localization was improved, thanks to the Translatewiki Rally; four new languages have been added. For the ZIM autobuild project, we have migrated the server to a datacenter in Zurich, Switzerland, and coding work is ongoing. We are planning our next projects and seeking volunteer help.

Platform Engineering

MediaWiki Core

MediaWiki 1.20

MediaWiki 1.20wmf9 and MediaWiki 1.20wmf10 were deployed to the Wikimedia sites in August. We plan to continue with bi-weekly deployments through September.

Git conversion

Chad Horohoe spent a good amount of time fixing issues upstream, including two big improvements to the project listing page. He also cleaned up the Gerrit installation on Labs to more accurately mirror production—also cleaning up the production setup along the way. Initial research was done into replication to GitHub. Finally, Gerrit 2.5 is nearing release, which brings a bunch of new features (like plugins) and fixes. The Labs instance of Gerrit is already running the release candidate. In September, we’ll be upgrading to Gerrit 2.5 and getting repositories replicated out to GitHub.

SwiftMedia

All Wikimedia sites are now using Swift as the primary storage mechanism for multimedia files such as images (both original images as well as image thumbnails). We continue to write images to our old NFS server as well, though we plan to turn this off in September. Some specialized extensions still use the old NFS server, such as the Math and Timeline extensions. These will be migrated to Swift soon (tentatively in September).

Lua scripting

The Scribunto extension has been deployed to test2.wikipedia.org and www.mediawiki.org, and several editors are porting existing templates such as Cite over to Lua (see recent changes in the “Module:” namespace).

OAuth

Chris Steipp mostly focused on projects other than OAuth. However, he continued to gather requirements in preparation, including the need for Wikimedia to finalize SUL migration beforehand.

Site performance

In addition to the Lua work, Tim Starling did some investigation of parallel parsing, but that project may go on the backburner until after Parsoid goes into production.

Incremental architectural improvements

Tim Starling wrote a new Redis-based client for session handling. This will be important for the Virginia Datacenter Migration.

Admin tools development

Chris Steipp added two new major features to the AbuseFilter extension: global rules and global throttling. Code review was done by Tim Starling and the changesets were merged successfully. These features will allow the creation of filters that apply to all Wikimedia wikis, which is effective for stopping cross-wiki spambots. Jack Phoenix released the Phalanx extension and began working on making it suitable for deployment on Wikimedia servers. During the rest of 2012, the team will work through their roadmap: CentralAuth mass account locking, improving, stabilizing and reviewing Phalanx, and evaluating the effectiveness of the current CAPTCHA system and possible replacements for it.

Code review management

The analytics team released code review graphs, and Brian Wolff created a tool showing a view of unmerged patchsets and a “Wall of Shame” for authors with several patchsets requiring improvement. Both tools helped inform the discussion about the code review situation. Sumana Harihareswara encouraged authors to take steps to get their code reviewed faster, and actively requested reviews for many submissions.

Security auditing and response

Improved filtering in uselang with MediaWiki 1.20/wmf8 fixed several DOM-based XSS vulnerabilities in different gadgets. Chris Steipp fixed 4 security issues in core, and released MediaWiki 1.19.2 and 1.18.5 to include them.

Quality assurance

QA and testing

This month saw an emphasis on hiring, with excellent candidates currently being considered for all the positions that will be closely related to QA. With AFTv5 in place in production, testing focus shifted to NewPagesFeed and Page Curation Toolbar. Due to conflicts of holidays, vacations, time of year, meetings, and general complications, we decided not to hold an explicit community test event for NewPagesFeed/Curation, but test environments and a test plan will be available for those interested to explore this new feature.

Beta cluster

The MediaWiki core and its extensions are now automatically updating, and the beta cluster is now always using the very latest version published under the master branch of each Git repository.

Continuous integration

The TitleBlacklist extension is the first MediaWiki extension for which tests are now automatically run via Jenkins. The dashboard is at https://integration.mediawiki.org/ci/job/Ext-TitleBlacklist/ and build status is sent back to Gerrit.

Wikimedia analytics

Report card

The team started preparations to move hosting from Labs to a dedicated server (stat1001), and is investigating how to package a nodejs app.

Limn

We started working on the migration from dygraphs to d3.js, and improving documentation to ease the on-boarding of our new front-end engineer.

Kraken (Analytics Cluster)

We’re continuing the research and evaluation of udp2log replacements for getting data into the Kraken cluster.

Engineering community team

Bug management

The Wikimedia Foundation is nearing the end of its hiring process for a new Bug Wrangler, who will lead triage activities and train volunteers to triage as well. In the interim, volunteers such as Krenair and Thehelpfulone have stepped in to partially fill the gap. Volunteer Matanya Moses is planning to lead an online bug triage meeting, focusing on unreviewed patches, on September 5th.

Summer of Code 2011

A wikitech-l discussion of new user account creation drew former GSoC student Akshay Agarwal out of the woodwork to complete work on his SignupAPI extension. WMF engineers are planning to collaborate with him this autumn. Also, WMF engineers plan to review student Salvatore Ingala’s Gadgets work as they improve ResourceLoader this fall.

Summer of Code 2012

In the end, eight of Wikimedia’s nine Summer of Code 2012 students passed, and each posted a wrapup post on wikitech-l. Their achievements have already led to improvements in the Wikimedia Incubator, and improvements to Semantic MediaWiki and UploadWizard will reach users soon. Improvements to SVG translation, realtime editing collaboration, and other functionality are also progressing as the students clean up, merge, and iterate on their summer work.

Volunteer coordination and outreach

Sumana Harihareswara continued to follow up on contacts, recruit new contributors to the Wikimedia tech community, and mentor newer contributors. She granted Developer access and Gerrit project ownership requests, and worked on planning for the upcoming Bangalore outreach event. Hiring for a volunteer engineering coordinator to work on volunteer coordination and outreach is almost finished. Community discussion topics included Git and Gerrit’s difficulty, bug triages, new mailing lists, transparency and collaboration in feature design, MediaWiki releases and a potential community organization, GSoC’s effectiveness, code review, and appreciation for each other.

Wikimedia Foundation engineering 20% policy

Sumana Harihareswara is coordinating WMF engineers’ efforts to spend 20% of their work time on code review and other efforts benefiting the entire Wikimedia engineering community. Their highest priorities are fixing new urgent bugs, which surface during deployments, and addressing the Gerrit merge queue, especially for backlogged components such as Wikidata, UploadWizard, and ProofreadPage. Some participants are concentrating on bug triage, documentation, and the extensions awaiting review for deployment. Some teams were exempt in August from the 20% policy, because of pressing deadlines.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

The team has been working further on getting the code-base ready for a first deployment. You can try the current status on the demo system. Work focused on diff, undo, migrating to using the Universal Language Selector, and providing useful edit summaries in recent changes and article history. They also published a draft for the export to RDF.
The team published a list of starter tasks to make it easier to begin contributing to Wikidata.
Joan Creus released pywikidata, a framework for Wikidata bots.

Future

The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.


This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.


This article was edited on December 1st, 2012. The following content was changed: the number of processed shell requests was corrected.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

7 Comments

I prefer the wiki version, but colours look good enough given that we have no TOC here.
Thanks also for the summary at the top.

I like the color thing 🙂

The colored sections came out pretty nicely. I wonder if it’s possible to include them in the RSS version as well 😛

Waldir: AFAICT the colors are included in the RSS feed of the blog, but they’re stripped when they pass through the Planet.

Hmmm. Maybe that’s something to report upstream to Planet/Venus? Or is there a reason for that?

Ah, a welcome return for unique contributor count. Keep up the good work 🙂

Waldir: I assume they strip CSS on purpose for consistency among posts, but I’m not familiar enough with Planet/Venus to give a definitive answer.