Wikimedia engineering July 2011 report


Major news in July includes:

  • Ongoing data replication from our primary Florida data center to our new Virginia data center;
  • The deployment of the Article Feedback feature to all articles on the English Wikipedia, and the deployment of MoodBar;
  • The successful implementation of a MySQL-based parser cache on Wikimedia wikis;
  • Mid-term evaluation of our Summer of Code projects.

Hover your mouse over the green question marks ([?]) to see a description of a particular project.

Events

Recent events

  • OSCON (July 25-29, Portland, Oregon, USA) — About a dozen Wikimedia engineers attended the Open Source Convention in late July. OSCON is used to showcase the latest and greatest developments in open source technologies (including hands-on tutorials), and is generally an opportunity for Wikimedia developers to stay in the loop and to network with individuals from other projects and communities. We had two presentations in the program (on the 2010-11 fundraising campaign, and on ResourceLoader), which are available in the Wikimedia engineering presentations collection. We also promoted WMF job openings at every opportunity. Finally, Danese Cooper, Sumana Harihareswara and Erik Moeller participated in a workshop with like-minded organizations regarding volunteer matching strategies for open source projects.

Upcoming events

  • Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites.

Personnel

Job openings

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
The following positions opened in July:

New Requests for Proposals:

The following positions are still open: Product Manager (Analytics), QA Lead, Operations Engineer (Networking), Director of Features Engineering, Systems Engineer (Data Analytics), Networking Contractor (Amsterdam), Software Developer (Rich Text Editing, Features), Software Developer (Front-end) and Software Developer (Back-end).

Short news

Operations

Site infrastructure

  • Tampa Data Center [?] — 74 new servers were purchased to increase the capacity of our Apache cluster; they will be installed in August. Network maintenance was also performed to install a new router and replace a core switch. A number of servers were upgraded, and their configuration was automated with puppet.
  • Virginia Data Center [?] — Full network connectivity was set up and the 7 wiki database clusters have now been replicated to our new servers in Virginia. We have also standardized the puppet configuration and enabled LVM snapshots. About 20 other databases (of tools like OTRS, CiviCRM, Bugzilla, WordPress and RT) have been replicated as well. Next steps include rolling out some of our Varnish caching servers, after a stability and performance assessment.
  • Media Storage [?] — The SwiftMedia extension developed by Russ Nelson now supports all the major media features such as download, upload, re-upload, revert, delete, and restore. Upcoming work includes unit tests and performing end-to-end tests.
  • HTTPS & IPv6 — HTTPS was enabled on a private production wiki and testwiki to test functionality and uncover bugs. Protocol-relative URLs (which will be a major feature of MediaWiki 1.18) were enabled on testwiki for community testing before being rolled out to all projects.
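
A protocol-relative URL omits the scheme and starts with `//`, so the browser reuses whichever scheme the containing page was loaded over. The minimal Python sketch below illustrates that resolution rule with the standard library. The link path and the `resolve` helper are made up for illustration, and this is not MediaWiki's own code.

```python
from urllib.parse import urljoin


def resolve(page_url: str, link: str) -> str:
    """Resolve a link relative to the page that contains it, as a browser would."""
    return urljoin(page_url, link)


# Illustrative protocol-relative link (hypothetical path).
LINK = "//upload.wikimedia.org/example.png"

# The same link inherits http or https from the page it appears on.
over_http = resolve("http://test.wikipedia.org/wiki/Main_Page", LINK)
over_https = resolve("https://test.wikipedia.org/wiki/Main_Page", LINK)

print(over_http)
print(over_https)
```

The same markup can therefore be served to both HTTP and HTTPS readers without rewriting links.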

Testing environment

  • Virtualization test cluster [?] — This project was slowed down in favor of deploying HTTPS. Some work was done to move the puppet configuration into a public repository.

Backups and data archives

  • Data Dumps [?] — The June and July runs of the English Wikipedia dump were completed, and the August run is underway; possible explanations for the resolution of the earlier issues include different NFS mounting options and fine-tuning of the number of concurrent jobs. Chinese Wikipedia dumps have also been fixed. Upcoming work will focus on checkpoint files for history dumps, to break in-progress dumps into chunks.
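
As a rough illustration of the checkpoint idea, a long dump can be divided into page-ID ranges, each written to its own file, so an interrupted run only needs to redo the ranges that are still missing. The sketch below makes several assumptions: chunks are defined by page ID, and the function names and chunk size are hypothetical. It does not describe how the actual dump scripts work.

```python
from typing import Iterable, Iterator, List, Set, Tuple

PageRange = Tuple[int, int]


def page_ranges(first_id: int, last_id: int, chunk_size: int) -> Iterator[PageRange]:
    """Yield inclusive (start, end) page-ID ranges covering first_id..last_id."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = first_id
    while start <= last_id:
        end = min(start + chunk_size - 1, last_id)
        yield (start, end)
        start = end + 1


def remaining_ranges(all_ranges: Iterable[PageRange], completed: Set[PageRange]) -> List[PageRange]:
    """Return the ranges that have no finished checkpoint file yet."""
    return [r for r in all_ranges if r not in completed]


# Example: ten pages in chunks of four, with the first chunk already written.
todo = remaining_ranges(page_ranges(1, 10, 4), completed={(1, 4)})
print(todo)
```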

Features Engineering

Editing tools

Content Quality and Editorial Tools

  • Article feedback [?] — Roan Kattouw completed the UDP logger (for clicktracking metrics) and deployed it to production. The Article feedback feature was incrementally rolled out to all articles on the English Wikipedia, and the Product research team continued to analyze its impact.

Participation and editor retention

  • WikiLove [?] — The code was completed, and the feature deployed to the English Wikipedia at the end of June. The Product research team published a basic analysis of its usage, and stories of its evolving usage and impact. This project is now considered to be completed.
  • MoodBar [?] — The code was completed and deployed to the English Wikipedia. The research team is now analyzing its impact.
  • GlobalProfile (formerly “StructuredProfile”) [?] — Brandon Harris continued to engage in discussions with users to collect feedback and assemble requirements. The feature was renamed to “GlobalProfile” as it is now intended to work consistently across all wikis.
  • LiquidThreads 3.0 [?] — This project was mostly on hold in July, in favor of the MoodBar feature.

Multimedia Tools

  • UploadWizard [?] — Ian Baker joined the team and started to work on the UploadStash back-end. Jeroen De Dauw started to extend the UploadWizard code base to support customized campaigns, like the Wiki Loves Monuments contest. Neil Kandalgaonkar refactored some libraries to better support Ian and Jeroen’s work, and committed some fixes to reduce categorization and licensing mistakes.

MediaWiki infrastructure

  • ResourceLoader [?] — Roan Kattouw and Timo Tijhof started to work on global gadgets and a gadget manager. The back-end for loading gadgets remotely from another wiki now works, although it is limited to database loading within the same server farm; an API back-end is in the works. A Gadgets inventory is now also available, with plans to add actions like creation, modification and deletion of gadgets.

Wikimedia Labs

  • TimedMediaHandler [?] — Michael Dale continued to address comments from code review, and participated in a Multimedia sprint planning meeting. He also started to plan the final review and possible deployment of TimedMediaHandler around September.

Mobile

  • Mobile Research [?] — Parul Vora and Mani Pande continued to plan the US mobile research, to talk to possible firms, and to draft the mobile survey. Reports and syntheses from the India and Brazil field research were delayed in favor of the US research planning.
  • MobileFrontend [?] — Patrick Reilly focused on proper caching support, as well as device detection optimization. Mobile device recognition on Wikimedia sites is now done server-side at the squid level, which results in faster redirects for mobile users and better recognition of devices. A message and feedback page were set up to report false positives.
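
Server-side device recognition generally comes down to classifying the request's User-Agent header before the page is served. The Python sketch below shows the idea in its simplest form. The token list is a short, hypothetical sample chosen for illustration, and it does not reproduce the rules deployed on the squids.

```python
import re

# Hypothetical sample of User-Agent tokens; real rule sets are much longer.
MOBILE_PATTERN = re.compile(r"iPhone|iPod|Android|BlackBerry|Opera Mini", re.IGNORECASE)


def is_mobile(user_agent: str) -> bool:
    """Return True if the User-Agent string matches a known mobile token."""
    return bool(MOBILE_PATTERN.search(user_agent or ""))


print(is_mobile("Mozilla/5.0 (iPhone; CPU iPhone OS 4_3 like Mac OS X)"))
print(is_mobile("Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"))
```

Because this kind of matching relies on substrings, a browser that happens to include one of the tokens is classified as mobile whether or not it is, which is why a way to report false positives is useful.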

Special projects

Fundraising support

  • 2011 Fundraiser [?] — Ryan Kaldari modified CentralNotice to allow the logging of changes to banners and campaigns, and has begun working on a log filter. Katie Horn fixed an issue with the PayflowPro Pending Processor script (which determines whether credit card donations flagged as ‘pending’ have been approved). She also added unit tests to new and existing code. Our server was successfully puppetized and upgraded by Peter Youngmeister and Arthur Richards. Arthur also set up advanced monitoring through Ganglia.

Offline

  • Wikipedia version tools [?] — GSoC student Yuvaraj Pandian continued to port User:CBM’s WP 1.0 bot to a MediaWiki extension, and nearly achieved feature parity with it by implementing article selection filtering based on project, quality, importance and category. Mentored by Arthur Richards, Yuvaraj also implemented the ability to save lists of filtered articles. In August, Yuvaraj will wrap up the initial development by adding the ability to manually curate article selection lists and export article lists in CSV format.
  • Kiwix UX initiative [?] — Kiwix 0.9 beta1 was released in July and included a new content manager, better search results, and fixes from our first usability study (more details in the changelog). We also refined our build system to speed up the release process.
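
To make the article selection filtering described above more concrete, here is a small Python sketch that filters rated articles by project, quality, importance and category, then exports the result as CSV. The `Rating` record, the function names and the sample rows are all hypothetical. The actual extension is written for MediaWiki and may be structured quite differently.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Rating:
    """One assessed article (hypothetical record layout)."""
    title: str
    project: str
    quality: str
    importance: str
    category: str


def select(ratings: Iterable[Rating], project: Optional[str] = None,
           quality: Optional[str] = None, importance: Optional[str] = None,
           category: Optional[str] = None) -> List[Rating]:
    """Keep the ratings that match every criterion that was given."""
    wanted = {"project": project, "quality": quality,
              "importance": importance, "category": category}
    return [r for r in ratings
            if all(v is None or getattr(r, name) == v for name, v in wanted.items())]


def to_csv(ratings: Iterable[Rating]) -> str:
    """Serialize ratings as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(Rating)])
    for r in ratings:
        writer.writerow(astuple(r))
    return buf.getvalue()


# Made-up sample rows, for illustration only.
SAMPLE = [
    Rating("Example article A", "Physics", "FA", "High", "Mechanics"),
    Rating("Example article B", "Physics", "Stub", "Low", "Optics"),
    Rating("Example article C", "Biology", "FA", "High", "Genetics"),
]

print(to_csv(select(SAMPLE, project="Physics", quality="FA")))
```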

Platform Engineering

MediaWiki Core

  • MediaWiki 1.18 [?] — MediaWiki 1.18 was initially branched in May. It was re-branched in mid-July because trunk was in a better working state than the former 1.18 branch. This increased the number of commits still awaiting review, but will eventually save time and effort towards the deployment of 1.18 to the Wikimedia cluster, and its release to the public. A revision report was created to focus on the remaining commits to review.
  • Code review management [?] — Work on reviewing commits continued; the re-branching of MediaWiki 1.18 aims to reduce the backlog faster. In July, Wikimedia Foundation engineering staff and contractors also attended a Code review workshop; the goal was to share experience and practices on the general review process, as well as security and performance. The accompanying documentation is now being organized.
  • Heterogeneous deployment [?] — Aaron Schulz picked up work on this project, and completed most of it. Testing is scheduled to happen by early August.
The parser cache hit ratio increased from 30% to 80% with the MySQL-based parser cache.
  • Disk-backed object cache [?] — To improve the MySQL-based version of this system, Domas Mituzas suggested splitting the cache into several tables, which Tim Starling implemented in MediaWiki. The system was then deployed on July 11th and the cache has been filling up since then, thus increasing the parser cache hit ratio from about 30% to 80%. Possible future steps include adding previous page revisions to the cache.
  • API maintenance [?] — Sam Reed continued to fix bugs and to add new features to the MediaWiki API. Sam’s API work in July focused on providing the API component to the new Report Card project.
  • Shell requests [?] — Sam Reed took over maintenance of shell requests. He added a new “ops” keyword to differentiate between requests that require shell access (which he can process), and other requests that can only be processed by someone with root access (“ops”). As of July 26, there were only 69 remaining shell requests, and that number keeps decreasing.
  • Continuous integration [?] — This project aims to rebuild the Wikimedia continuous integration legacy server (currently hosted on a virtual machine) on a dedicated server in eqiad, our new data center. Chad Horohoe started to consolidate the platform to run automated tests systematically at post-commit time, to check that the SVN trunk is in an (almost) constantly deployable state. This project also relates to the goal of more frequent code deployments, as continuous integration will give us more confidence in new code if it has already passed the automated tests. The new server will be combined with TestSwarm, a distributed continuous integration tool for JavaScript, currently hosted on the Toolserver. Timo Tijhof reached out to the TestSwarm team, who were enthusiastic about incorporating our improvements, notably on performance.
  • Projects on hold — The HipHop deployment, AcademicAccess, App-level monitoring and Configuration management projects were mostly on hold in July.
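
The table-splitting approach mentioned for the disk-backed object cache can be pictured as hashing each cache key to pick one of several tables, so that reads and writes are spread out instead of hitting a single large table. The Python sketch below uses an in-memory SQLite database so it stays self-contained. The table names, shard count, hashing scheme and `ShardedCache` class are all assumptions made for illustration, and this is not the MediaWiki implementation.

```python
import hashlib
import sqlite3
from typing import Optional

NUM_TABLES = 4  # assumed shard count, chosen only for this illustration


def table_for_key(key: str, num_tables: int = NUM_TABLES) -> str:
    """Map a cache key to one of num_tables table names (hypothetical naming)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % num_tables
    return f"cache_{shard:03d}"


class ShardedCache:
    """Key-value cache spread over several tables of one database."""

    def __init__(self, conn: sqlite3.Connection, num_tables: int = NUM_TABLES):
        self.conn = conn
        self.num_tables = num_tables
        for i in range(num_tables):
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS cache_{i:03d} (k TEXT PRIMARY KEY, v BLOB)"
            )

    def set(self, key: str, value: bytes) -> None:
        table = table_for_key(key, self.num_tables)
        self.conn.execute(
            f"INSERT OR REPLACE INTO {table} (k, v) VALUES (?, ?)", (key, value)
        )

    def get(self, key: str) -> Optional[bytes]:
        table = table_for_key(key, self.num_tables)
        row = self.conn.execute(
            f"SELECT v FROM {table} WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None


cache = ShardedCache(sqlite3.connect(":memory:"))
cache.set("example:page:1", b"<p>rendered output</p>")
print(cache.get("example:page:1"))
```

The same key always maps to the same table, so lookups never need to search more than one shard.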

Wikimedia analytics

  • Wikimedia Report Card 2.0 [?] — The team started their second sprint in July, whose goal was to incorporate key metrics into the Report card such as editors by geography, page views (both mobile and non-mobile) and gender breakdown of editors. Nimish Gautam worked on the infrastructure and analytics for editors by geography. Sam Reed implemented a generic CSV importer, and looked at how to use the Google API to automatically draw data about offline usage into the Report card from Google Spreadsheets.

Technical Liaison; Developer Relations


This article was written collaboratively by Wikimedia engineers and managers. See full revision history. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
