Wikimedia engineering June 2012 report

Major news in June include:

Events

Recent events

Berlin hackathon (1–3 June 2012, Berlin, Germany)

Approximately 104 participants from 30 countries came to Berlin, including MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The community also learned more about the Wikidata and RENDER projects. More updates, links to videos, and followups are on the talk page.

Upcoming events

Pre-Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)

Open source teaching nonprofit OpenHatch will be aiding in organizing and running this two-day event, with Katie Filbert, Gregory Varnum, and Sumana Harihareswara. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes. The event is free to attend even for those not attending Wikimania itself.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

Operations

Site infrastructure

June was another busy month of racking, stacking and provisioning newly purchased equipment for Chris and Rob. In the works are additional servers for clusters such as External Store, Memcached, Parser Cache, Object Store and Labs. Meanwhile, new servers were rolled out in EQIAD for analytics, DNS resolvers, and UDP2Log. Servers and firewalls were racked and cabled for the new EQIAD payments cluster. Storage3’s RAID controller failure was repaired, and a replacement machine was ordered.
IPv6 Launch Day (6 June 2012) came and went without much fanfare. Mark, Faidon, Ryan and Asher put a great deal of work into the infrastructure and system stack, especially into LVS, PyBal, Varnish, Squid, DNS, databases, Nagios monitoring and puppetization. We also took this opportunity to update those technologies and to run them on Ubuntu Precise (12.04) where possible. We have kept IPv6 traffic on ever since. As part of risk mitigation, only half of the LVS and PyBal servers were upgraded to run IPv6 and the enhanced features, allowing us to fall back if needed. Now that we have had one month of stability, we will soon begin the rest of the migration.
During the Berlin Hackathon, the TechOps team got together for about 2 hours to review the year’s progress. A blog post on this will follow soon. In summary, the team completed 19 priority 1 projects (e.g., deploying Mobile, SSL, Labs, database upgrades and network redundancy) that were identified at the beginning of the year. We followed up with a list of high-priority projects for the new fiscal year. A blog post with more details on this will also follow soon. In addition to the IPv6-related work, the team did a major cleanup of jobs creating cronspam, making the logfiles more readable.
Asher ran benchmark tests on the External Store, comparing the current MyISAM engine with InnoDB. He dispelled the myth that MyISAM is faster for this use case and, with this new information, has started migrating the External Store databases to InnoDB. You can read his report here.

Data Centers

We have identified a colocation facility for the new West Coast caching center; it is located at 200 Paul Street, San Francisco. Work on building up the infrastructure is planned to begin this coming August or September. With this caching center, we will be able to improve the site experience for users on the US West Coast and in the Asia-Pacific region.

Object Store/Swift

A severe bottleneck has been identified in container listings in Swift, and Ben Hartshorne is adding SSD drives to the Swift back-end storage nodes to make container listings faster. Testing has verified that this change will solve the problem, and it is being deployed to production this month. Additionally, integration of the SwiftStack monitoring improvements was accepted into the mainline Swift codebase last month and will be deployed to our environment in July.

Testing environment

Wikimedia Labs

The Labs infrastructure had a DNS outage, caused by glue records that must be updated via a manual process. To combat that issue in the future, Labs DNS resolvers are now on service IPs with service host names. A DNS resolver was brought up in EQIAD, as well as an additional LDAP replica. Faidon’s puppetmaster::self class is being put into use. It’s working well enough that the test branch for puppet was merged into the production branch, and Labs now runs directly off of the production branch. The very annoying “No nova credentials for your account” bug has been fixed. virt6-12 in pmtpa have been racked, wired and installed. They will soon be put into production. Andrew Bogott’s work on the nova plugin framework continued this month. The plugin framework has been moved into openstack-common, making it the plugin framework for all openstack services. Work is now ongoing to merge the changes back into nova. Per-project Debian repositories (for ubuntu-precise and above) are now available. An all-in-one MediaWiki puppet class is now available as well.

Backups and data archives

Data Dumps

Media downloads per project are now live, along with one or two “incremental” downloads per month. The new deployment system (which actually uses scripts instead of moving files around by hand) was completed and is in place. It was even used this month to push some minor changes. We’re working with another organization that wants to mirror media, and we’re still looking for more mirror sites for media, dumps or pageview stats; send us ideas! The archive.org uploader code was rewritten as a core S3 uploader library with archive.org extensions, and the new features we need are being added; this will be extended for Google Storage usage as well.

 

Other news

  • We had our fair share of short site incidents in June. On June 7, users reported API slowness and unavailability; Tim was on hand to resolve the incident (detailed report). On June 20 and again on June 21, users reported Apache HTTP timeouts. In both cases, one of the memcached servers was experiencing high load, and restarting it resolved the issue (detailed report). An incident on June 19 did not affect our MediaWiki production clusters, but it held up our email system for half a day.
Jeff discovered a distributed spam attack on our mail system involving what appeared to be a few thousand malicious hosts. They were flooding our secondary mail server with undeliverable messages to fake addresses at various WMF domains. The secondary mail server forwarded those to the primary mail server, which became overloaded and slow in processing legitimate mail. A temporary fix was put in place to drop those fake and spam messages, but it took a day for the mail system to catch up. We subsequently put a proper fix in place.
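
The general idea of rejecting mail for nonexistent recipients before it reaches the primary server can be sketched as follows. This is an illustration only, not the fix that was actually deployed: the addresses, the domain and the function names are hypothetical, and a real mail server would normally do this check in its own configuration rather than in Python.

```python
# Hypothetical sketch: a secondary mail server that only accepts mail for
# known recipients, so undeliverable messages never reach the primary server.

# Assumed example data; real addresses would come from the primary
# server's recipient database.
KNOWN_RECIPIENTS = {"info@example.org", "press@example.org"}


def should_accept(recipient):
    """Return True if the recipient address is known (case-insensitive)."""
    return recipient.strip().lower() in KNOWN_RECIPIENTS


def filter_recipients(recipients):
    """Split incoming recipient addresses into accepted and rejected lists."""
    accepted, rejected = [], []
    for rcpt in recipients:
        if should_accept(rcpt):
            accepted.append(rcpt)
        else:
            rejected.append(rcpt)
    return accepted, rejected
```

Rejecting at the first hop keeps the forwarding queue short, which is what lets legitimate mail through during a flood.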

Features Engineering

Editing tools

Visual editor

The team did the first deployment of VisualEditor and Parsoid, with an early version now live in a test namespace on mediawiki.org. This editor is broadly feature-compatible with the old, EditableSurface-style code which this replaces, while being the first release that can create and edit pages. The team is now planning to deploy new code as it develops every two weeks or so. The initial push will be to work on bug-fixes, and to finalise the code for a few features that were close to being ready before the first deployment.

Editor engagement

Article feedback

Fabrice Florin worked with new WMF engineer Matthias Mullie and OmniTI to develop a range of new features for version 5 of the Article Feedback Tool (AFT5). This month, the team completed primary feature development for this tool, including the article feedback page, the central feedback page, and the final feedback form (scroll to bottom of page). We started writing and publishing new documentation about this project, including this help page. Dario Taraborelli, Aaron Halfaker and Oliver Keyes published a full report that suggests that people who post feedback are more likely to edit articles afterwards. Roan Kattouw continued to review our code and trained our team to start deploying code on their own. We have started a wider deployment of AFT, which will gradually increase our coverage to 10% of the English encyclopedia by the end of July, with full deployment a couple months later.

Page Triage

Ryan Kaldari, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba and Howie Fung deployed an updated version of the New Pages Feed (formerly called Page Triage) on the English Wikipedia. This new tool provides an enhanced list of pages for review by community patrollers. The team also deployed the first version of a new curation toolbar to appear on article pages, enabling patrollers to get more article info, mark pages as reviewed, or tag them. We plan to complete development of the full curation toolbar this month (including tools to nominate articles for deletion and send WikiLove to page creators), then start integrating it with the Article Creation landing system. Check out the current prototype on the English Wikipedia, as well as the latest version on Wikimedia Labs. (Tech tip: if you are an auto-confirmed editor, click “Review” on any unreviewed article shown in red on New Pages Feed and add “?curationtoolbar=true” to the URL.) Please report any bugs on Bugzilla.
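
For anyone scripting the tech tip above, here is a minimal sketch of adding the parameter to a page URL. The helper name is made up for illustration; only the `curationtoolbar=true` parameter comes from the tip. Note that a URL that already has a query string needs `&` rather than `?`, which the standard library handles.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_curation_toolbar(url):
    """Return url with curationtoolbar=true added to its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier curationtoolbar value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "curationtoolbar"
    ]
    query.append(("curationtoolbar", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_curation_toolbar("https://en.wikipedia.org/wiki/Example")` yields `https://en.wikipedia.org/wiki/Example?curationtoolbar=true`.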

Multimedia Tools

TimedMediaHandler

Development on TimedMediaHandler has been paused until Jan Gerber comes to San Francisco in late July for the final push.

MediaWiki infrastructure

ResourceLoader

The demo with the latest version is deployed on WMF Labs, where there’s a cluster of 4 wikis connected with a shared Gadget repository. Recently implemented:

  • Finished back-end validation of gadget definitions when saving. If a definition is invalid, users now get a descriptive error and the edit is not saved.
  • Roan implemented a view for Gadget definitions where the JSON syntax is prettified with indentation etc.
  • Timo is currently going through a review backlog in the RL2 branch, and working on front-end implementation of the new “skins” and “position” properties in the (visual) gadget definition editor.
  • Assorted other progress on the implementation of the specification, and task list of small bug fixes and improvements.
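
The validate-on-save and prettified-view ideas from the list above can be sketched as follows. This is not the extension's code: the required keys are an assumed, simplified schema chosen for illustration, and the function names are hypothetical.

```python
import json

# Assumed, simplified schema for illustration only; not necessarily the
# format used by the Gadgets extension.
REQUIRED_KEYS = {"settings", "module"}


def validate_definition(text):
    """Return (ok, message) for a gadget definition given as JSON text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, dict):
        return False, "The definition must be a JSON object."
    missing = sorted(REQUIRED_KEYS - set(data))
    if missing:
        return False, "Missing required keys: " + ", ".join(missing)
    return True, "OK"


def prettify(text):
    """Re-serialize JSON text with indentation for easier reading."""
    return json.dumps(json.loads(text), indent=4, sort_keys=True)
```

A save handler built this way would refuse the edit whenever `validate_definition` returns `False` and show the message to the user.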

Feature support

Wikipedia Education Program

Jeroen De Dauw and Sam Reed finished the review. The extension was deployed, but has been temporarily disabled again due to a namespace/title conflict with a Star Trek: Voyager episode (“Course: Oblivion”!). This should be resolved shortly.

2012 Wikimedia fundraiser

Onboarded Adam Wight to the team. GlobalCollect recurring is now code complete (now in code review). Integrated with Yandex through GlobalCollect. Finished migration of payments deployment to Git.

Internationalization and Editor Engagement Experiments

Internationalization and localization tools: In June, the team:

  • Completed initial UI design and user experience testing for the Universal Language Selector (ULS)
  • Began developing an initial prototype of the Universal Language Selector (ULS)
  • Developed and deployed Translation Notifications
  • Added more language input methods to Narayam
  • Added more language script fonts to Web Fonts
  • Made progress on integration of Translate functionality on meta for communications and fundraising groups (with integration into CentralNotice)
  • Started work with the Arabic community to increase Arabic language support in i18n/L10n tools

Editor engagement experiments: The team redeployed the Timestamp Position Modification experiment; it is now wrapped up and in analysis. Design and analytics work on the next experiment, post-edit feedback, was completed in preparation for a July deployment. Debug hooks were added to the clicktracking extension with the goal of improving QA for experiments. We wrote a clicktracking dashboard that intercepts event logging calls and displays them on-screen, and shows which experiments are currently active and to which bucket (if any) the current user has been assigned. Work is ongoing on a rewrite of the clicktracking extension, which is taking shape at Extension:E3_Experiments.
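
To illustrate what assigning a user to a bucket can mean, here is a minimal sketch of deterministic, hash-based bucketing. It is a generic technique and not necessarily how the clicktracking extension does it; the function name, parameters and bucket names are all hypothetical.

```python
import hashlib


def assign_bucket(user_id, experiment, buckets, sample_rate=1.0):
    """Deterministically map a user to one bucket, or None if not sampled.

    The same (user_id, experiment) pair always gives the same answer, so a
    user sees a consistent variant across page views.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # First 8 bytes decide whether the user takes part in the experiment.
    sample_value = int.from_bytes(digest[:8], "big")
    if sample_value >= int(sample_rate * 2**64):
        return None
    # Next 8 bytes pick the bucket, independently of the sampling decision.
    bucket_index = int.from_bytes(digest[8:16], "big") % len(buckets)
    return buckets[bucket_index]
```

A debugging dashboard could call such a function for the current user and each active experiment, and display the result next to the logged events.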

Mobile

Contributors

Mobile Contact us

Phil, Tomasz, and Arthur worked together to test out several new contact methods for mobile users. General contact emails are routed to OTRS, while technical emails go directly to the engineering team. With some minor changes, a more permanent email address will be set up for technical problems.

Wiki Loves Monuments App

The WLM app team (Yuvi, Phil, Elke, Lindsey & Jon) worked closely during the month of June to finalize requirements, draft workflows, design mockups, and begin implementation of the first version of the app. The team did its first showcase of the app and is working quickly to resolve outstanding bugs.

Readers

Mobile Nav

The mobile nav continued its progression through the month of June. Jon, Phil and Lindsey worked together to get us closer to the new mobile experience. Language functionality got into an initial form, ready for deployment to the beta site, and the team has nearly settled the Article Action approach, along with an initial version of hiding and revealing the search bar. A relatively complete implementation of the new site UI will be pushed to the Beta site soon.

Wikimedia Apps

The app team (Yuvi) spent June polishing the Wikipedia app for Android (version 1.2) and iOS (version 3.2). The Android version is available in the market now, while the iOS version is still under review by Apple. The iOS version picks up the latest PhoneGap 1.7 changes and will see a dramatic speed improvement.

Wikipedia Zero

Dan and Patrick continued conducting tests with Orange in six different countries. Additional testing and refinement is under way with our partners in Bangladesh and Montenegro.

J2ME App

Tomasz worked with Legal and our new contractor OpenPath to close the initial work agreement. OpenPath has kicked off development and we should see their first check-in shortly. Initial screen flows can be found here. Kul & Phil worked to settle our initial device test list. Patrick will provide any necessary technical assistance.

Wikipedia over SMS & USSD

We’ve been making significant progress recently to secure the necessary partnerships that will make it possible to provide Wikipedia over SMS.

Infrastructure

Mobile default for sibling projects

Phil, Patrick, Max, and Asher worked together to prep the sibling projects to become mobile default. Phil used the global messaging to notify all affected village pumps about the change. Patrick, Max, and Asher refined our squid mobile redirector to allow for new projects and enabled it for Wiktionary, Wikinews, and Wikisource. All three of these projects default to mobile now if we detect a mobile phone. We’ve scheduled our next set of projects on the project timeline.
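
The redirect decision described above can be sketched as follows. This is not the actual Squid redirector: the user-agent pattern is a deliberately crude assumption (real device detection is far more thorough), and the function name is hypothetical. The three project names are the ones listed in this report.

```python
import re

# Simplified, assumed pattern for illustration; real device detection
# covers many more devices and edge cases.
MOBILE_UA = re.compile(r"iphone|android|blackberry|opera mini", re.IGNORECASE)

# Sibling projects named in this report as newly mobile-default.
MOBILE_DEFAULT_PROJECTS = {"wiktionary", "wikinews", "wikisource"}


def mobile_redirect(host, user_agent):
    """Return the mobile host to redirect to, or None to serve the desktop site."""
    parts = host.lower().split(".")
    # Only handle hosts of the form <lang>.<project>.org; hosts that are
    # already mobile (<lang>.m.<project>.org) have four parts and are skipped.
    if len(parts) != 3 or parts[1] not in MOBILE_DEFAULT_PROJECTS:
        return None
    if not MOBILE_UA.search(user_agent):
        return None
    lang, project, tld = parts
    return f"{lang}.m.{project}.{tld}"
```

Enabling a new project then amounts to adding its name to the set, which mirrors the "allow for new projects" refinement mentioned above.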

Improved Mobile Device Detection

Diederik van Liere and Patrick Reilly continued our work on integration with the Apache Device Map project.

Platform Engineering

MediaWiki Core

MediaWiki 1.20

We successfully completed the MediaWiki 1.20wmf4 and MediaWiki 1.20wmf5 deployments in June, and started the MediaWiki 1.20wmf6 deployment process.

Git conversion

This has been a very busy month for Gerrit. The creation of new projects continues; this month saw every extension deployed on Translatewiki moved to Git. During the week of June 25th, we experienced some downtime with Gerrit due to search engine crawlers overloading the server. Also that week, many improvements were made to IRC logging, although discussion continues on how to make the bots more effective. We have scheduled an upgrade of Gerrit to the 2.4 release for the week of July 2nd; this will bring the much desired “Rebase button” to Gerrit, which should lighten users’ workload for trivial merges.

SwiftMedia

Ben Hartshorne is installing SSDs to store the object listing database, in the hope that faster storage will result in faster purge times (fixing bug 34717). We had hoped to complete this in June, but it is stretching into July. All work on deploying Swift for storage of original images is on hold until we fix the object listing performance problems.

Lua scripting

Tim Starling led tutorial sessions in June and videos (first session, second session) are now available on Vimeo. They will be on Wikimedia Commons by mid-July. Ross Andrews is now working on documentation in the form of help/tutorial pages, especially describing the MediaWiki interface. Once that’s done, Tim will promote the prototyping site on Labs more heavily, and at some point after that, we will install the Scribunto extension on mediawiki.org. Work on Lua was paused in late June to catch up on other activities. Full deployment to Wikimedia sites is scheduled for 2013.

OAuth

This project is not yet in highly active development, but work has started on a more complete set of requirements and user stories. Chris Steipp put out a call for user stories, which were then prioritized. We determined that most stories fell into two categories: implicit grant (a.k.a. “user agent flow”) and server-side flow. We’ll likely need up-front registration for people implementing OAuth, so Chris will propose a workflow for what that looks like. Comments are welcome on the wikitech-l mailing list thread about OAuth and on the OAuth talk page.
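
To make the two categories concrete, here is a minimal sketch of how the first request differs between them, assuming OAuth 2.0 terminology (RFC 6749), where the implicit grant uses `response_type=token` and the server-side (authorization code) flow uses `response_type=code`. The endpoint, client values and function name are hypothetical; no design for MediaWiki has been decided.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
AUTHORIZE_ENDPOINT = "https://wiki.example.org/oauth/authorize"

# OAuth 2.0 response types for the two flows named above.
RESPONSE_TYPES = {
    "implicit": "token",    # token is returned directly to the user agent
    "server-side": "code",  # code is later exchanged for a token by the server
}


def authorization_url(flow, client_id, redirect_uri, state):
    """Build an OAuth 2.0 authorization request URL for the given flow."""
    params = {
        "response_type": RESPONSE_TYPES[flow],
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)
```

The `client_id` is where up-front registration comes in: the wiki would issue it when a developer registers an application.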

Code review management

Diederik van Liere is gathering Gerrit stats now, and is planning to publish the first batch soon. In the meantime, here are the current statistics on open changes across all of MediaWiki (core and extensions):

  • 49 that have received a positive tentative review (+1) but have not been merged (+2)
  • 203 that received neither -2, -1, +1, nor +2 reviews (but might have textual comments)
  • 61 that received a negative tentative review (-1) with issues to be addressed by the original contributor
  • 15 that have been rejected (-2) but not yet abandoned by their original authors
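
A tally like the one above can be computed from per-change review scores. The sketch below is illustrative only: it uses made-up input rather than the Gerrit API, and the precedence between conflicting scores (for example a -1 alongside a +1) is an assumption.

```python
from collections import Counter


def classify(scores):
    """Classify one open change by its review scores (a list of ints)."""
    votes = [s for s in scores if s != 0]  # a 0 is a comment without a score
    if not votes:
        return "no score"
    if min(votes) == -2:
        return "rejected (-2)"
    if min(votes) == -1:
        return "negative tentative review (-1)"
    if max(votes) == 2:
        return "approved (+2)"
    return "positive tentative review (+1)"


def tally(changes):
    """Count open changes per category; changes maps change id to scores."""
    return Counter(classify(scores) for scores in changes.values())
```

For instance, a change with scores `[1, -1]` is counted under the negative tentative review category, because the lowest score takes precedence in this sketch.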

Security auditing and response

Chris Steipp was on leave for much of June. Work continues on the audit of global JavaScript and CSS across Wikimedia sites. Three security issues were opened and two were closed. Secure code review training was given at the Berlin Hackathon.

Site performance

An initial investigation has begun on the possibility of upgrading from PHP 5.3 to PHP 5.4. Benchmarks are very promising, but a security enhancement we are currently using with PHP 5.3 (Suhosin) is not yet available for PHP 5.4, so the team is debating whether to carry on without it, as well as estimating the performance penalty introduced by this patch. More improvements have been made to Ganglia and Graphite.

Quality assurance

QA and testing

This month saw a big focus on hiring the QA Engineer and Volunteer QA Coordinator. We also continued to be focused on testing Article Feedback (including via an event on IRC with OpenHatch and new testing volunteers), and are working to get beta labs fit for use as a test environment for AFT and Editor Engagement (E2).

Beta cluster

The primary focus of Beta cluster work in June was in service to TimedMediaHandler (TMH). TMH has been set up, though transcoding is not operational yet, since that would require a fully functional job queue. The team discovered that the version of Ubuntu currently used in production (Lucid) won’t work with TimedMediaHandler. As a result, Antoine and Faidon updated the Puppet configurations for the Apache web servers to run on the next-generation Ubuntu (Precise). Administrative tools have been set up, closely following the way this is done in production. For example, the Beta cluster now uses the exact same workflow to update the l10n cache as we do in production. The team plans to further improve this by fetching l10n updates from translatewiki.

Continuous integration

Timo Tijhof is working on setting up the new TestSwarm in Wikimedia Labs. We will use the TestSwarm and BrowserStack API through the testswarm-browserstack bridge to automatically populate the swarm with needed browsers. Antoine Musso upgraded Jenkins to the latest version, 1.472.

Wikimedia analytics

Report card

Erik Zachte and Fabian Kaelin prepared the June Reportcard for the monthly Metrics Meeting. David Schoonover worked on adding d3.js support to Limn and is preparing to publish the project’s source code on github.com.

Page view logging

A change to add 2 new headers to logging fields has been submitted. We are waiting on the go-ahead from consumers to merge and deploy this.

Kraken (Analytics Cluster)

Andrew Otto has performed several preliminary benchmarks on a 10-node CDH3 cluster. We plan to do more benchmarking with CDH 4 and Datastax Enterprise. Focus has recently switched to building, testing and deploying Facebook’s scribe as an eventual replacement for udp2log. We are investigating the use of scribe initially for Lucene search query logging.

Engineering Community

Bug management

The Wikimedia Foundation is seeking a Bug Wrangler to work on management of bugs.

Summer of Code 2012

The Google Summer of Code students continue their twelve weeks of design and coding.

Engineering project documentation

At the Berlin Hackathon, Guillaume Paumier, Rob Lanphier and Timo Tijhof discussed how to summon the Status Helper tool from custom edit links. Guillaume modified the templates to provide hidden metadata, and Rob implemented the functionality in the JavaScript. Timo also converted the user script into a full-fledged opt-in gadget. Rob Moen created a JavaScript tool to easily assess and tag pages, as part of an effort to clean up the wikitech wiki, before merging it with labsconsole.

Volunteer coordination and outreach

Sumana Harihareswara continued to follow up on contacts, recruit new contributors to the Wikimedia tech community, and mentor new contributors. She granted developer access and Gerrit project ownership requests, and planned upcoming events. The Foundation is also hiring a coordinator for volunteer testers and an engineering outreach coordinator to work on volunteer coordination and outreach.

Wikimedia Foundation engineering 20% policy

Sumana Harihareswara is coordinating WMF engineers’ efforts to spend 20% of their work time on code review and other efforts benefiting the entire Wikimedia engineering community. Their highest priority is the Gerrit merge queue, especially for backlogged components such as UploadWizard and ProofreadPage, and secondarily patches awaiting review in Bugzilla for MediaWiki or WMF-deployed extensions. Some participants are instead concentrating on bug triage, documentation, and the extensions awaiting review for deployment.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

The team published an easier-to-understand version of their data model, updated their story boards for how to link between Wikipedias in the future, and submitted a proposal to the Knight News Challenge to make Wikidata a central, persistent repository for identifiers on the web in a second year of development. Also, proposed logos went up for public voting.

Future

The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.


This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.


9 Comments on Wikimedia engineering June 2012 report

Guillaume Paumier 2 years

Waldir: I’m sure we can add color coding for the next report (i.e. August report; the July report was just published). Collapsing sections is a tempting idea, but I’m not sure it would be much more readable than scanning through the doc and looking at the (prominent) activity names. Let’s try out colors and see.

Waldir 2 years

Hmm, at least you could add a little javascript to collapse sections so we could expand read individual parts more comfortably. Collapsing here would mean showing only the sub-headlines inside the section, not hiding everything entirely (this way the fully collapsed report would be like a table of contents). Alternatively, you could add, say, background colors to different sections. Or use (subtle) colors for the headers of different sections (reddish for operations, greenish for engineering, etc).

I’m saying this because there are so many section headings and subheadings that it might be hard to distinguish different levels visually/quickly and thus see where a section ends and another starts (especially when a full section is taller than a screenful)

Personally, I’d prefer the collapsing thing, but color-coding would be a good addition for those reading by RSS.

Guillaume Paumier 2 years

About the length of the report: I agree that, as the Foundation has grown, so has the amount of work done by Wikimedia engineers, and the report has grown very long.

I’m not sure what the best way is to solve the issue. I actually asked that question during the “Transparency and collaboration in Wikimedia engineering” session at Wikimania, and the answer I got was that its length wasn’t really an issue, as very few people read the report in its entirety. Most readers scan the report for the bits and projects that interest them, and ignore the rest.

Assembling and publishing the report takes a considerable amount of time, so I’m not sure it’s possible to publish reports more often. What I usually do is try to keep each paragraph short, and link to the activity page for more information, but I’m reluctant to summarizing too much. I’m open to other ideas, though.

Harry Burt 2 years

> Still, if you’d like my personal opinion on the Technology Report, I think what I
> miss the most is follow-up to some entries such as discussions in the technical
> mailing lists that are mentioned in one week and then don’t get any coverage
> afterwards. Maybe a new “follow-up” section would help people to stay informed about
> the outcome of such discussions, and perhaps could even sparkle some revivals.

A great idea :) I’ll try to bear it in mind when preparing future reports.

Waldir 2 years

Actually, I think the Signpost’s Tech Report has been quite comprehensive lately. Including more things in it could make it too long, compared to the other sections of the Signpost. On the other hand, I believe a report such as this one is important and shouldn’t be entirely delegated to the Signpost (especially since many things would probably not fit well in an independent publication such as the Signpost — for example, the staff changes and job openings at the WMF).

The problem is that this monthly report is just too long for a comfortable read, and splitting it in smaller chunks seems to be a good way to alleviate that, without reducing the amount of information it covers.

Still, if you’d like my personal opinion on the Technology Report, I think what I miss the most is follow-up to some entries such as discussions in the technical mailing lists that are mentioned in one week and then don’t get any coverage afterwards. Maybe a new “follow-up” section would help people to stay informed about the outcome of such discussions, and perhaps could even sparkle some revivals. A recent example is the MediaWiki logo discussion whose thread was mentioned in the Signpost a few editions ago.

Harry Burt 2 years

Hey Waldir. I’m the guy who writes the Technology Report, so I’m naturally interested if there’s things you would like to have more regular updates for (e.g. on a weekly basis) given that many of the smaller projects do indeed only receive coverage once a month. What in particular do you miss? Or just everything?

Waldir 2 years

Thanks for the fix and the suggestion. I regularly read the Signpost, but a lot of what is included in these reports doesn’t go there. These differences are exactly what makes people brace theirselves and go through these huge reports, if they don’t want to miss what’s going on in the “Wikimediaverse”, technically speaking. But I would understand if the current setup for producing these reports doesn’t lend itself easily to a shorter timeframe format. Is that the case? Otherwise, I believe there are good reasons to try a different size and publication frequency.

Erik Moeller 2 years

Thanks Waldir! Fixed those issues.

A good weekly update is the technology report in the Wikipedia Signpost, e.g.

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2012-07-09/Technology_report

Waldir 2 years

A few notes:

1) There’s a duplicate entry for “Adam Wight” in the Personnel announcements section
2) “those technology” sounds a little off to me, but maybe it’s because I’m not a native speaker of English. Can you check?
3) “a mail distributed mail spam”?
4) “a security enhancement we currently using” — verb missing?
5) These reports are sometimes quite long. Any chance to publish them more often, with less content?
