Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Wikimedia engineering July 2012 report

Major news in July includes:

  • Engineering presence at Wikimania 2012 in Washington, D.C., and the pre-Wikimania hackathon;
  • the launch of Limn, an open source dataviz toolkit developed by the Wikimedia analytics team;
  • the deployment of Article Feedback Version 5 (which supports free-text feedback and moderation thereof) to 10% of English Wikipedia articles.

Engineering metrics in July:

Events

Recent events

Wikimania and Pre-Wikimania hackathon (10–15 July 2012, Washington, D.C., USA)

This year’s pre-Wikimania Hackathon was special in that it had a full track for newcomers, going beyond tutorials. The Hackathon was a collaboration with OpenHatch, an open-source teaching non-profit. The new efforts included suitable first-time tasks to guide newcomers toward more advanced Wikipedia editing and technical contribution, a laptop setup guide that walks attendees through configuring development environments, and constant in-person assistance to help people past the problems they encountered. At the event, we saw many people learning more about templates, editing Wikipedia, and using and modifying bots to improve the encyclopedia and the media on it. At least 65 people signed in, and more were likely in attendance. During the main Wikimania conference, a number of volunteers and staff members gave talks and led discussions about technology-related topics.

Upcoming events

Wikipedia Engineering Meetup (15 August 2012, San Francisco, USA)

The Engineering department of the Wikimedia Foundation has initiated a Wikipedia Engineering Meetup to showcase the interesting problems and products they work on to the local developer community. Tentatively, the meetup will happen every two months at the Wikimedia offices in San Francisco, and will consist of three 15-minute engineering presentations, followed by a question & answer period bracketed by mingling. The inaugural meetup will feature talks about Mobile engineering, Analytics and the VisualEditor.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

Operations

Site infrastructure

July was a relatively quiet month for Operations, and the team worked mostly behind the scenes. Mark Bergsma successfully integrated and tested the upgraded Varnish software (with the persistent cache patch) on some of our mobile caching servers. They are working very well, and the plan is to roll it out widely in the coming weeks.
Mark also made several feature upgrades to LVS/Pybal, including IPv6 BGP support and a DNS recursor implemented as an LVS cluster running on Ubuntu 12.04. The package has been puppetized and deployed, which also means eqiad servers no longer hop to Tampa to get DNS answers.
Peter Youngmeister has been packaging, puppetizing and testing the new application server build. This build runs on Precise (Ubuntu 12.04) and works with the Swift object store (rather than the current NFS filer). Further performance and scalability tests across a bigger portion of the Tampa application servers will follow shortly.
Asher Feldman has completed and tested his latest (Precise) MySQL build on one of the database slaves. This will serve as the package for future MySQL upgrades and new deployments.

Object Store/Swift

Migration to Swift is in its final stages now that the performance bottleneck identified in June has been resolved. MediaWiki now uses Swift as the primary object store for thumbnails (the NFS filer is relegated to a secondary fail-over backup). The ‘originals’ (uploaded images and multimedia content) have been copied over to Swift as well, setting the stage to migrate away from the NFS filer next month. Also coming next month: an upgrade to Swift version 1.5.0 and a second Swift cluster brought online in eqiad.

Wikimedia Labs

This month was focused on adding new hardware, upgrading the OpenStack infrastructure, and other stability efforts. virt6–8 have been added to the cluster, and about 20 instances have been migrated to these nodes so far; another 40 instances have been created on virt6–8 since their addition. Initial instance migration efforts resulted in 30 instances being corrupted due to a KVM block migration bug, so a cold migration process, automated with a script, was created as a workaround.

Development is ongoing to upgrade OpenStackManager to support the Essex release of OpenStack: Keystone support is complete, and OpenStack API support is currently being added. Development work on OpenStack itself continues as well. Andrew Bogott’s openstack-common plugin framework has been merged, and novaclient work is progressing and should be merged after some cleanup. Ryan Lane and OpenStack developer Adam Young collaborated on pushing in some changes needed for OpenStack Keystone’s LDAP backend to work with the Essex (stable) release. A blueprint has been submitted for using Keystone to manage LDAP entries via templates, so that we can move to Keystone as an LDAP manager in the future. Work has also begun on using OpenStack Nova to manage DNS entries.

GlusterFS project storage has been upgraded to version 3.3. Leslie Carr and Ryan Lane hosted a tutorial on Using Puppet with Labs at the pre-Wikimania Hackathon, and a presentation on Labs and the State of Our Open Source Infrastructure was given at Wikimania.

Data Dumps

The YAS3 library for uploading to archive.org and other S3-compatible sites, along with several command-line clients, is now usable (though still under heavy development). The library handles the HTTP 100 Continue response correctly: for large file uploads, the file data is only sent once the client has been redirected to the right host, which is a great time saver. The library also automatically uploads large files in multiple chunks, rather than requiring the user to split the file into separate pieces. That’s a necessity for us, since many of our dump files are quite large.
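The chunked-upload behavior described above can be illustrated with a short sketch of how a client might split a large dump file into byte ranges before uploading each one as a separate part. This is not YAS3’s actual API; the function name and part size are hypothetical, and real S3-compatible services impose their own limits on part sizes and part counts.

```python
from typing import List, Tuple


def part_ranges(file_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split file_size bytes into (offset, length) pairs of at most part_size bytes.

    Hypothetical helper: each pair would become one part of a multipart upload.
    Every part has part_size bytes except possibly the last, which holds the remainder.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    ranges: List[Tuple[int, int]] = []
    offset = 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    # A hypothetical dump of just over 23 GiB, split into 1 GiB parts.
    gib = 1024 ** 3
    parts = part_ranges(23 * gib + 512, gib)
    print(len(parts), parts[-1])
```

Each (offset, length) pair can be read with seek() and read() and sent as its own request, so a failed part can be retried without resending the whole file.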

Features Engineering

Editing tools

VisualEditor

The VisualEditor (VE) team presented their work at Wikimania and received a good deal of feedback from the community. The team created a rough plan for the next three months’ work. The early version of VE on mediawiki.org was updated twice, fixing a number of bugs and, notably, adding support for nested lists. Gabriel Wicke relocated to San Francisco, and Timo Tijhof visited the SF office for three weeks after Wikimania.

Editor engagement

Article feedback

Fabrice Florin and Matthias Mullie led the deployment of Article Feedback on 10% of the English Wikipedia, in collaboration with Pau Giner, Ryan Kaldari, Roan Kattouw, Oliver Keyes, Chris McMahon, Benny Situ, Heather Walls, Howie Fung and Terry Chay. This month, the team developed the final features for this tool, including the article feedback page, the central feedback page, and the final feedback form. To guide users of this tool, we also published a new video tour, a walkthrough tutorial and various help pages. We have received a very positive response to article feedback from the Wikipedia community through a variety of channels, from talk pages to IRC chats and Wikimania presentations. Community members typically find the tool useful and well thought out, and many editors have told us they have already improved articles based on feedback from readers, which is exactly the behavior we were hoping to encourage. We have started our productization phase (more platforms, scalability, code refactoring, localization, metrics, mobile). We are now aiming for a full release to 100% of the English Wikipedia by October 2012, with other wiki projects starting later this year.

Page Curation

Ryan Kaldari, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba, Terry Chay and Howie Fung deployed an updated version of the new Page Curation product (formerly called Page Triage) on the English Wikipedia. This new product includes two main features: 1) the New Pages Feed, a dynamic list of new pages for review by community patrollers; and 2) the Curation Toolbar, an optional panel on article pages that enables editors to get page info, mark a page as reviewed, tag it, mark it for deletion, send WikiLove to page creators, or jump to the next page on the list. This month, we completed development of all key curation tools and are now adding a couple of final features (such as the ability to send a personal note to page creators, to give them helpful tips and let them know their page has been reviewed, tagged or nominated for deletion). We now plan to pre-release Page Curation on the English Wikipedia in mid-August, with a full release in September 2012. Check out the current beta version on the English Wikipedia, as well as the latest version on Wikimedia Labs. (Tech tip: if you are an autoconfirmed editor, click “Review” on any unreviewed article shown in red on the New Pages Feed; until the product is pre-released, please add “?curationtoolbar=true” to the URL to see the Curation Toolbar.) Please report any bugs on Bugzilla.
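For anyone scripting visits to pages under review, the tech tip above amounts to setting one query parameter while keeping any that are already present. A minimal sketch using Python’s standard urllib.parse; the function name is hypothetical, and the parameter is only needed until the product is pre-released.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_curation_toolbar(url: str) -> str:
    """Return url with curationtoolbar=true set, preserving other query parameters."""
    parts = urlsplit(url)
    # Keep existing parameters in order, dropping any previous curationtoolbar value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "curationtoolbar"
    ]
    query.append(("curationtoolbar", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_curation_toolbar("https://en.wikipedia.org/wiki/Example"))
# → https://en.wikipedia.org/wiki/Example?curationtoolbar=true
print(with_curation_toolbar("https://en.wikipedia.org/w/index.php?title=Example"))
# → https://en.wikipedia.org/w/index.php?title=Example&curationtoolbar=true
```

Because the query string is re-encoded, existing values may come back with slightly different escaping (for example, spaces as "+"), which MediaWiki treats the same way.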

Notifications

Andrew Garrett continued to build this feature and plans to deploy a test prototype of Echo on mediawiki.org in early August. The prototype will not include infrastructure parts that depend on JobQueue services, and for now it only supports notifications for talk page messages and LiquidThreads.

Multimedia Tools

TimedMediaHandler

Jan Gerber is in San Francisco, working with Michael Dale on fixing up transcoding and ensuring interoperability with SwiftMedia. On July 31, Aaron Schulz, Jan and Michael deployed the TimedMediaHandler extension to test2.

MediaWiki infrastructure

ResourceLoader

The Labs back-end is complete on Wikimedia Labs. Timo Tijhof is currently debugging and testing it.

Feature support

Wikipedia Education Program

The extension is still disabled, pending resolution of namespace issues. The Education Program team has been presenting at various conferences around the world.

2012 Wikimedia fundraiser

We onboarded Matt Walker to the team. Progress was made on enhancements to CiviCRM that enable Finance and other departments to get relevant metrics and reports more easily. Katie Horn traveled to Wikimania and gave a presentation about the fundraising infrastructure.

Internationalization and Editor Engagement Experiments

Internationalization and localization tools

The team continued to work on the Universal Language Selector and set up a prototype to test it. Development was completed on Translation memory and CLDR plurals support, and code review is pending. User experience testing of the Translate extension is in progress to inform enhancements to its UX. The WebFonts and Narayam extensions were deployed on Bengali Wikisource and Punjabi Wikipedia. Development started on Project Milkshake.

Editor engagement experiments

The Timestamp Position Modification experiment was completed, and initial analysis shows that adding the timestamp on articles increases clicks on the History tab. Development started on delivering post-edit feedback messages; this experiment includes a proof-of-concept dry run, deployed in advance of the full experiment, of a new strategy for bucketing editors to deliver experimental treatments. The team configured a test environment on Labs for validating UI and functional requirements. The Wikimania conference was an opportunity to interact with editors from the English Wikipedia and to define a new experiment related to cleanup templates.
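The report does not describe how the new bucketing strategy assigns editors to treatments. One common approach, shown here only as a hypothetical sketch, is to hash a stable user ID together with an experiment name, so that each editor always lands in the same bucket and different experiments split users independently. The function, experiment and bucket names are illustrative.

```python
import hashlib
from typing import Sequence


def bucket_for(user_id: int, experiment: str, buckets: Sequence[str]) -> str:
    """Deterministically assign a user to one of `buckets` for a given experiment."""
    if not buckets:
        raise ValueError("at least one bucket is required")
    # Salting with the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and map it onto the buckets.
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


if __name__ == "__main__":
    arms = ["control", "post-edit-message"]
    counts = {arm: 0 for arm in arms}
    for uid in range(1, 10_001):
        counts[bucket_for(uid, "post-edit-feedback", arms)] += 1
    print(counts)  # number of simulated users assigned to each arm
```

Since no assignment table is stored, the same editor sees the same treatment on every request, and changing the experiment name reshuffles users for the next experiment.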

Mobile

Contributors

Wiki Loves Monuments App

Significant progress was made at the Wikimania Hackathon in Washington, D.C., and elsewhere. The development team of Jon, Brion, Yuvi, Max and Arthur improved the robustness of photo uploads, sped up monument discovery, and polished the app extensively. We held a showcase to demo our app to the WLM community and have released a first beta for Android.

Readers

Mobile Nav

Design of the new Navigation UI has progressed to a point where the basic design can be used and tested on the mobile beta site. Work will resume when the WLM App is complete.

Wikipedia Zero

Dan and Patrick continued conducting tests with our partners in Bangladesh and Montenegro. We debugged and resolved serious issues with our Opera Mini integration and general infrastructure.

J2ME App

Development continued with our partner OpenPath. We completed basic API integration, image lightboxes, search, main page integration, and numerous other features. OpenPath delivered two alpha builds for testing and is eager to move forward into the beta cycle.

Wikipedia over SMS & USSD

Patrick, Jeremy and Dan worked to make the vumi architecture stack production-ready. They worked with the operations team to puppetize the setup in preparation for moving it to real hardware. Next month we’ll be preparing for a demo with one of our potential partners.

Infrastructure

Mobile default for sister projects

Patrick and Arthur made three more projects and one more domain mobile-default: Wikiquote, Wikibooks, Wikiversity and *.wikimedia.org are now equipped to better serve our mobile users. The development team is eager to hear back from our community about what they would like to see from their projects on mobile. The last project to migrate will be Commons, after which we’ll close this project.

Offline

Kiwix

We finally released Kiwix 0.9 rc1 (see the CHANGELOG). All the binary files were compiled using our new continuous integration build platform. In collaboration with Wikimedia France (for the Afripedia project), we released a first version of kiwix-plug, a standalone WiFi hotspot built on cheap plug computers. The Black&White project, contracted by Wikimedia CH, was completed; a recent achievement was the inclusion of Kiwix in the official Debian package repository. Also in collaboration with Wikimedia CH, we started a new project called ZIM autobuild, which aims to quickly and automatically generate ZIM files of our projects.

Platform Engineering

MediaWiki Core

MediaWiki 1.20

Bi-weekly deployments continued through July, with the completion of the MediaWiki 1.20/wmf6 and MediaWiki 1.20/wmf7 deployments. The deployment of MediaWiki 1.20/wmf8 started on July 23 and will conclude on August 1. Bi-weekly deployments are expected to continue through August.

Git conversion

We’re in the process of evaluating Gerrit and its alternatives as a code review tool. Chad Horohoe and Ryan Lane upgraded our Gerrit instance to 2.4, which provided many incremental fixes and small features (e.g. the “rebase” button). Ryan and Asher Feldman migrated the Gerrit database to our Ashburn datacenter, which resulted in a big performance boost.

SwiftMedia

Aaron Schulz is in the process of migrating image originals into Swift. Commons is completed (barring a few minor problems to investigate in the logs), and the rest of the wikis are in various stages of progress. Meanwhile, there’s also a minor architectural change (MultiWrite backend) that will be deployed soon, which is a necessary prerequisite to serving/storing originals in Swift.
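The idea behind a multi-write backend can be shown with a small in-memory sketch: every write goes to the primary store first and is then mirrored to the secondaries, while reads come from the primary and fall back to a secondary if the object is missing there, much like Swift as primary with the NFS filer as fail-over. This is a hypothetical Python illustration, not MediaWiki’s PHP implementation; the class and method names are invented for the example.

```python
from typing import Dict, List


class DictStore:
    """Hypothetical in-memory stand-in for an object store such as Swift or an NFS filer."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]  # raises KeyError if the object is missing


class MultiWriteStore:
    """Write to a primary and all secondaries; read from the primary with fail-over."""

    def __init__(self, primary: DictStore, secondaries: List[DictStore]) -> None:
        self.primary = primary
        self.secondaries = secondaries

    def put(self, key: str, data: bytes) -> None:
        # The primary write must succeed; if it raises, the secondaries are not touched.
        self.primary.put(key, data)
        for store in self.secondaries:
            store.put(key, data)

    def get(self, key: str) -> bytes:
        try:
            return self.primary.get(key)
        except KeyError:
            # Primary miss: try the secondaries in order, as fail-over copies.
            for store in self.secondaries:
                try:
                    return store.get(key)
                except KeyError:
                    continue
            raise KeyError(key)
```

Keeping the old store as a write-through secondary is what allows a migration like this one to switch reads over gradually while the previous copy stays current.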

Lua scripting

Tim Starling has added a debug console for testing code snippets. We believe we’re ready to deploy Lua to the Wikimedia cluster, starting with test2 in August, followed by mediawiki.org. We plan to let Lua incubate on mediawiki.org while we test its performance characteristics with key templates and work out a deployment plan for larger wikis that includes community involvement.

OAuth

Chris Steipp continued to gather requirements but mostly focused on other projects in July, leaving OAuth on the back burner.

Site performance

Tim Starling investigated an LLVM PHP bytecode converter this month, which looked like a promising direction for performance optimization. The theoretical gain seemed significant, but the actual performance he was able to observe was disappointing, and we probably won’t go in that direction. Asher Feldman has deployed an upgraded parser cache server (db40), and the results have been impressive. Comparing cache response times averaged over several days (July 3-5) on the old parser cache server with the last 8 hours on the new, improved parser cache, the 90th percentile response time dropped from 53.6ms to 7.17ms, and the 99th percentile response time dropped from 185.3ms to 17.1ms. This affects every page request from logged-in users and from logged-out users with cookies, so it should have a meaningful impact on the user experience.
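To make the parser cache figures easier to compare, here is a small sketch that turns them into speedup factors and percentage reductions, along with a nearest-rank percentile function showing what a 90th or 99th percentile means for a set of response-time samples. The report does not say how its percentiles were computed, so the nearest-rank method is an assumption.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Response times in milliseconds as reported: (old parser cache, upgraded parser cache).
reported = {"p90": (53.6, 7.17), "p99": (185.3, 17.1)}

for name, (old_ms, new_ms) in reported.items():
    speedup = old_ms / new_ms
    reduction = 100 * (old_ms - new_ms) / old_ms
    print(f"{name}: {speedup:.1f}x faster, {reduction:.1f}% lower")
# p90: 7.5x faster, 86.6% lower
# p99: 10.8x faster, 90.8% lower
```

Tail percentiles like p99 capture the slowest requests that averages hide, which is why the larger improvement at p99 is notable.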

Incremental architectural improvements

Aaron Schulz and Andrew Garrett have been working on job queue improvements, Tim Starling on Apache configuration cleanup, and Antoine Musso on normalizing Labs and production configurations.

Database sharding

Aaron Schulz wrote a new class (ExternalRDBStore) used for sharding tables in MediaWiki, and is now in bugfixing mode. He also wrote a patch to shard some of the tables associated with FlaggedRevs as a first use of this class. Asher Feldman is currently investigating hardware requirements for utilizing sharding.
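Sharding a table means spreading its rows across several physical tables or database servers and routing each query by a key. As a hypothetical illustration only (the report does not describe ExternalRDBStore’s actual scheme), a stable hash of the key can pick the shard, so that the same key always maps to the same place. The function names and the table-naming convention are invented for this sketch.

```python
import zlib
from typing import Dict, List


def shard_index(key: str, shard_count: int) -> int:
    """Map a key to a shard number in [0, shard_count) using a stable CRC32 hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # zlib.crc32 is stable across processes, unlike Python's salted built-in hash().
    return zlib.crc32(key.encode("utf-8")) % shard_count


def shard_table(base_table: str, key: str, shard_count: int) -> str:
    """Hypothetical naming scheme: one physical table per shard, e.g. 'example_table__3'."""
    return f"{base_table}__{shard_index(key, shard_count)}"


if __name__ == "__main__":
    placement: Dict[int, List[str]] = {}
    for page_id in range(1, 11):
        shard = shard_index(str(page_id), 4)
        placement.setdefault(shard, []).append(str(page_id))
    print(placement)
```

A plain modulo scheme is simple but remaps most keys whenever the shard count changes, which is one reason hardware requirements are worth settling before data is spread out.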

Code review management

  • +1 but not merged: 41
  • 0 but not merged: 210
  • -1 but not merged: 87
  • -2 and not merged: 15

Security auditing and response

Some audit work has resumed, and more bugfixing is needed in this area. Chris has reviewed TimedMediaHandler and the Signup API, and is working on a review of Wiki Loves Monuments.

Wikidata deployment

Daniel Kinzler has finished up what we hope is the final round of changes to the Wikidata branch, which Tim Starling needs to review prior to merging into master. There is currently a discussion about the best way of storing a local copy of the Wikidata-based language links (see the thread about Wikidata blockers on wikitech-l).

Admin tools development

Chris Steipp has been generally tasked with working on features related to blocking spam and other disruptive behavior, with Tim Starling’s guidance. Chris Steipp, Andrew Garrett, Tim Starling, James Forrester and Rob Lanphier met to discuss the scope of this work, charting a broad list of feature requests and attempting an initial prioritization of the most important items. Jack Phoenix has volunteered to act as a Product Manager to help manage this work.

Quality assurance

QA and testing

Hiring a QA Engineer remains a high priority. Article Feedback Version 5 is now on 5% of Wikipedia, with a plan in place to increase that percentage over the summer. AFTv5 is being highly praised by the Wikipedia community, although a small number of power users are experiencing a particular problem; we isolated that problem and have a potential fix ready for deployment on July 24. Work on the Labs beta cluster continues, with AFT and TimedMediaHandler as first priorities and the Editor Engagement project to follow. No community test events are planned right now, although the groundwork is in place for community test events related to Bugzilla tickets for extensions and to the VisualEditor.

Beta cluster

The beta cluster infrastructure is now mostly managed by our configuration management system (Puppet) and is starting to be used by third parties. The Features team and Jan Gerber are now using the beta cluster to stage changes for production. We have set up CAPTCHA and IP blocking to reduce the amount of spam generated on the beta wikis. An overview document has been started to help introduce new people to the beta cluster.

Continuous integration

Antoine Musso used Ant to automate the process of updating extension code from Git/Gerrit, so that unit tests on extensions can be run automatically. The first experiment, with the Wikidata project, revealed issues with other parts of the build scripts, so this is still a work in progress. Antoine will be out for much of August, and his primary focus has been on Beta Labs, so work in this area will resume in September.

Wikimedia analytics

Report card

The Wikimedia-specific portions of the Report Card were split out so that Limn can be used by third-parties.

Limn

The Analytics team published the source code for Limn to GitHub this month.

Page view logging

We modified the Lucene lsearchd code to use a log4j appender for udp2log, rather than relying on manual edits to the codebase. We also built Scribe and Scribe log4j appenders for sending arbitrary logs to Scribe. There has been no movement on log format changes.
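udp2log collects log lines sent over UDP, and an appender like the one mentioned above forwards an application’s log records to such a collector. The lsearchd work used Java’s log4j; as a rough Python analogue, here is a hypothetical logging handler that sends each formatted record as one newline-terminated datagram. The class name, host and port are placeholders, not part of udp2log.

```python
import logging
import socket


class UDPLineHandler(logging.Handler):
    """Send each formatted log record as a single newline-terminated UDP datagram."""

    def __init__(self, host: str, port: int) -> None:
        super().__init__()
        self.address = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            line = self.format(record) + "\n"
            # Fire-and-forget: UDP gives no delivery guarantee, so lines may be dropped.
            self.sock.sendto(line.encode("utf-8"), self.address)
        except Exception:
            self.handleError(record)

    def close(self) -> None:
        self.sock.close()
        super().close()


if __name__ == "__main__":
    logger = logging.getLogger("search")
    logger.setLevel(logging.INFO)
    logger.addHandler(UDPLineHandler("127.0.0.1", 8420))  # placeholder collector address
    logger.info("query=example hits=3")
```

Routing logs through the logging framework keeps the transport in one configurable place, so switching collectors (for example, to Scribe) means swapping a handler rather than editing application code.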

Kraken (Analytics Cluster)

We are researching and evaluating udp2log replacements for getting data into the Kraken cluster. A document is in progress at Analytics/Distributed_Logging_Solutions.

Engineering community support

Bug management

The Wikimedia Foundation is seeking a Bug Wrangler to work on management of bugs.

Summer of Code 2012

Pencils-down is in August. Mid-term update: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/62285

Engineering project documentation

Sumana Harihareswara, Guillaume Paumier and Rob Lanphier interacted with the rest of the technical and non-technical community at Wikimania to discuss how to improve transparency and collaboration around Wikimedia engineering activities. They notably discussed the concept of “Wikitech ambassadors”, to be developed in the coming weeks.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

The Wikidata team has made good progress towards their first roll-out. The initial deployment plans are being made and the Hungarian Wikipedia community stepped up to be the first to use the interwiki part of Wikidata in a few weeks. This also means the demo system needs to be tested more. If you have five spare minutes, have a look at the demo system and report any bugs you might find there so they can be fixed before the initial deployment.

The team also started to collect future use cases of Wikidata that should be kept in mind during development. You are invited to refine them or add your own. Additionally, the team is looking for feedback on the third iteration of the storyboard for linking Wikipedia articles in the future.

Future

The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.


This article was collaboratively written by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.


This article was edited on December 1st, 2012. The following content was changed: the number of processed shell requests was corrected.

4 Responses to “Wikimedia engineering July 2012 report”

  1. Ah, I had missed that message, thank you. The last method I had seen was this one and it seemed painful. I’ll try the method you linked to and I’ll update the numbers.

  2. Harry Burt says:

    Ah right, I thought perhaps a reliable method had been found [1]?

    Harry
    [1] http://www.gossamer-threads.com/lists/wiki/wikitech/283396#283396

  3. Harry: Seeing that I had the choice between an unreliable method, or an extremely tedious one, I decided not to include that number until the Analytics team can provide a reliable metric without having to jump through hoops. Hopefully, they’ll provide historical data and we can add that number retroactively.

  4. Harry Burt says:

    No figure for number of contributors to Gerrit in July?
