Wikimedia engineering February 2013 report

Engineering metrics in February:

  • 110 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from about 650 to about 830.
  • About 69 shell requests were processed.
  • Wikimedia Labs now hosts 150 projects and 1,002 users; to date, 1,561 instances have been created.

Major news in February include:

Note: We’re also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events

There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Date | Type | Event | Contact
Mar 7 | Fresh bugs | QA: General MediaWiki reports Bug Triage | AKlapper, Valeriej
Mar 13 | Browser testing | Increase the backlog: Given/When/Then explained, with examples from Search tests and suggestions for more | Zeljko.filipin, Qgil, Cmcmahon
Mar 18 | Old bugs | QA: LiquidThreads (LQT) Bug Triage | AKlapper, Valeriej
Mar 19 | Online meetings | Office hour about Wikimedia’s issue tracker and Bug management in #wikimedia-office | AKlapper
Mar 20 | IRL (physical) meet-ups and conferences | SMWCon Spring 2013 (New York City, USA) |
Mar 22 | IRL (physical) meet-ups and conferences | LibrePlanet (Cambridge, MA, USA) |
Mar 25 | Features testing | QA: Collaborate with Weekend Testers Americas: test new tools for new users (E3) | Cmcmahon, Qgil

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Ed Sanders joined the Features engineering group as Software Engineer working on Visual Editor (announcement).
  • Christian Aistleitner joined as a contractor specializing in work on Gerrit (announcement).
  • Marc-Andre Pelletier joined the Technical Operations team as Operations Engineer (contractor), focusing on the Wikimedia Labs infrastructure and migration of tools (announcement).
  • Kirsten Menger-Anderson joined the Features group as a part-time contractor Technical Writer focusing on Editor Engagement Experiments.
  • Greg Grossmeier joined the Platform engineering group as Release Manager (announcement).
  • Site Performance Engineer and Senior Technical Advisor Patrick Reilly’s last day with WMF was February 19th (announcement).

Technical Operations

Site infrastructure

Both Asher Feldman and Peter Youngmeister are proceeding cautiously in broadening MariaDB deployment in our clusters. We have one MariaDB instance for each of the database clusters (s1 to s7). The MariaDB support team has been quick in resolving bugs we encountered along the way. In another database administration task, Asher reviewed and deployed the Wikidata schema changes and migrated it from s3 cluster to s5, adding more growth capacity.
We put sixty new application servers into production in each of the two datacenters. This is in anticipation of expected traffic growth coming from both our regular and mobile sites in the coming year.
Lately we have been experiencing short time-out failures in the nightly search indices built with search-pool4. Asher is experimenting with a fix. He redistributed the search-pool4 indices in the Tampa data center based on sizes and what seems to be a more acceptable index size-to-ram ratio. We essentially have a virtual search-pool5 shard, but with the spelling and highlight indices for pool4 and pool5 sharing the same servers. The pool4 wikis are using the new setup in Tampa, with everything else continuing to use our Ashburn cluster. We should know soon if it works.
The TechOps team had an in-person team meeting the week of February 25th in WMF’s San Francisco office.
  • The highlights of the meeting were:
    • Discuss the upkeep of the “failover” datacenter and capture the lessons learned from the recent datacenter switchover. For example, we think we could reduce the switchover “readonly” time from 32 minutes to 10 minutes by automating more of the database and caching failover procedures.
    • Improve and streamline our hiring process and our security access model.
    • Organize a coordinated sprints process to fill the gap between smaller tasks (for which we use the RT ticketing system) and larger tasks (which require department coordination) by collecting some thoughts. We started brainstorming and forming teams via the Projects wikitech page.
    • Face-to-face meetings with the Engineering teams from Platform, Mobile, Analytics and Wikidata.
    • Short TechOps sprints to reduce cronspams and our RT queue.
    • Review budget needs for the 2013-2014 fiscal year.

Data Dumps

Numerous bug fixes were made to the mwxml2sql tool, and a set of SQL files based on an English language Wikipedia XML dump was published for use by testers [1]. A tool to convert SQL dumps to escaped tab-delimited format is now available for use with MySQL’s LOAD DATA INFILE command, which is much faster than INSERTs. All SQL files from the same dump were converted to this format and also published.
A new mirror has come on line, initially mirroring historical archives of XML files as well as MediaWiki releases, page view statistics and other files [2]. Thanks to Robert Smith and Wansecurity.com for providing the resources to make this happen.
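
As an illustration of the escaped tab-delimited format mentioned above, here is a minimal Python sketch. It is not the code of the published conversion tool; the function names are hypothetical, and it assumes MySQL’s default LOAD DATA INFILE settings (fields terminated by a tab, lines terminated by a newline, backslash as the escape character, and \N for NULL).

```python
def escape_field(value):
    """Escape one value for a tab-delimited file read with MySQL's default
    LOAD DATA INFILE settings (hypothetical helper, not the real tool)."""
    if value is None:
        return "\\N"  # MySQL reads \N as NULL
    text = str(value)
    # Escape the backslash first so later replacements are not double-escaped.
    text = text.replace("\\", "\\\\")
    text = text.replace("\t", "\\t")
    text = text.replace("\n", "\\n")
    text = text.replace("\r", "\\r")
    text = text.replace("\0", "\\0")
    return text


def rows_to_tsv(rows):
    """Turn an iterable of row tuples into escaped tab-delimited text."""
    return "".join(
        "\t".join(escape_field(value) for value in row) + "\n" for row in rows
    )


if __name__ == "__main__":
    sample = [(1, "Main\tPage", None), (2, "Back\\slash", "line1\nline2")]
    print(rows_to_tsv(sample), end="")
```

A file written this way can be loaded with a single LOAD DATA INFILE statement per table, which avoids parsing and executing one INSERT statement per batch of rows.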

Wikimedia Labs

This month was mostly spent stabilizing Labs components. Labs Ganglia was fixed to report instance statistics properly. Adminbot was updated to fix utf8 issues, and to fix package issues when upgrading. A number of changes were made to the glusterfs support to bring more stability. Gluster was upgraded to 3.3.1 to fix a memory leak on both the client and server. Gluster isn’t matching our use case of multitenancy, as the glusterd daemon isn’t handling the large number of volumes well. To help with this, until we either fix the issue in gluster, or replace it, we’ve made a change to not create/manage Gluster volumes for projects unless they opt in. We’ve also disabled and deleted Gluster volumes for projects that are currently unused. Work was done to turn Puppet classes for installing MediaWiki in Labs into modules, so that they can be reused more easily.
We merged wikitech.wikimedia.org (our operations and infrastructure documentation) and labsconsole.wikimedia.org together into wikitech.wikimedia.org. wikitech-static.wikimedia.org is available as a backup, in case all access to our cluster is unavailable. Work was started on supporting saltstack reactors, to replace the bootstrapping for instance creation. This month we have a new member of the Labs team, Marc-Andre Pelletier, also known in the community as Coren. Coren will be working on the new Tool Labs infrastructure and we’re very excited to have him on board. Asher and Peter started work on replicated databases for Tool Labs during the last week of the month.

Features Engineering

Editor retention: Editing tools

VisualEditor

In February, the team worked on improving the design, user interface components and API infrastructure of VisualEditor, preparing for the new features that will be added in the coming months. The objective is for VisualEditor to become the default editor for all users in July 2013, capable of letting them edit the majority of content without needing to use the wikitext editor. This will mean adding support for references, (at least) basic templates, categories and images, each of which is a very large piece of work. During this time, the team has expanded with the recruitment of Ed Sanders, who will focus on the data infrastructure of VisualEditor’s platform. The alpha version of VisualEditor on mediawiki.org and the English Wikipedia was updated twice (1.21-wmf9 and -wmf10), adding support for Microsoft Internet Explorer (version 9 and above), fixing a number of bugs reported by the community, improving internationalisation, and restructuring the data model layer so that the code interfaces are ready for the new features.

Parsoid

The Parsoid team continued to improve support for non-English wikis. This involved exposing more configuration information through the MediaWiki API and using it throughout Parsoid. The support is now reasonably complete, but needs testing. The round-trip testing framework needs to be adapted to support running tests on pages from multiple wikis.

A new contributor, C. Scott Ananian, improved Parsoid’s performance by switching the DOM library from JSDom to Domino. He also improved image handling and contributed numerous other patches.

The tokenizer was modified to parse one top-level block at a time, which helps to spread out API requests and minimize the number of tokens in flight. The serializer is in the process of being rewritten to work on DOM input to benefit from the context provided by the DOM. This rewrite is expected to simplify the logic significantly, and help fix some more selective serialization issues that are blocking a deployment to production.
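
The idea of handling one top-level block at a time can be illustrated with a small Python sketch. This is not Parsoid’s code (Parsoid is not written in Python): it assumes, purely for illustration, that top-level blocks are separated by blank lines and that tokenizing means splitting on whitespace, and all function names are hypothetical. What it shows is the lazy shape of the approach: each block is tokenized only when the consumer asks for it, so only one block’s tokens need to be in flight at once.

```python
def top_level_blocks(wikitext):
    """Yield blank-line-separated blocks one at a time.

    Assumption: a blank line ends a top-level block. Real wikitext block
    boundaries are more involved than this.
    """
    block = []
    for line in wikitext.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            yield "\n".join(block)
            block = []
    if block:
        yield "\n".join(block)


def tokenize_block(block):
    """Stand-in tokenizer: split a block into whitespace-separated tokens."""
    return block.split()


def tokenize(wikitext):
    """Lazily tokenize the text block by block instead of all at once."""
    for block in top_level_blocks(wikitext):
        yield from tokenize_block(block)


if __name__ == "__main__":
    text = "First paragraph here.\n\n* item one\n* item two\n"
    for token in tokenize(text):
        print(token)
```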

We also used the ops and core hackathon to discuss and refine our storage plans. Finally, we wrote a blog post about Parsoid on the WMF tech blog.

Editor engagement features

Notifications

This month, we continued development on the Notifications project (code-named Echo), which is now being tested on mediawiki.org. Ryan Kaldari and Benny Situ developed new features such as bundling, dismiss and web preferences, and refactored the code for the fly-out, archive page and email notifications. Luke Welling continued to develop a more robust job queue. Fabrice Florin spearheaded discussions about notifications for both new and experienced users, and updated requirements for this first set of notifications for our upcoming release. We will develop these notifications and final features in the coming weeks, and are aiming for a first release on the English Wikipedia next month; in the meantime, you can help us test the current version on mediawiki.org. To learn more, read this project update on the Wikimedia blog. If you are a software engineer, check out this job opening to join our team and develop more editor engagement projects like Echo.

Flow

In February, we analyzed and collated user research concerning talk pages. Early designs were shown to members of the Board of Trustees to ask for their input. Jeff Atwood (from StackOverflow and Discourse) came in to give us a brain dump of his work. Design work was done on secondary “modules” as examples for how existing workflows can be rebuilt within the Flow system. Community engagement strategies saw the beginnings of implementation with the creation of a “Portal” that will engage discussion about Flow at three locations (mediawiki.org, meta, and the English Wikipedia).

Article feedback

This month, our team completed feature development for Article Feedback v5 (AFT5) and prepared to release an updated version on English, French and German Wikipedias. Developer Matthias Mullie developed a final set of new features, including simpler moderation tools, better filters, a new feedback link, auto-archive and discussion on a talk page. Designer Pau Giner posted a usability study report about the effectiveness of the new moderation tools. Community liaison Oliver Keyes contributed to this request for comments, which concluded with a request to remove AFT4 and provide an opt-in version of AFT5 on the English Wikipedia (going forward, editors who wish to enable AFT5 for articles they watch can simply add the special Category:Article Feedback 5 on those pages). Product manager Fabrice Florin collaborated with members of the German Wikipedia (which is now testing the tool in its ongoing pilot, with a vote expected in May) and the French Wikipedia (which just voted to start its own six-month pilot). We plan to deploy AFT5’s final version on these projects by the end of the month, as described in this 2013 release plan.

Editor engagement experiments

Editor engagement experiments

In February, the Editor Engagement Experiments team (“E3”) continued working toward completing its goals for the quarter ending in March, which includes updates on the following projects.

After the initial launch of guided tours, Matt Flaschen and other team members worked on A/B testing the effectiveness of guided tours as part of the onboarding experience for new Wikipedians currently enabled on English Wikipedia. Results from these controlled tests are vital to understanding the impact of tours on editor engagement. In the meantime, the GuidedTour extension was enabled on Wikimedia Commons and six Wikipedias (including French, German, and Dutch), so that local administrators and volunteer developers could take advantage of the feature.
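
For readers unfamiliar with how users are split in a controlled A/B test, one common approach is deterministic bucketing by hashing the user ID. The Python sketch below is a generic illustration, not the E3 team’s implementation; the experiment name, bucket labels and function name are all hypothetical.

```python
import hashlib


def assign_bucket(user_id, experiment="guided-tour-test",
                  buckets=("control", "treatment")):
    """Deterministically map a user to a bucket.

    Hashing the experiment name together with the user ID gives each user
    a stable bucket for this experiment, and an independent split for any
    other experiment name. All names here are illustrative assumptions.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


if __name__ == "__main__":
    for user_id in (101, 102, 103):
        print(user_id, assign_bucket(user_id))
```

Because the assignment depends only on the user ID and the experiment name, a returning user always sees the same variant without any per-user state being stored.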

In addition to working on polishing and quantifying the effect of guided tours, significant progress was made on a new landing page for the onboarding project, with plans to launch early in March. The new Getting Started page will be expanded to include a wider variety of task types offered to new editors. It will also be generated from a basic recommender system coupled with the GettingStarted extension, rather than relying on a bot.

Kirsten Menger-Anderson joined the team as Technical Writer mid-month. She began work with Ori Livneh, Dario Taraborelli, and others on documenting the EventLogging extension, with the goal of producing a comprehensive guide for end users of EventLogging, especially other Wikimedia Engineering teams in need of data. Future work by Kirsten will include similar documentation of the User Metrics data analysis API, which will be opened up for internal use in March.

Support

2012 Wikimedia fundraiser

The majority of February was spent paying down the more glaring examples of technical debt we acquired during the 2012 English fundraiser, before jumping straight into a whole new round of International fundraising that kicked off on February 27th at approximately 15:00 UTC (7am PST). Due to unforeseen problems with one of our payment gateways, we were forced to scrap our plans for a continuous international fundraising effort spanning March through June, and will instead attempt to raise as much of the remaining budget in March as we are able. All other plans have been precluded by the March fundraising efforts.

Mobile

Commons App

February saw numerous beta releases of the Commons iOS and Android apps. The Android app was published to Google’s Play store with a Beta label. Brion Vibber invited testers to get involved at http://tflig.ht/Zl9Ef7. A significant amount of time was spent on visual polish, bug fixing, and internationalization in preparation for a very active series of betas. Extensive work was also done to log specific user actions using E3’s EventLogging setup to help us make data-driven decisions in the future.

Wikipedia Zero

During February, we launched with a new partner, Orange Botswana. We’ve also begun testing with Vimpelcom for an upcoming launch in March. In addition, we’ve made improvements to the partner dashboard which tracks Wikipedia Zero usage.

Open Street Map

Tomasz Finc, Max Semenik and Arthur Richards worked on organising the OSM hackathon which took place on March 9-10 in Copenhagen. Max Semenik continued working on OSM in Labs and uploaded initial versions of several OSM-related packages to Gerrit for Operations to review.

Mobile Web Photo Upload

In February we released the ability to upload and add images to articles lacking them to the full mobile web. We continued work on improving the upload and mobile file page views, for full productization in early March. We also explored two new upload workflows in alpha/beta: adding a call to action to articles that appear in the Nearby and Watchlist view, allowing users to quickly see articles near them that may need an image, take a photo and upload. Lastly, we began collaborating with the Fundraising team to enable CentralNotice on the mobile web, giving us the ability to deliver targeted banners to users who might be interested in trying out our new upload features or Commons apps.

Platform Engineering

MediaWiki Core

MediaWiki 1.21

Deployments of 1.21wmf9 and 1.21wmf10 went to production as scheduled with minimal issues.

Git conversion

Gerrit was upgraded this month to a pre-2.6 snapshot. This enabled the use of plugins and brought numerous bug fixes and UI improvements. Work is underway on a plugin to provide Bugzilla integration and to replace Gitweb with a better repository viewer called Gitblit. All of our git repositories are now automatically replicated to GitHub. We’ve begun some initial planning into how we can improve the “new repository request” process, making it much easier for users and giving it a quicker turnaround time.

TimedMediaHandler

Jan Gerber continues to work part-time for the WMF to fix multimedia bugs. Fixes include better support for FLAC files (bug 43249) and better support for metadata display in small embedded players (bug 44272).

Wikidata deployment

Wikidata has been deployed to Hungarian, Italian, Hebrew, and English Wikipedias!

Media storage

Nearly all files have been copied from Swift (in Tampa) to Ceph (in Ashburn). Further scripts will be run to synchronize the Ceph files to account for deletions and updates to files. The Varnish configuration to handle URL rewriting (to take the place of rewrite.py in swift) is already coded, though not yet in use.

Lua scripting

We deployed Lua/Scribunto to several wikis, including English Wikipedia, on February 18th. The current plan is to deploy to the remaining wikis on March 13th.

Site performance

A patch to allow moving the job queue to another DB cluster has been merged, and another patch to support an alternative Redis-based queue is in review in gerrit. Currently, job-related operations consume a significant portion of production database master wall time.

Admin tools development

The team worked on a number of areas this month. The interface for Stewards to mass-lock user accounts was completed and will be deployed early next month. The support for global AbuseFilters nears completion, with a test deployment to test.wikipedia.org and mediawiki.org; once internationalisation is more complete, it will be deployed for all wikis. The team worked to agree on a specification for a global CheckUser tool. Progress was made on a global account renaming tool and XFF-based global and local blocks. The team also worked on finalising the migration to Single User Login, building some metrics to get a sense of the scale of the problem.
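
To make the idea of XFF-based blocks concrete, the Python sketch below checks every address listed in an X-Forwarded-For header against a list of blocked ranges. It is a generic illustration using Python’s standard ipaddress module, not MediaWiki’s implementation; the function names and the example ranges (taken from the reserved documentation address space) are assumptions.

```python
import ipaddress


def xff_addresses(header):
    """Parse an X-Forwarded-For header into a list of IP address objects.

    The header is a comma-separated list (client first, then proxies).
    Entries that are not valid IP addresses are skipped.
    """
    addresses = []
    for part in header.split(","):
        try:
            addresses.append(ipaddress.ip_address(part.strip()))
        except ValueError:
            continue
    return addresses


def is_blocked(header, blocked_ranges):
    """Return True if any address in the header falls in a blocked range."""
    networks = [ipaddress.ip_network(r) for r in blocked_ranges]
    return any(
        address in network
        for address in xff_addresses(header)
        for network in networks
    )


if __name__ == "__main__":
    blocked = ["203.0.113.0/24"]  # hypothetical blocked range
    print(is_blocked("203.0.113.7, 198.51.100.2", blocked))
    print(is_blocked("198.51.100.2", blocked))
```

Note that the header is supplied by the client and intermediate proxies, so a real system would only trust entries added by proxies it knows.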

REST proposal

Yuri started gathering requirements for the RESTful (content-oriented) API as part of the overall planning for the API v2.0 roadmap. Also, Wikia’s focused R&D sprint has removed all the remaining obstacles identified previously, and Wikia has identified a new, larger dedicated Product team to get to a final implementation of the REST API, following the directions set by the existing prototype and internal RFC. The team (API/Data) is the same one in charge of Search and all the related APIs, which will ensure a better integration of this product into the new API strategy. At the time of writing, the knowledge transfer required for the team to start this new phase has just begun, and the team first has to complete its current work on another project already in progress. In the meantime, Wikia will make an RFC public for review and feedback.

Security auditing and response

Continued responses to reported vulnerabilities. Preparation for security releases for 1.19 and 1.20 branches of MediaWiki. Continued review of Fundraising.

Quality assurance

Quality Assurance

Concluded mobile app file upload test exercise. Preparing for new test exercise, possibly for Search. Exploring another collaboration with Weekend Testers.

Beta cluster

We are adding search to the beta cluster following Mobile Frontend tweaks. We are discussing new ways to use the beta cluster as a result of our San Francisco gathering in February.

Continuous integration

Antoine Musso worked with several MediaWiki extension authors to ensure that the unit tests for those extensions are run by Jenkins and that they work. He hopes to have all extensions that run on the Wikimedia production cluster fully operational by the end of February. Antoine also integrated PHP CodeSniffer into our automated test runs.

Browser testing

Added E3 tests. Preparing for test event to increase the backlog. Sophisticated tests for Language need tweaking/research.

Analytics

Kraken (Analytics Cluster)

We did two reviews of Kraken: one for security and one for overall architecture. We’re incorporating the feedback, which includes merging our puppet modules into the operations puppet repository and the test puppet in Labs. Work has started to create dashboards for mobile pageviews, Wikipedia Zero and the mobile alpha and beta sites.

Limn

Highlights in the past month include

  • Stacked charts
  • Debianization & Puppetization
  • New E3 and Grantmaking dashboards
  • Ad-hoc visualization of datasource

Wikistats

New mobile pageview report is in testing phase but not ready to ship.

Page view logging

It was a quiet month for the logging infrastructure; things were running fine. We have been working on a patch to fix bug 45178, which we will try to deploy in March.

Engineering community team

Bug management

As part of the Weekly QA Goals, a Git/Gerrit Bug Triage day took place. About 25 open reports were retested and/or synchronized with their status in the upstream bugtracker. The Bug day format will be developed further to make it more attractive to new contributors.

Valerie published an initial version of a Bug Life Cycle flowchart describing the life of a bug report by its status changes over time, continued investigating feedback channels and workflows of other larger free software projects, and also helped test the Commons Upload app for Android and the mobile browser as part of Mobile QA testing. A table on Bugzilla use by development teams was made available.

Furthermore, outreach to several development teams continued in order to better understand their different bug management needs, and discussions took place about a workflow for marking fixed tickets as backport candidates in the issue tracker, potentially resulting in the addition of a dropdown menu (“flag”) in Bugzilla.

Mentorship programs

The Outreach Program for Women is more than halfway through. Our six participants are fairly on track; read February reports from Valerie, Mariya, Priyanka and Sucheta. Teresa is working on unit tests for the Git repository extension and is looking at a request to use this extension to help maintain CentralNotice-related content. Isarra completed her work on Flow/User tests and is now working with the Editor Engagement team on improvements to the Watchlist design. Google published the timeline for the Summer of Code 2013 and we have confirmed our intention to apply as an organization. Without big announcements and more than a month before any deadline, we already have 15 students, 5 mentors and 2 org admins potentially interested.

Technical communications

Guillaume Paumier finished setting up the Project:Calendar, used it to replace content on pages like QA/Weekly goals and Meetings using selective transclusion, created an edit notice to make it easier to add and edit events, added a bullet list display option, and added links to icons credits as part of an effort to harmonize visual identity for MediaWiki. He updated the monthly report how-to to reflect the current process, and met with LCA staff to discuss possible collaboration between the tech ambassadors and community advocates programs. While in San Francisco, he met with many colleagues to discuss engineering project documentation, and ways to announce to and engage with the rest of the community. Last, he started to create a Product development hub to facilitate the involvement of contributors, and supported the engineering team in communicating about their accomplishments on the Wikimedia Tech blog.

Volunteer coordination and outreach

Language engineering

Language tools

  • Translate (TUX) enhancements: Development continues full steam ahead on the new translation editor with proof reading feature by Santhosh and Amir. Niklas continues to enhance and test backend translate infrastructure, including Solr integration and other translation aids.
  • Plurals support: Updated by Santhosh and Amir to be more consistent with CLDR standards.
  • Technical Font Specification for Indic scripts: We kicked off this collaborative project between Red Hat and Wikimedia at the Language Summit in February. Santhosh and Runa are contributors to this project.
  • Language Coverage Matrix: This matrix aims to provide an up-to-date status of language support for all tools that the team is developing and maintaining.
  • MediaWiki i18n code review: The team continues to support the MediaWiki release with i18n code reviews across other features and extensions.
  • MediaWiki Language Extension Bundle (MLEB): Monthly release of MLEB completed with release notes by Amir.

Milkshake

  • jQuery.IME: Continue to merge input methods contributed into jQuery.IME. We now have 155+ input methods for 75+ languages.
  • jQuery.ULS: Continue to maintain jQuery.ULS. Awaiting resolution of deployment issues.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

We have migrated our source code repository from Subversion to Git. In February we also focused on revamping the Kiwix Web site. The new Web site is much more user-friendly. Our audience continues to grow, with 120,000 downloads of the software in February.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

In February the first phase of Wikidata (language links) was deployed on the English-language Wikipedia. Additionally, the first parts of phase 2 (infoboxes) went live on wikidata.org. It is now possible to add statements. For an example see d:159. The first tools have already been written on top of this, for example Geneawiki and Reasonator. In the meantime more work has been put into additional data types, like strings and geocoordinates, as well as the foundations of phase 3 (lists based on queries).

In other good news: Wikimedia Germany has decided to fund Wikidata development after the end of the first year of development at the end of March.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
