Wikimedia engineering December 2012 report

Major news in December include:

Note: We’re also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Engineering metrics in December:

  • 113 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from about 535 to about 648.
  • About 39 shell requests were processed.
  • As of December 2012, users can self-register on Wikimedia Labs (and get access to git/Gerrit). It is no longer necessary to request an account for developer access.
  • Wikimedia Labs now hosts 148 projects, 847 users; to date 1378 instances have been created.
  • Detailed community metrics are also available.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

Technical Operations

Production Site Switchover

The Technical Operations team continued to work on the outstanding migration tasks and to ready our Ashburn infrastructure for the big switchover day, i.e., the complete transition from the Tampa datacenter to the one in Ashburn, during the week of January 22, 2013.
In the past few months, we’ve transitioned services from the Tampa datacenter to the one in Ashburn, which now serves most of our traffic (about 90%). However, application (MediaWiki), memcached and database systems are all still running exclusively out of Tampa. We have been working to upgrade the technologies and set up those systems at Ashburn, and we plan to perform the switchover of those services from Tampa to Ashburn in the coming weeks. This will provide us some assurance of a hot standby datacenter, should we encounter an irrecoverable and lengthy outage in one of the main datacenters.

Site Infrastructure

Because December is when the annual Wikimedia fundraiser happens, the Operations team usually makes fewer site infrastructure changes to mitigate the risk of causing outages. The lower-risk work performed included deploying the new Parsoid cluster to support the Visual Editor project, rolling out doc.wikimedia.org (our auto-generated puppet documentation), using a new and unified SSL certificate for *.wikipedia.org and *.m.wikipedia.org sites, and setting up a monitoring server and service in Ashburn.
Asher Feldman migrated one of the main production slave database servers (db59) for the English Wikipedia (enwiki) to MariaDB 5.5.28. He has been testing 5.5.27 on the primary research slave, and the current build on a slave in Ashburn. Taking the times of 100% of all queries over regular sample windows, the average query time across all enwiki slave queries is about 8% faster with MariaDB compared to our production build of MySQL 5.1-fb. Some query types are 10–15% faster, some are 3% slower, and nothing looks aberrant beyond those bounds. Overall throughput as measured by queries per second (qps) has generally improved by 2–10%. Asher wouldn’t draw any conclusions from this data yet: more testing is needed to filter out noise, but initial results are positive. The main reason for migrating to MariaDB is not performance, but rather the belief that it’s in the interest of the Wikimedia Foundation and the open-source communities to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for MySQL-derived database technology.
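To make the comparison above concrete, here is a minimal sketch of how a relative change in average query time could be computed from per-window samples. The function name and all the numbers are hypothetical and chosen only for illustration; they are not the measurements Asher collected, and this is not the tooling he used.

```python
from statistics import mean


def percent_change(baseline, candidate):
    """Relative change of the candidate mean vs. the baseline mean, in percent.

    A negative result means the candidate is faster (lower query time).
    """
    base_avg = mean(baseline)
    cand_avg = mean(candidate)
    return (cand_avg - base_avg) / base_avg * 100.0


# Hypothetical average query times per sample window, in milliseconds.
mysql_windows = [10.0, 12.0, 11.0, 9.0]
mariadb_windows = [9.2, 11.0, 10.1, 8.3]

change = percent_change(mysql_windows, mariadb_windows)
print(f"MariaDB vs. MySQL mean query time: {change:+.1f}%")
```

With noisy samples, a single mean like this says little on its own, which is why more testing is needed before drawing conclusions.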
Mark Bergsma and Faidon Liambotis have made tremendous progress in testing and deploying Ceph in Ashburn. We are hopeful it will be robust and scalable.
Ryan Lane has been writing a new deployment system using git and Saltstack. Parsoid is currently being deployed with this system, and MediaWiki is slated to use it for its next major deployment.

Fundraising

There were no major changes on the fundraising infrastructure because of the fundraiser itself. We ordered and received bastion hosts that we’re in the process of deploying. Monitoring got an overhaul and we’re now sending alerts to the fundraising technical staff or the technical operations team depending on what triggered the alert.

Data Dumps

A tool for dump users to set up interwiki links on their local mirrors is available in alpha, as well as documentation of the interwiki cdb file. Also, work with WanSecurity on mirroring is moving forward: they now hold a current copy of all ‘other’ files, including page views and Picture of the Year bundles, among other things.

Wikimedia Labs

Labs came out of beta this month, following the opening of self-registration. Another major change this month was the migration from the shared NFS instance to per-project glusterfs volumes. A number of smaller changes were made, including: the addition of puppet documentation links from classes and variables on the instance configuration pages; the modification of the project filter to act as a table of contents; a split of LDAP project groups into projects and POSIX groups; and the installation of Saltstack on all instances to act as a guest agent.

Features Engineering

Editor retention: Editing tools

VisualEditor

In December, the team deployed to the English Wikipedia an alpha version of the VisualEditor for editors to use and give feedback on issues and priorities. The team’s work focused on ensuring that the integration was reliable, providing a dedicated tool for editors to report problems with editing and, after deployment, addressing the reports and ideas from editors. The early version of the VisualEditor on mediawiki.org was also updated to use the new developments (as part of 1.21-wmf6).

Parsoid

The Parsoid project reached a major milestone with its first deployment to the English Wikipedia along with the VisualEditor. This was a major test for Parsoid, as it needed to handle the full range of arbitrary and complex existing wiki content including templates, tables and extensions for the first time.

As witnessed by the clean edit diffs, Parsoid passed this test with flying colors. This represents very hard work by the team (Gabriel Wicke, Subramanya Sastry and Mark Holmquist) on automated round-trip testing and the completion of a selective serialization strategy just in time for the release.

After catching their breath, the team now has its sights on the next phase in Parsoid development. This includes a longer-term strategy for the integration of Parsoid and HTML DOM into MediaWiki, performance improvements and better support for complex features of wikitext.

Editor engagement features

Notifications

This month, the team continued to develop key features of the Notifications project (code-named ‘Echo’), and deployed a first experimental release on mediawiki.org. Fabrice Florin expanded feature requirements for this release, and Vibha Bamba designed more components of the user experience. Ryan Kaldari and Benny Situ developed improved notification flyouts and email digests, as well as new notifications such as page links. Luke Welling built an HTML email module, which will soon be available to other projects as well. We plan to develop more features this month and deploy them for new editors on the English Wikipedia in early 2013. Please help us test these new features to provide feedback and find bugs. We’re also looking to hire a software engineer as part of this project.

Article feedback

We made good progress on Article Feedback version 5 this month. We completed a research study on the English Wikipedia, confirming that many readers use this feature and a sizable number of them go on to register and become editors. Based on that research and editor suggestions, we started development on new features to reduce the editor workload through better filters and simpler moderation tools. We also continued to refactor our code, to support millions of comments on a dedicated database cluster to be deployed in coming months. Once this work is complete, we plan to release Article Feedback v5 to 100% of the English Wikipedia in March, and to other Wikimedia sites later this year. The German Wikipedia has already started a pilot to evaluate this tool, and a similar initiative is also under discussion on the French Wikipedia.

Page Curation

Page Curation is now in ‘maintenance mode’, following its release on the English Wikipedia in September 2012. There was no significant development activity on this project this month. Oliver Keyes has completed a project to look at various ways of localizing Page Curation to any and all wikis that want it: it is currently being reviewed by Howie Fung to assess its feasibility.

Editor engagement experiments

Editor engagement experiments

In December, the Editor Engagement Experiments team launched a new test aimed at Onboarding new Wikipedians. This interface delivers an optimized task list immediately after sign up, inviting those without an idea of how to get started to choose an article and try their hand at editing. The related GettingStarted extension was deployed mid-month and continued to evolve throughout the month, as early quantitative and qualitative research was conducted.

To go along with the launch of GettingStarted and other experimentation, EventLogging underwent heavy development, including the launch of a new Schema namespace on Meta for defining the data collected in a public, collaborative manner. We created production schemas for GettingStarted, account creation, mobile, and more. Ori Livneh also reworked the format, transmission, and cleanliness of data delivered to analysts and product managers, automatically generating database tables from these schemas for incoming events.
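As a rough illustration of generating a database table from a schema, the sketch below turns a flat, JSON-schema-style description into a CREATE TABLE statement. The type mapping, the `create_table_sql` helper, the table name and the field names are all assumptions made for this example; they do not reflect how EventLogging actually performs this step.

```python
# Hypothetical mapping from JSON-schema-style types to SQL column types.
SQL_TYPES = {
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
    "string": "VARCHAR(255)",
}


def create_table_sql(table_name, schema):
    """Build a CREATE TABLE statement from a flat schema dictionary."""
    columns = ["id INTEGER PRIMARY KEY"]
    # Sort the fields so the generated statement is deterministic.
    for field, spec in sorted(schema["properties"].items()):
        sql_type = SQL_TYPES[spec["type"]]
        null = " NOT NULL" if spec.get("required") else ""
        columns.append(f"{field} {sql_type}{null}")
    return f"CREATE TABLE {table_name} (\n  " + ",\n  ".join(columns) + "\n)"


# Hypothetical schema; the field names are illustrative only.
example_schema = {
    "properties": {
        "userId": {"type": "integer", "required": True},
        "action": {"type": "string", "required": True},
        "isNew": {"type": "boolean"},
    }
}

statement = create_table_sql("GettingStarted_demo", example_schema)
print(statement)
```

Deriving the table from the schema keeps the stored columns in step with what the schema page declares, so analysts do not have to maintain table definitions by hand.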

Late in the month, the team collaborated with fundraising to reach out to donors and readers as part of the annual fundraising campaign via email and a “Thank You” banner which ran at the end of the year. In addition to introducing millions of donors and readers to the Wikipedia editor community and inviting them to join, this campaign helped the team establish an experimental baseline for what a campaign to convert readers might look like.

In addition to the above launches, we continued development of the new account creation experience and Guided Tours by Matt Flaschen, which will be launched in January 2013. Active development was also begun by Ryan Faulkner and Dario Taraborelli on a user metrics API. The effort is threefold: to standardize user metrics in data analysis, to build infrastructure to efficiently compute metrics for a large set of users, and finally to expose those results via an API.

Support

2012 Wikimedia fundraiser

The 2012 annual fundraiser continued in December and was a resounding success. In addition to the ongoing maintenance required to operate the fundraiser, the team helped to execute the Thank You campaign and started to put into place new tools for auditing the fundraiser after its completion.

Mobile

The Mobile development and design team worked to finalize contributory and other experimental editor-focused features on the Beta site (uploads, editing, and watchlist functionality) in order to clear the way for a full push on mobile uploads by March 2013. We also worked to improve the reader and potential editor experience by introducing features geared toward educating/engaging our users, such as a human-readable last modified timestamp for articles and watchlist, and thumbnail images to illustrate the watchlist view. Lastly, because of the huge interest we generated in our Beta testing site, we created an Alpha site to house very early work on contributory features, in order not to disrupt the reading experience of our 100,000+ Beta users.
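A human-readable last modified timestamp can be sketched as follows. This is only an illustration of the idea; the function name, the wording of the messages and the choice of units are assumptions, not the MobileFrontend implementation.

```python
from datetime import datetime, timedelta


def humanize_last_modified(modified, now):
    """Describe how long ago a page was modified, relative to `now`."""
    seconds = int((now - modified).total_seconds())
    if seconds < 60:
        return "Last modified just now"
    # Try the largest unit first; the minute case always matches here.
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            plural = "" if count == 1 else "s"
            return f"Last modified {count} {unit}{plural} ago"


# Hypothetical timestamps for illustration.
now = datetime(2012, 12, 31, 12, 0, 0)
print(humanize_last_modified(now - timedelta(hours=3), now))
```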

GeoData Storage & API

During December, Max Semenik continued work on GeoData, the extension that allows us to easily store and retrieve GPS coordinates in our databases. Max took the extension from implementation through code review to deployment on the English Wikipedia. It will become fully production-quality after a few more tweaks and fixes. After those changes, we’ll continue to roll out to the rest of the wikis. The extension is one of the precursors to having the “nearby” feature on our mobile web site.
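To show why stored coordinates are a precursor to a nearby feature, here is a minimal sketch of a radius search over page coordinates using the haversine great-circle distance. The `nearby` helper, the in-memory dictionary and the sample pages are assumptions for illustration; GeoData's actual storage and query strategy is not described here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearby(pages, lat, lon, radius_km):
    """Return page titles within radius_km of (lat, lon), closest first."""
    hits = [(haversine_km(lat, lon, p_lat, p_lon), title)
            for title, (p_lat, p_lon) in pages.items()]
    return [title for dist, title in sorted(hits) if dist <= radius_km]


# Illustrative sample of page coordinates (latitude, longitude).
pages = {
    "Eiffel Tower": (48.8584, 2.2945),
    "Louvre": (48.8606, 2.3376),
    "Statue of Liberty": (40.6892, -74.0445),
}

print(nearby(pages, 48.8584, 2.2945, 5.0))
```

A linear scan like this is fine for a handful of pages; a production system would need some form of spatial indexing, which this sketch does not attempt.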

Wikipedia Zero

During the month of December, Patrick Reilly, Dan Foy and the rest of the Zero team launched Wikipedia Zero with a new partner, Orange Congo. They resolved operational issues that prevented the team from accurately recording traffic from the Opera browser. They also brought Brion Vibber on board to help in the interim while the team continues to look for permanent members. The team is very excited about its upcoming launches and will be announcing them as soon as possible.

J2ME App

The J2ME app is ready to launch pending contractual negotiations with carriers.

Wikipedia over SMS & USSD

The USSD service is ready to launch pending contractual negotiations.

Mobile QA

The Mobile QA team planned and began several projects in December, in particular: an upcoming community test event for Mobile features; support for MobileFrontend in beta labs; and significant new UI-level automated tests in the gerrit queue.

Platform Engineering

MediaWiki Core

MediaWiki 1.21

We continued the bi-weekly deployment cycle, deploying MediaWiki 1.21wmf5 and 1.21wmf6. We stopped deployments at the end of the month due to the holidays, restarting the 1.21wmf7 cycle on January 2.

Git conversion

There’s not much to report for the month of December so far with Gerrit. New repositories continue to be created, and the vast majority of active parts of SVN have been marked read-only by now. Upgrading to a newer version of Gerrit is still blocked on our LDAP problem with master, but the patch to fix that is nearly complete. Mid-December, we extended the Verified category to now allow +2 (in addition to +1 and -1), so Jenkins has a wider range of statuses it can report.

TimedMediaHandler

Jan Gerber continued to refine the TimedMediaHandler extension, making the transcoding steps more robust.

Wikidata deployment

The Wikibase client extension was deployed to test2 in December. We plan further deployment work in January, deploying to the Hungarian language Wikipedia on January 14, 2013.

SwiftMedia

Captchas are ready to be served from Swift. They previously were for several days, but the configuration had to be reverted due to random errors from Swift. A new set of captchas is being tweaked for readability and is served from Swift on the test wikis. Captchas are one of the last NFS dependencies.

Site performance

After an assessment by Asher Feldman, Patrick Reilly and Tim Starling, the RDB database patch was canceled. Instead, in the short term, a separate vertically partitioned data cluster will be provided as a temporary storage until a horizontally scalable architecture can be finalized. Matthias Mullie is modifying the RDB-dependent ArticleFeedbackToolv5 to remove that dependency through an abstraction layer. When a sharded or horizontally scaled solution is provided, AFTv5’s abstraction will be migrated. An initial assessment of various non-MySQL alternatives for using Aaron Schulz’s JobQueue core patch in 1.20 is being done for Echo. Because of the time it takes to exhaust the Echo queues, it is written to bypass the JobQueue through direct calls. Luke Welling is abstracting the JobQueue for Redis, ZeroMQ, and others.
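The idea of abstracting a job queue over several backends can be sketched as a small interface with interchangeable implementations. The class and function names below are hypothetical, the sketch is in Python rather than MediaWiki's PHP, and only an in-memory backend is shown; a Redis or ZeroMQ backend would implement the same two methods.

```python
from abc import ABC, abstractmethod
from collections import deque


class JobQueueBackend(ABC):
    """Minimal interface that every queue backend is assumed to provide."""

    @abstractmethod
    def push(self, job):
        """Add a job to the end of the queue."""

    @abstractmethod
    def pop(self):
        """Remove and return the oldest job, or None if the queue is empty."""


class InMemoryBackend(JobQueueBackend):
    """Illustrative backend that keeps jobs in a local FIFO queue."""

    def __init__(self):
        self._jobs = deque()

    def push(self, job):
        self._jobs.append(job)

    def pop(self):
        return self._jobs.popleft() if self._jobs else None


def drain(backend, handler):
    """Run handler on every queued job and return how many were processed."""
    processed = 0
    job = backend.pop()
    while job is not None:
        handler(job)
        processed += 1
        job = backend.pop()
    return processed


queue = InMemoryBackend()
queue.push({"type": "notify", "user": "Example"})
queue.push({"type": "email-digest", "user": "Example"})

seen = []
count = drain(queue, seen.append)
print(count, [job["type"] for job in seen])
```

Because callers depend only on `push` and `pop`, the storage technology can be swapped without touching the code that produces or consumes jobs.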

Admin tools development

Initial code was committed for an interface for stewards to mass-lock user accounts. For global AbuseFilters, a permission for global rule-writing was merged and the initial code for using WikiSets in the rules was written. Initial code was also committed for renaming CentralAuth user accounts.

REST proposal

Wikia has completed a preliminary prototype (intended to be discarded once all the valuable data has been collected) in order to validate the design and its core concepts, identify and explore possible issues, and test limits imposed by the platform. It will also be used to explore the usage of PHP 5.4’s new features to ease the implementation of a maintainable versioning system (the prototype abuses PHP’s implementation of namespaces in some cases; this is not meant to persist in the final prototype but was rather a stress test), test human-readable formatting for responses when called by specific clients, and measure the overhead added by the software abstraction. As a result, some pain points and alternative routes have been identified; research on them will continue in late January and early February 2013, leading the team closer to a final implementation and related RFC. The code will be available for a short time in a dedicated branch of Wikia’s app repository on GitHub.

Security auditing and response

The team continued to respond to several reported vulnerabilities. A follow-up security review for Wikidata phase 2/3 was done.

Quality assurance

Beta cluster

The project to support MobileFrontend in Beta labs continues. We intend for Beta labs to become a test environment for the new git-deploy script from the Operations team: this should be helpful in ongoing maintenance of the environment.

Continuous integration

The last Jenkins jobs (mostly Analytics ones) that were still using the Gerrit Trigger plugin have been migrated to being triggered by Zuul. Zuul now supports triggering tests for whitelisted users. This has been deployed to let trusted users have unit tests run whenever they send a patchset in mediawiki/core (gerrit change 39310). Volunteer Merlijn van Deen built a script to replicate our Jenkins installation and worked on having extension tests run on different MediaWiki branches.

Browser testing

After its announcement about the state of automated browser testing on wikitech-l, the QA team continued to expand test coverage, improve system and project documentation, and publicize and socialize the project by means of the “Browser Testing” MediaWiki Group.

Analytics

Kraken (Analytics Cluster)

LDAP Hue/Hadoop authentication works, but group file access still needs to be worked out. We’ve puppetized an Apache proxy for internal Kraken and Hadoop web services, as well as udp2log kafka production and kafka hadoop consumption. The event.gif log stream is being consumed into Hadoop. We’re attempting to use udp2log to import logs into Kafka and Hadoop without packet loss, and backing up Hadoop service data files to HDFS (e.g. Hue, Oozie, Hive, etc.).

Limn

A major rework of Limn to use d3.js and Knockout.js is complete and will be used for the next ReportCard. Dan Andreescu and David Schoonover are working on graph editing and geospatial data visualization.

Engineering community team

Bug management

Daniel Zahn and Andre Klapper upgraded Bugzilla to the latest stable version (4.2.4) which provides higher flexibility for displaying interface elements, improved custom search, better JSON-RPC support and a solid base for future improvements being considered. Andre continued to improve the bug management documentation. Many bug reports that were previously closed as RESOLVED LATER were retriaged and RESOLVED LATER was disabled for future use, and a large number of previously unprioritized bug reports received a priority setting. Furthermore, Andre looked after reports about CSS issues after the MediaWiki 1.21wmf5 deployment and followed up by triaging, creating requested Bugzilla components, etc. Several smaller regex fixes were deployed in Bugzilla to fix automatic linking to Gerrit changesets. A “patch in gerrit” bug status was discussed on wikitech-l with the conclusion to wait for automatic notifications (comments) from Gerrit into Bugzilla about patch status changes first (which is being worked on by the Wikidata team).

Mentorship programs

Six MediaWiki candidates have been announced for the Outreach Program for Women (OPW). Four of them are funded by the Wikimedia Foundation and two by Google through an agreement with the GNOME Foundation, organizers of the program. They will work as full-time interns under the supervision of MediaWiki mentors between January and March 2013. We received 10 submissions from about 25 interested people. The rather open and participatory selection process we have defined for OPW will be used as a basis for future mentoring programs. We’ve also started matchmaking for the LevelUp mentorships for the coming quarter.

Technical communications

Guillaume Paumier published a project plan and timeline for the consultation process started in October about how to improve 2-way communication between the technical and editing communities. He summarized the results of the first phase and reached out to the wikitech-ambassadors list to widen the consultation process by proxy. After consolidation and prioritization of the results, the most feasible solution appeared to be to grow a network of ambassadors, which he started to organize on meta.
Unrelatedly, Guillaume made a list of 2012 tech blog posts to map tech blog activity by month & subdepartment (with priority activities listed separately). Work on setting up a Volunteer product manager program is also underway.
Quim Gil sorted out social media channels, and we now have @MediaWiki handles for identi.ca, Twitter, Facebook and Google+. He published the community metrics November report and a blog post introducing this new activity.

Volunteer coordination and outreach

MediaWiki Groups became official and the first proposals are going through the approval process. As a side effect, a process for requesting regional MediaWiki-themed mailing lists has been created with mediawiki-india as the first case. At least three Wikimedia-related talks have been accepted at FOSDEM.

Language engineering

Language tools

Development of the new user interface for Translate, as well as the translation editor functionality, continued at full pace throughout the month of December, with iterative feature development and user experience improvements. Santhosh Thottingal and Niklas Laxström are leading development and Pau Giner is focusing on optimizing user experience elements. The team also released the latest version of the MediaWiki Language Extension Bundle. Increased support for language variants and alternate language codes was added to the Universal Language Selector. Alolita Sharma continued to work with Red Hat’s localization and internationalization teams to evaluate localization data, translation tools and internationalization tools and technologies.

Milkshake

More language input methods contributed by language communities were added to the jquery.ime library.

Other news

Pau Giner and Amir Aharoni participated in the Open Tech Chat this month to talk about best practices in multilingual user testing and internationalization. Amir Aharoni also participated in mentoring OPW candidate Priyanka Nag for the new LevelUp program. Srikanth Lakshmanan and Arun Ganesh’s tenure with the Language Engineering team ended in December.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

A new version, Kiwix 0.9rc2, was released. This version embeds our ZIM HTTP server kiwix-serve for Windows, OSX and Linux. It is now integrated in the Kiwix UI, allowing everyone to share Wikipedia on a LAN in two clicks. We have revamped our audience measurement tool, a solution that could be interesting for other projects using Mirrorbrain. At the same time, we continue to increase our ZIM production throughput, with 8 new Wikipedia ZIM files in December. December was also a month of new records for Kiwix: for the first time, we have had more than 70,000 downloads in a month and a lead position for education software on SourceForge.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

New code and bugfixes have been deployed (with MediaWiki 1.21wmf5 and 1.21wmf6) and test2 now gets language links from Wikidata. Changes on Wikidata that concern articles on test2 are shown in the recent changes of test2 as well. If there are no problems, deployment on the Hungarian Wikipedia will happen on January 14, 2013. Other Wikipedia sites will follow.
For the second phase of Wikidata, representation of values is the central focus. We published a draft and discussions have started; we’d appreciate your feedback. Additionally, Denny Vrandečić and Lydia Pintscher held IRC office hours; logs are available in English and German.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.
