Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Wikimedia engineering July 2013 report

Major news in July includes:

  • Giving more editors an easy-to-use editing interface (the VisualEditor) on several Wikipedias
  • Improving language support on our sites via summer interns’ projects and easier configuration options, and asking for help translating the VisualEditor interface
  • Enabling users to edit our sites from mobile devices, like phones and tablets, and announcing a future user experience bootcamp focusing on mobile editing
  • Finishing our transition from keeping source code in Subversion to storing it in Git
  • Launching a Wikipedia Zero partnership with Aircel, giving mobile subscribers in India the potential to access Wikipedia at no data cost
  • Updating the Wikimedia movement on how we intend to protect our users’ privacy with HTTPS
  • Signing a contract with longtime MediaWiki contributors to manage MediaWiki releases for the open source community
  • Explaining how we find and gather software problems and deliver the fixes to users

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Engineering metrics in July:

  • 114 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from around 960 to about 1283.
  • About 40 shell requests were processed.
  • Wikimedia Labs now hosts 168 projects and 1,623 users; to date 2,167 instances have been created.
  • The tools project in Labs now hosts 252 tools and 218 members.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Bryan Davis joined the Platform Engineering team as a Senior Software Engineer, working generally on backend software issues and starting off supporting multimedia (announcement).
  • C. Scott Ananian joined the Parsoid team as a Senior Features Engineer (announcement).
  • Kenan Wang joined the Product team as Product Manager for Mobile (announcement).

Technical Operations

Site infrastructure

Lots of Puppet refactoring work got done this month, including considerable reorganization of the puppet masters. Several manifests have been moved into modules, but completing this project will take many months.

Data Dumps

The English Wikipedia dumps ran out of our Ashburn data center this month, and so did a number of other big wikis’ dumps. There’s an issue with the abstract dumps that needs to be sorted out for those, but other than that everything ran smoothly.
Petr Onderka has been getting a lot of work done on the incremental dumps. A first preview of the code was announced as well as a proposed binary file format which the program currently uses. For a preview of what’s coming up, you can check the timeline. Your comments and suggestions are welcome!

Wikimedia Labs

Although some features were introduced this month, most of our time was spent on documentation, tracking down bugs and improving usability. We held a documentation sprint targeted at improving documentation for the Tool Labs project. Work continued on stabilizing the NFS server; we believe we’ve traced the stability issues to RAID controller problems. The compute nodes were running increasingly low on disk space; we traced this to a change in the behavior of nova and have deployed a fix. nova-network was starting to experience timeouts due to excessive load, leading to instance creation failures, so we extended DHCP renewal times to reduce the load. We upgraded wikitech.wikimedia.org and wikitech-static to the 1.22wmf11 version of MediaWiki. We also:

  • deployed the AJAX-enabled delete instance feature;
  • deployed a change to display more informative instance statuses;
  • fixed issues in LdapAuthentication that broke blocking and renaming users;
  • deployed a change to allow service groups to be added to service groups, to make sharing code and data between tools easier.

Features Engineering

Editor retention: Editing tools

VisualEditor

In July, the VisualEditor team began switching the deployment from opt-in alpha to opt-out beta, making VisualEditor the default editor for users of several Wikipedias. The deployed version of the code was updated three times (1.22-wmf10, 1.22-wmf11 and 1.22-wmf12), with several mid-deployment releases to patch urgent issues as the code was developed. There were a number of user interface improvements, most notably to the references insertion dialog, alongside fixes to a number of bugs uncovered by the community.

Parsoid

In July, the Parsoid team supported the deployment of VisualEditor as the default editor on eight Wikipedias. We continued to monitor bug reports, feedback pages and village pumps, and fixed a number of reported bugs to eliminate instances of dirty diffs and other corruption. An absence of performance issues let us keep our attention on functionality and dirty-diff related bugs, which remained the primary focus of our work this month. On the staffing side, C. Scott Ananian joined the Parsoid team as a full-time employee; he has been working with us since earlier this year, first as a volunteer and then as a contractor. Marc Ordinas i Llopis from Spain and Arlo Breault from Canada joined the Parsoid team as contractors this month.

Editor engagement features

Flow

This month, we released two new prototypes to showcase some ideas around Flow-enabled user-to-user discussion. We continued to collect user feedback and prioritize use-cases for a potential minimum viable product.

Notifications

In July, we released our final features for Notifications on the English Wikipedia, mediawiki.org and meta.wikimedia.org. Benny Situ completed development of HTML email notifications, as well as improved notifications, based on designs by Vibha Bamba. Fabrice Florin managed the release of these final features, and prepared this release plan to deploy Notifications on more wiki projects, starting with the French and Polish Wikipedias in August. Dario Taraborelli and Matthias Mullie updated our new metrics dashboards, while Aaron Halfaker completed his report on our A/B test of new user activity. To learn more, visit the project portal, read the FAQ page and join the discussion on the talk page.

Article feedback

In July, we deployed a few last features and bug fixes for the Article Feedback Tool (AFT5) on the English and French Wikipedias. Matthias Mullie released the auto-archive feature, as well as this list of articles with feedback enabled on enwiki and on frwiki. At the request of the French Wikipedia community, he also developed new feedback notifications to let users know when feedback is marked as useful for a page they watch (or for a comment they posted). The team plans to make the AFT5 tool available to other wiki projects interested in testing this tool, provided that no new development is required to support their needs, as outlined in the release plan.

Editor engagement experiments

Editor engagement experiments

In July, the Editor Engagement Experiments (E3) team made progress on a number of continuing projects. In terms of features, the team completed work to integrate the onboarding new Wikipedians project with new infrastructural changes and feature releases. For GettingStarted, E3 collaborated with Platform engineering to ensure compatibility with the new “SUL2” cross-wiki authentication architecture. For the GuidedTour extension, the team completed a first release of support for guided tours of the VisualEditor interface, alongside tours of the legacy wikitext editor, and developed a plan to refactor the GuidedTour extension as well as its API. E3 also planned for its sixth A/B test of the GettingStarted workflow (see proposed specification and mockups). As an addition to the team’s redesign of account creation and login (launched in May-June), we enhanced the design of the form for users who fulfill account creation requests for others.

E3 team member Matthew Flaschen also worked with two Google Summer of Code students on their projects. Richa Jain is working on the Annotator extension, which allows adding inline comments to a wiki page. Rahul Maliakkal is working on the Pronunciation Recording extension, for adding audio of pronunciations to Wiktionary.

On the experimental tools and data analysis front, E3 completed a significant rewrite of the Puppet configuration for EventLogging, our data collection pipeline, among other changes. For the MediaWiki-Vagrant portable desktop development environment, E3 added support for flexibly provisioning and unit testing extensions such as GettingStarted, GuidedTour, ParserFunctions, EventLogging, and others. Last but not least, the micro-survey of gender of new account registrations was enabled on German, French, Italian, and Polish Wikipedias, while data analysis on the English Wikipedia results began.

Support

2013 Wikimedia fundraiser

In July, the fundraising team ran its first successful tests of our new payments gateway, Adyen. This credit card backup gateway (US-only for now) performed similarly to our primary credit card processor in A/B testing, and can be used successfully as a failover. We also ran, for the first time, several short campaign tests targeted at mobile devices in the US. In these tests, users were able to choose between PayPal and Amazon Payments. Additional tests to determine peak times, appropriate localities, and optimum messaging for mobile campaigns will continue throughout August, as the campaigns are prepared.

Mobile

Wikipedia Zero

This month, the team launched Wikipedia Zero with Aircel in India, a carrier with about 60 million cellphone subscribers. We also completed our first cut of automation testing, started the implementation of the Wikipedia Zero software re-architecture, and patched bugs. During July we planned for the upcoming year in Wikipedia Zero. On the engineering front we are focusing first on test automation and re-architecture concurrent with SMS/USSD and J2ME releases, and afterward will be focusing efforts on end user UX and carrier-oriented enhancements that will support the continued growth of the program.

Mobile web projects

This month, the mobile web team released a new contributory navigation to all Wikimedia mobile sites, including the existing upload and watchlist star features, as well as an edit button. This means that editing (in the form of section-level markup editing) is now enabled on all mobile Wikimedia sites for logged-in users. In beta, we began work on restyling mobile notifications, as well as on guiders for first-time editors and uploaders.

Platform Engineering

MediaWiki Core

MediaWiki 1.22

In July 2013, MediaWiki 1.22wmf10 through 1.22wmf13 were successfully deployed to Wikimedia project sites. We skipped the week of July 4th as there was reduced capacity in both engineering and operations due to the US holiday. We also named Markus Glaser and Mark Hershberger as the new contractors maintaining the MediaWiki “tarball” for release to other system administrators and organizations.

Git conversion

With the migration of pywikipediabot from Subversion to Git, we were able to switch svn.wikimedia.org to read-only mode, thus completing this migration. We plan to keep the Subversion service around indefinitely for archival purposes, and can still migrate, on request, any dormant project that hasn’t already been migrated.

Multimedia

In July, we continued to expand our multimedia team: Mark Holmquist joined as front-end software developer, working with product manager Fabrice Florin and engineering director Rob Lanphier, as well as contractors Brian Wolff and Jan Gerber. We prepared a first multimedia plan for the coming year and discussed our goals with community members in two separate events: a multimedia roundtable and an IRC chat. Based on community feedback, we identified five main areas of activity for 2013-2014: improving the viewing experience and upload pipeline in the first half of the year, then focusing on file curation, discovery and placement in articles in the second half of the year. Our overall goals for this year are to increase both the number of contributions and the number of files used in Wikipedia articles.

For now, we have started work on a new media viewer that displays images at a larger size when you click on a thumbnail, and offers file information and a full-screen viewing option, right on the same page. We plan to have a first version of that feature next month, and will be testing it as part of a beta experiment on a few pilot sites. We will also be hosting more community planning discussions, such as this multimedia roundtable at Wikimania 2013. To participate in these discussions and keep up with our work, we invite you to join this new multimedia mailing list. Last but not least, we are also recruiting for two more positions on our team: a multimedia systems engineer and a senior software engineer. Please spread the word about this unique opportunity to create a richer multimedia experience for Wikipedia and MediaWiki sites!

Admin tools development

This activity was on hiatus in July.

Search

Nik Everett and Chad Horohoe have continued writing an extension to implement ElasticSearch searching for MediaWiki, and we’ve finished most of the required features. Next comes getting it deployed, scaled, and fixing the inevitable bugs. We’re aiming to deploy to the test site beta.wmflabs.org before the end of the month. Peter Youngmeister and Asher Feldman will be handling the operations tasks for the new setup.

Auth systems

Engineers worked towards an OAuth deployment to the beta cluster in early August, and aim to roll OAuth out to the test wikis (e.g., test2.wikipedia.org) after Wikimania.

HipHop deployment

HipHop work was mainly on hold in July, with the exception of some minor work on virtual machines.

Security auditing and response

The team continued to respond to reported security issues and to address outstanding bugs.

Quality assurance

Quality Assurance

This month QA made contributions to the VisualEditor, UniversalLanguageSelector and Mobile web projects, among others, finding and reporting issues in a timely manner. Our intern with the Outreach Program for Women is working on more automated browser tests. We continue to engage our community on the QA mailing list and in live sessions, where we have several contributors (see Volunteer coordination and outreach).

Beta cluster

The Beta cluster continues to be a target for automated and manual testing. It also finally has a syslog receiver on deployment-bastion, thus solving bug 36748 (no syslog::server in beta). The logs can be accessed via either /home/wikipedia/syslog or /data/project/logs/syslog/. This is thanks to Leslie Carr.

Browser testing

In July we added coverage for a number of features, including VisualEditor, UniversalLanguageSelector, and Mobile Search. We are making extensive use of beta labs as well as the test2wiki test environment. Our automated browser tests continue to identify important issues during feature development.

Analytics

We reviewed our planning document with Sue and Erik and the Engineering Directors. Reception was positive and we will be communicating next steps more widely in August. The Analytics team focused on short-term deliverables, reliability and hiring in July. We identified two potential candidates for front-end/Python work. We have been performing multiple phone screens together with Recruiting, and the hiring pipelines are good.

Analytics infrastructure

Kraken:

  • We kicked off a reliability project with Ops with the end goal of stabilizing Hadoop and the logging infrastructure. Teams have been in discussions on architecture and planning, and should have a path forward in the next 2 weeks. We identified a consultant who will perform a system audit to aid the project.
  • We continue adding new metrics and alerts to monitor all the different parts of the webrequest dataflows into Kraken. We expect to keep making improvements in the coming months until we have a fully reliable data pipeline into Kraken.

Logging Infrastructure:

  • We started this month by designing a canary event monitoring system. A canary event is an artificial event that is injected at the start of the data workflow and that we monitor to check that it reaches its final destination; that way we can ensure that the dataflows are functioning.
  • We are investigating which data format to use for sending the webrequest messages from Varnish to the Hadoop cluster. The formats we are scrutinizing are JSON, Protobuf and Avro, and we are also looking at compression algorithms such as Snappy.
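
The canary approach described above can be sketched in a few lines. This is only an illustration of the idea, not the Analytics team's design: the `CanaryMonitor` class, the event fields (`type`, `id`, `injected_at`) and the timeout value are all hypothetical names chosen for this example.

```python
import time
import uuid


class CanaryMonitor:
    """Tracks artificial 'canary' events injected at the start of a pipeline.

    Hypothetical sketch: class, method and field names are assumptions.
    """

    def __init__(self, timeout_seconds=60.0):
        self.timeout_seconds = timeout_seconds
        self._pending = {}  # canary id -> injection timestamp

    def make_canary(self, now=None):
        """Create a canary event and remember when it was injected."""
        now = time.time() if now is None else now
        canary_id = uuid.uuid4().hex
        self._pending[canary_id] = now
        return {"type": "canary", "id": canary_id, "injected_at": now}

    def observe(self, event):
        """Call at the final destination for every event that arrives."""
        if event.get("type") == "canary":
            self._pending.pop(event["id"], None)

    def overdue(self, now=None):
        """Return ids of canaries that have not arrived within the timeout."""
        now = time.time() if now is None else now
        return sorted(
            canary_id
            for canary_id, injected in self._pending.items()
            if now - injected > self.timeout_seconds
        )
```

In such a setup, an alert would fire whenever `overdue()` returns a non-empty list, since a missing canary suggests that real events may be getting lost along the same path.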

Analytics Visualization, Reporting & Applications

Wikimetrics: We successfully launched the initial version of Wikimetrics: see metrics.wmflabs.org. This version has support for cohort upload and two metrics: 1) bytes added and 2) namespace edits. We are working on adding support for time-series and aggregators. In the coming sprints we will focus on adding new metrics.
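
To make the "bytes added" metric concrete, here is a rough sketch of how such a figure could be computed for a cohort. It is not the Wikimetrics implementation, and its exact definition of the metric may differ: the `bytes_added` function and the `(user, new_size, parent_size)` record layout are assumptions made for this example, and only positive size changes are counted.

```python
from collections import defaultdict


def bytes_added(revisions, cohort):
    """Sum positive size changes per cohort member.

    `revisions` is an iterable of (user, new_size, parent_size) tuples;
    this record layout is an assumption made for the example.
    """
    totals = defaultdict(int)
    for user, new_size, parent_size in revisions:
        if user not in cohort:
            continue  # ignore edits by users outside the cohort
        delta = new_size - parent_size
        if delta > 0:
            totals[user] += delta
    # Report every cohort member, including those with no additions
    return {user: totals[user] for user in cohort}
```

Called with a cohort such as `{"A", "B"}`, the function returns one total per cohort member, so users who only removed content show up with zero.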

Wikipedia Zero: Dashboards have been moved off of Hadoop for the time being and are now being populated again. We have identified some issues with log rotation that are causing gaps in the graphs, and will look into these problems. Also, we have been working on technical handoff as Evan Rosen leaves the Foundation.

Limn: No development news.

Wikistats: No development news.

Data Releases

  • Erik Zachte published data and longitudinal analyses of edit and revert trends for Wikimedia projects (read the announcement). We provided data and ad-hoc analysis for the presentation A State of Decline? The State of Wikimedia Communities as of July 2013 at the July 2013 Monthly Metrics Meeting.
  • We published the analysis of a controlled experiment that we ran in June to test the Impact of notifications on new contributors and a pre-release A/B test of VisualEditor on the English Wikipedia. We performed an extensive audit of the quality of the data collected during and after the VE test, taking into account browser limitations and known bugs, and posted an update on the state of the analysis. To ensure the replicability of the analysis, we released via our open data repository the complete dataset of the sample of new registered users who participated in the split test.
  • We released real-time dashboards on edit activity, new account registrations and reverts for the 10 Wikipedias on which VE has been rolled out (en, de, es, fr, he, it, nl, pl, ru, sv).

Engineering community team

Bug management

A PATCH_TO_REVIEW status was introduced in Bugzilla which is automatically set (by the Gerrit Notification Bot) on bug reports when a commit message in Gerrit mentions a corresponding bug number. Andre prepared a patch for using the InlineHistory extension in Bugzilla and a patch to make Bugzilla’s guided bug entry form for new users usable for Wikimedia Bugzilla. Andre also continued his weekly blogposts of Bugzilla tips. Thanks to Daniel Zahn, Bugzilla administrators now regularly receive an email with a database dump of Bugzilla’s “audit log” which lists the most recent taxonomy changes in Bugzilla (component or keyword additions, etc.). In Bugzilla’s taxonomy, the components in the “Parsoid” product were reorganized as requested by its main developer, and the remaining open “OggHandler” tickets were closed as it has been superseded by TimedMediaHandler.
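
The automatic status change described above depends on spotting bug numbers in commit messages. The sketch below illustrates that step only; it is not the Gerrit Notification Bot's code. The regular expression, the function names and the set of statuses treated as open are assumptions for this example, as is the choice to leave already-closed bugs untouched.

```python
import re

# Assumed conventions: "bug 36748" in free text or a "Bug: 36748" footer line.
BUG_REFERENCE = re.compile(r"\bbug:?\s*#?(\d+)\b", re.IGNORECASE)

# Assumed set of "open" statuses; the real workflow may differ.
OPEN_STATUSES = {"UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED"}


def mentioned_bugs(commit_message):
    """Return the distinct bug numbers referenced in a commit message, in order."""
    seen = []
    for match in BUG_REFERENCE.finditer(commit_message):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def next_status(current_status, commit_message, bug_id):
    """Move an open bug to PATCH_TO_REVIEW when a commit message mentions it."""
    if bug_id in mentioned_bugs(commit_message) and current_status in OPEN_STATUSES:
        return "PATCH_TO_REVIEW"
    return current_status
```

The leading word boundary in the pattern keeps words such as "debug 42" from being read as bug references.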

Mentorship programs

Quim Gil organized meetings with each Google Summer of Code and Outreach Program for Women team, one by one. Most projects were already at full speed, and for them the meeting was primarily social and nice to have. A few really benefited from going through a checklist to highlight problems early, while they are still easy to solve. All GSoC and OPW projects, 21 in total, are now on track.

Technical communications

As in June, Guillaume Paumier was seconded to the VisualEditor deployment effort, working on communications, documentation and liaising with the French Wikipedia. Work on technical communications mostly focused on perennial activities like ongoing communications support to the engineering staff.

Volunteer coordination and outreach

On Community metrics, Quim Gil focused on the consolidation of korma.wmflabs.org, the new dashboard for automated community metrics. We have made good progress on this alpha, including basic metrics from Git, Bugzilla and mailing lists being retrieved on a daily basis, and have filed bugs and enhancement requests on GitHub (mediawiki-dashboard, VizGrimoireJS). We are deciding on the key metrics we need in order to make decisions, e.g. average time to resolve on Gerrit changesets or bug reports. We also planned and promoted a Browser Testing Automation workshop with Cucumber together with the QA team, with 13 people participating online. You can watch the session here (1h40). The experience was useful, as we agreed on MediaWiki-Vagrant as the default environment for automated testing and highlighted the list of easy bugs. Also, the Engineering Community team held its quarterly review.
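
As an illustration of the kind of key metric mentioned above (average time to resolve a Gerrit changeset or bug report), here is a small sketch. It does not reflect how korma.wmflabs.org computes its figures: the `average_days_to_resolve` function and the `opened`/`resolved` field names are hypothetical.

```python
from datetime import datetime
from statistics import mean


def average_days_to_resolve(items):
    """Average days between 'opened' and 'resolved' for resolved items.

    `items` is a list of dicts with ISO 8601 'opened' and optional 'resolved'
    timestamps; this shape is an assumption, not the dashboard's data model.
    """
    durations = []
    for item in items:
        if not item.get("resolved"):
            continue  # still open: excluded from the average
        opened = datetime.fromisoformat(item["opened"])
        resolved = datetime.fromisoformat(item["resolved"])
        durations.append((resolved - opened).total_seconds() / 86400)
    return mean(durations) if durations else None
```

Excluding open items keeps the sketch simple, but it also means the figure can look better than reality when many old items remain unresolved, so a median or an "age of open items" figure may be a useful companion.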

Language engineering

The language team deployed Universal Language Selector (ULS) to most Wikimedia wikis to provide easier configuration options to readers and contributors. ULS provides a flexible way to configure and deliver language settings like interface language, fonts, and input methods (keyboard mappings). Also, ULS allows users to type text in different languages not directly supported by their keyboard, read content in a script for which fonts are not available locally, or customise the language in which menus are displayed. For more information, please see the FAQ.
The Language engineering team also mentored summer interns’ projects to improve language support on our sites, and asked for volunteer help translating the VisualEditor interface.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

We are preparing the first release of a new Wikipedia ZIM creation solution for August. We have also released a new version of Kiwix for Android, which includes a few bug fixes and new features. Besides the release of the traditional Wikipedia ZIM files, we have also published two interesting ZIM files: one which includes 2,500 ebooks (EPUB & PDF) of French literature and one with the new Wikipedia for Schools selection. The ZIM incremental update GSoC project is progressing well too: first working versions of the zimpatch & zimdiff console tools are available, and integration with Kiwix has started. Kiwix developers will be available at Wikimania, during the hacking days and at the Wikimedia CH booth during Wikimania itself.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

In July, we deployed Wikidata to all Wikivoyage sites in all languages, to manage their language links. We updated the continued Roadmap for Wikidata Development. Coveralls.io support has been added to most of our components. Since the first deployment of Phase 1 to Wikipedia, about 240 million interwiki links (5 GB of text) have been removed from articles (2012 vs 2013 analysis).
In other news, the AAAI Feigenbaum Prize for Watson was donated to the Wikimedia Foundation by IBM Research to support its work, especially on Wikidata.
Denny Vrandečić explains why Wikidata items are identified with a Q.

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. Annual goals for the 2013–2014 fiscal year are being drafted by some teams and have been finalized by others.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.
