Wikimedia engineering report, March 2014

Major news in March includes:

  • an overview of webfonts, and the advantages and challenges of using them on Wikimedia sites;
  • a series of essays written by Google Code-in students who shared their impressions, frustrations and surprises as they discovered the Wikimedia and MediaWiki technical community;
  • Hovercards now available as a Beta feature on all Wikimedia wikis, allowing readers to see a short summary of an article just by hovering a link;
  • a subtle typography change across Wikimedia sites for better readability, consistency and accessibility;
  • a recap of the upgrade and migration of our bug tracking software.

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Engineering metrics in March:

  • 160 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from around 1450 to about 1315.
  • About 25 shell requests were processed.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Chase Pettet joined the Wikimedia Operations Team as an Operations Engineer (announcement).
  • Following changes in the Engineering Community Team, Quim Gil took over as Engineering Community Manager, and Sumana Harihareswara transitioned to the role of Senior Technical Writer (announcement).
  • Kevin LeDuc joined the Wikimedia Foundation as Analytics Product Manager (announcement).

Technical Operations

Datacenter RFP

Final negotiations and coordination are still ongoing for the data center RFP, but we expect to be able to make an announcement soon.

Wikimedia Labs

Labs metrics in March:

  • Number of projects: 149
  • Number of instances: 310
  • Amount of RAM in use (in MBs): 1,288,704
  • Amount of allocated storage (in GBs): 14,925
  • Number of virtual CPUs in use: 635
  • Number of users: 2,907

The Labs Ops team has spent the month shepherding projects from the Tampa cloud to the Ashburn cloud. Dozens of volunteers contributed to the move, and all tools and projects have now been copied to or rebuilt in Ashburn. Some projects and tools are in a non-running state pending action on the part of their owners or admins. Ashburn Labs is running OpenStack Havana, with NFS for shared storage.

The usage stats this month are quite a bit different from last month. Quite a number of obsolete instances have been purged, and last month’s stats may have included some data center duplication.

Tampa data center

During March, the Ops team has been decommissioning and shutting down a lot of hosts in the old Tampa data center, including all former appservers. The amount of energy consumed in the old data center has been greatly reduced. A few hosts are going to be migrated to another floor in the existing data center and physical data center work is coming up.

Features Engineering

Editor retention: Editing tools

VisualEditor

Presentation slides from the VisualEditor team’s quarterly review meeting on 26 March.

In March, the VisualEditor team continued their work on improving the stability and performance of the system, and added some new features and simplifications, helping users edit and create pages more swiftly and easily. Editing templates is now much simpler, because most of the advanced controls that users don’t often need have moved into a special version of that dialog. The media dialog was improved and streamlined a little, with some hinting added to the controls to better explain how they work. The cursor entry points inserted by VisualEditor next to items like images or templates to give users somewhere to put the cursor now animate on hover and cursor entry to show that they’re special. The overall design of dialogs and controls was improved a little to make it flow better, like double-clicking a block to open its dialog. A new system for quickly and simply inserting and editing “citations” (references based on templates) neared completion and will be deployed in the coming month. The deployed version of the code was updated four times in the regular releases (1.23-wmf17, 1.23-wmf18, 1.23-wmf19 and 1.23-wmf20).

Parsoid

Presentation slides from the Parsoid team’s quarterly review meeting on 28 March.

March saw the Parsoid team continuing with a lot of unglamorous bug fixing and tweaking. Media / image handling in particular received a good amount of love, and is now in a much better state than it used to be. In the process, we discovered a lot of edge cases and inconsistent behavior in the PHP parser, and fixed some of those issues there as well.

We wrapped up our mentorship for Be Birchall and Maria Pecana in the Outreach Program for Women. We revamped our round-trip test server interface and fixed some diffing issues in the round-trip test system. Maria wrote a generic logging backend that lets us dynamically map an event stream to any number of logging sinks. This is a huge step up from the basic console.error-based error logging we have relied on so far.
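The idea of mapping one event stream to any number of logging sinks can be illustrated with a short sketch. This is not Parsoid's code (Parsoid is written in JavaScript); the `EventLogger` class, its `register` and `log` methods, and the prefix-based matching of event types are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A sink is any callable that accepts (event_type, message).
Sink = Callable[[str, str], None]


class EventLogger:
    """Hypothetical dispatcher: routes each event to every sink whose prefix matches."""

    def __init__(self) -> None:
        self._sinks: DefaultDict[str, List[Sink]] = defaultdict(list)

    def register(self, prefix: str, sink: Sink) -> None:
        """Attach a sink to all event types that start with `prefix`."""
        self._sinks[prefix].append(sink)

    def log(self, event_type: str, message: str) -> int:
        """Deliver the event to all matching sinks; return how many received it."""
        delivered = 0
        for prefix, sinks in self._sinks.items():
            if event_type.startswith(prefix):
                for sink in sinks:
                    sink(event_type, message)
                    delivered += 1
        return delivered


# Usage: one in-memory sink for errors only, one that receives everything.
errors: List[str] = []
everything: List[str] = []

logger = EventLogger()
logger.register("error", lambda t, m: errors.append(f"{t}: {m}"))
logger.register("", lambda t, m: everything.append(f"{t}: {m}"))

logger.log("error/parse", "unexpected token")
logger.log("trace/dom", "built 42 nodes")
```

Because sinks are registered at run time, a new destination (a file, a metrics service, a test buffer) can be added without touching the code that emits events.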

We also designed and implemented an HTML templating library which combines the correctness and security support of a DOM-based solution with the performance of string-based templating. This is implemented as a compiler from KnockoutJS-compatible HTML syntax to a JSON intermediate representation, and a small and very fast runtime for the JSON representation. The runtime is now also being ported to PHP in order to gauge the performance there as well. It will also be a test bed for further forays into HTML templating for translation messages and eventually wiki content.
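To make the split between a compiler and a small runtime more concrete, here is a minimal sketch of a runtime that renders a JSON intermediate representation. The IR format shown (a JSON array of literal strings and `["text", key]` instructions) is invented for this example and is not the format the team's library uses; the compiler step is omitted entirely.

```python
import html
import json
from typing import Any, Dict

# Hypothetical IR: literal markup strings interleaved with ["text", "<key>"]
# instructions that insert an escaped value from the data model.
IR_JSON = '["<p>Hello, ", ["text", "name"], "!</p>"]'


def render(ir: Any, model: Dict[str, Any]) -> str:
    """Walk the IR once and concatenate the output."""
    parts = []
    for node in ir:
        if isinstance(node, str):
            # Literal markup emitted by the (omitted) compiler; copied as-is.
            parts.append(node)
        elif node[0] == "text":
            # Model data is untrusted, so it is always HTML-escaped.
            parts.append(html.escape(str(model[node[1]])))
        else:
            raise ValueError(f"unknown instruction: {node[0]!r}")
    return "".join(parts)


template = json.loads(IR_JSON)
output = render(template, {"name": "<script>"})
print(output)
```

The runtime only concatenates strings, which is where the speed of string-based templating comes from, while escaping every model value at the instruction level is one way to keep the safety of a structure-aware approach.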

Core Features

Flow

This month the Core Features team focused on improvements to how Flow works with key MediaWiki tools and processes. We made changes to the history, watchlist, and recent changes views, adding more context and bringing them more in line with what experienced users expect from these features. We also worked on improvements to the API and links tables integration. On the core discussion side, we released a Flow thank feature, allowing users to thank each other for posts, and began work on a feature to close and summarize discussions. Lastly, we continued work on rewriting the Flow front-end to make it cleaner, faster, and more responsive across a wide number of browsers/devices, which will be ongoing over the next month.

Growth

Growth

In March, the Growth team primarily focused on bug fixing, design enhancements, and refactoring of the GettingStarted and GuidedTour extensions, which were recently launched on 30 Wikipedias. We updated icons and button styles, rewrote the interface copy, and refactored the interface to be more usable in non-English languages. We also began work on a significant refactor of the GuidedTour API, in order to support interactive tours that are non-linear. Non-linear tours will not depend on a page load to run, which will enable better support for tours in VisualEditor, among other things. Last but not least, we made progress on measuring the impact of GettingStarted across all wikis where it is deployed, with results for the first 30 days of editor activity expected in early April.
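One way to picture a non-linear tour is as a small state machine in which user events, not page loads, move the tour from one step to another. The sketch below is purely illustrative: the step names, event names and the `advance` function are hypothetical and do not reflect the GuidedTour API.

```python
from typing import Dict, List, Tuple

# Hypothetical tour definition: (current step, event) -> next step.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("intro", "click_edit"): "editing",
    ("intro", "dismiss"): "done",
    ("editing", "save"): "done",
    ("editing", "cancel"): "intro",  # a tour can loop back, hence "non-linear"
}


def advance(step: str, event: str) -> str:
    """Return the next step, or stay on the current one for unknown events."""
    return TRANSITIONS.get((step, event), step)


def run(events: List[str], start: str = "intro") -> List[str]:
    """Replay a sequence of events and record every step visited."""
    visited = [start]
    for event in events:
        visited.append(advance(visited[-1], event))
    return visited


path = run(["click_edit", "cancel", "click_edit", "save"])
print(path)
```

Since transitions are driven by events inside the page, such a tour could keep running in an editor that never reloads the page.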

Support

Wikipedia Education Program

This month, thanks to the work of Facebook Open Academy student JJ Liu, we added a new type of notification for course pages: users are now notified whenever they get added to a course. We also fixed inconsistencies with interface messages, user rights, and the deletion of institutions from the system.

Mobile

Wikimedia Apps

The team worked on editing, moving from logged-out editing to logged-in editing, and on table of contents refinements.

Mobile web projects

The team worked on the link inspector for VisualEditor on tablets, and a switch between VisualEditor and wikitext on tablets. Both are in alpha.

Wikipedia Zero

During the last month, with the assistance of the Ops and Platform teams, the Wikipedia Zero team added hosting for the forthcoming Partners Portal and continued work on image size reduction for the mobile web. Additionally, the team added Wikipedia Zero detection to the Wikipedia for Firefox OS app and added contributory features support for users on partner networks supporting zero-rated HTTPS connections. The team also removed search thumbnails for zero.wikipedia.org connections in order to avoid spurious charges on devices supporting high-end JavaScript yet using zero.wikipedia.org. Analytics fields were added for the purpose of counting proxy-based and HTTPS-based connections. Routine pre- and post-launch configuration changes were made to support operator zero-rating, and technical assistance was provided to operators and the partner management team to help add zero-rating. The Wikipedia Zero automation testing server was also migrated. The forthcoming Android and iOS apps were also updated to make Wikipedia Zero detection a standard fixture.

Yuri continued analytics work on SMS/USSD pilot data. Post hoc analysis was performed on WML usage after its deprecation; it is still low, although obtaining more low-end phones to check how well HTML renders and how to enhance the HTML could be useful. Post hoc analysis was also performed on anomalous declines and growth spurts in log lines (not strictly related to pageviews); the declines had much to do with API changes, and the growth spurts had much to do with an external polling mechanism.

With the assistance of the Apps team, User-Agent, Send App Feedback, and Random features were added to the forthcoming reboots of the Android and iOS apps, while making the Share feature for Android allow for a different target app each time and providing code review assistance on the Android and iOS apps code; proof of concept for fulltext search was started on iOS. Wikipedia for Firefox OS bugfixes were also pushed to production. Screencap workflows and preload information was put together for the Android reboot with respect to Wikipedia Zero as well.

The team worked with Ops on forward planning in light of the extremely infrastructure-oriented nature of the program. A quarterly review was held with the ED, VP of Engineering, and the W0 cross-functional team, which also reviewed presentation material for publication. The team also continued work on additional proxy and gateway support. To help partner tech contacts, the team worked on reformatting the tech partner introductory documentation.

Finally, the team explored proactive MCC/MNC-to-IP address drift correction, and will be emailing the community for input soon.

Wikipedia Zero (partnerships)

Smart, the largest mobile operator in the Philippines, is giving access to Wikipedia free of data charges through the end of April. They announced the promotion in a press release. Ingrid Flores, Wikipedia Zero Partner Manager, visited the Philippines and arranged a meeting with local community members and Smart. They are now exploring ways to collaborate in support of education. The partnerships team kicked off account reviews with the 27 existing Wikipedia Zero partners, to update the implementation, identify opportunities for collaboration in corporate social responsibility (CSR) initiatives and get feedback on the program. The account reviews will continue for the next few months. Last, we continued recruiting for Wikipedia Partner Manager for the Asia region.

Language Engineering

Language tools

MediaWiki’s LocalisationUpdate extension was rewritten by Niklas Laxström to modernize its internal architecture to be able to support JSON message file formats. Kartik Mistry released the team’s monthly MediaWiki Language Extension Bundle (MLEB 2014.3) with the latest version of LocalisationUpdate (see release notes). Niklas Laxström also started migrating the Translate extension’s translation memory and translation search back-end from Solr to ElasticSearch in line with Wikimedia’s search migration. David Chan continued his work on input method support for the VisualEditor project.

Milkshake

Santhosh Thottingal, Kartik Mistry and Niklas Laxström made numerous bug fixes and performance improvements in jquery.webfonts, jquery.ime and jquery.uls. Amir Aharoni started collecting metrics on usage of Universal Language Selector.

Language Engineering Communications and Outreach

Runa Bhattacharjee and Kartik Mistry set up a manual testing infrastructure using the Test Case Management System (TCMS) to encourage greater participation from the volunteer community in testing the software tools and features developed by the team. Volunteer testing of language software is expected to start this coming month. The team’s monthly office hour was hosted by Runa Bhattacharjee on March 12. An overview of webfonts with advantages and challenges of using them on Wikimedia sites was also published by the team.

Content translation

Preview of section alignment and basic editing in Content translation

Santhosh Thottingal and David Chan continued development and technology research on the Content Translation project. Development was focused specifically on updates to the side-by-side translation editor and section alignment of translated text. Kartik Mistry and Santhosh Thottingal worked on infrastructure for testing the Content Translation server. David Chan continued his technology research on sentence segmentation.

Pau Giner updated the Content Translation UI design specification incorporating review comments from UX and product reviews. The team also participated in a review of the Content Translation project with the product team leadership.

Platform Engineering

MediaWiki Core

HHVM

The team continued to work on porting C extensions to HHVM. Tim Starling did major work on a compatibility layer allowing Zend extensions to be used by HHVM, and started further work on making the layer compatible with newer HHVM interfaces. The team has made a preliminary deployment of HHVM to the Beta cluster, but this still needs further debugging before it is useful to a wider audience.

Release & QA

The Beta Cluster has been migrated from the Tampa data center to the Ashburn data center. In the move, a ton of cleanup and Puppetization work was done, which will make future Beta Cluster work easier. In addition, the Beta Cluster is getting closer to a place where we can test our current main deployment tool, known as “scap”, along with future and other deployment tools. The team continued the rewrite of scap in Python (from Bash scripts + PHP), which improves both performance and maintainability and puts us in a better position to move to a new tool in the future. We have also started doing SWAT deploy windows twice a day (Monday to Thursday), which has greatly increased momentum for many developers who would otherwise have to wait until the weekly deployment cycle.

Search

In March we upgraded to the newest version of Elasticsearch and expanded onto more wikis. We also started a performance assessment which has started showing us the work required to use Cirrus as the primary search back-end for the larger wikis. We then started in on that work.

Auth systems

The team prepared the migration of the central OAuth database from mediawiki.org to Meta-Wiki, and got input from the Wikimedia Foundation’s legal team regarding the OAuth process.

Wikimania Scholarships app

Support of production application during applicant review period continued in March. A dataset of applicants passing the phase 1 review criteria who had opted-in to sharing application details with chapters and thematic organizations was prepared and delivered to Foundation staff. The beta testing server was migrated from the Tampa data center to the Ashburn data center as a component of the Labs environment migration. The new beta server in Labs is now managed via the MediaWiki-Vagrant role::wikimania_scholarships puppet role and labs-vagrant. This should make keeping development changes and the testing application in sync easier in the future.

Security auditing and response

MediaWiki 1.19.13, 1.22.5, 1.21.8 and 1.19.14 were released for security issues. An internal security training session was held for Wikimedia Foundation staff.

Quality assurance

Quality Assurance

The QA team continues to identify and report issues in a timely way. Of particular interest in March was that an automated test uncovered an issue in the interaction of the MobileFrontend and VisualEditor extensions. This is exactly the kind of cross-cutting concern that our QA systems are designed to uncover. It is likely that we will be in a position to discuss these systems at the Wikimania conference in London.

Beta cluster

There was a substantial effort to migrate the Beta cluster over from the Tampa data center to the Ashburn, VA (“eqiad”) data center. This was led by Antoine Musso with assistance from Bryan Davis and many others.

Continuous integration

Erik Bernhardson and Chad Horohoe managed to create jobs using Jenkins Job Builder (JJB) based on the tutorials on installing JJB and Adding a MediaWiki extension.

Browser testing

Besides a particular focus on MobileFrontend browser tests in March, we have also made available some new features, in particular shared code to upload files properly in all browsers, the ability to check for ResourceLoader problems in any test in any repository, and a basic wrapper in order to use the MediaWiki API from within browser tests to set up and tear down test data.

Multimedia

Multimedia

Slides for the Multimedia Quarterly Review Meeting for Q3 2013-14.

In March, the multimedia team’s main project was Media Viewer v0.2, as we completed final features for the tool’s upcoming release next quarter. Gilles Dubuc, Mark Holmquist, Gergő Tisza and Aaron Arcos developed a number of new features, including: share, embed, download, opt-out preference, file page link and feedback link, based on designs by Pau Giner. We invite you to test the latest version (see the testing tips) and share your feedback.

Fabrice Florin coached the multimedia team as product manager and hosted several planning and review meetings, including a cycle planning meeting (leading to the next cycle plan) and the Multimedia Quarterly Review Meeting for the first quarter of 2014, which summarizes our progress and next steps for coming work (see slides). He also worked with Keegan Peterzell to engage community members for the gradual release of Media Viewer, to be enabled by default on a number of pilot sites next month, then deployed widely to all wikis a few weeks later. For more updates about our multimedia work, we invite you to join the multimedia mailing list.

Engineering Community Team

Bug management

Besides working on the Project Management Tools Review, Andre Klapper retriaged many older tickets that had been set to high priority for more than two years, older PATCH_TO_REVIEW tickets and older critical tickets, and investigated moving the Bugzilla instance on Wikimedia Labs to the Ashburn data center (easier to set up from scratch). Andre added project-specific sections and Bugzilla queries to Annoying little bugs to help newcomers find an area of interest for contributing, and blogged about the 4.4 upgrade (which took place in February) and moving Bugzilla to a new server. In Bugzilla, all remaining Cortado tickets were closed and new Versions for the “Wikipedia App” product were set up.

Project management tools review

Guillaume Paumier and Andre Klapper reached out to the teampractices and wikitech-l mailing lists in order to shorten the list of options that can come out of this review process. They also hosted a lively IRC office hour to give an overview of the current situation, answer questions and discuss the first version of the related RFC.

Mentorship programs

The six ongoing FOSS Outreach Program for Women projects were completed successfully, setting a new benchmark for success in our outreach programs. Check the results:

We received 43 Google Summer of Code proposals from 42 candidates, and 18 FOSS Outreach Program for Women proposals from 18 candidates. Dozens of mentors are pushing the selection process that will conclude on April 21 with the announcement of selected participants.

Technical communications

In addition to ongoing communications support for the engineering staff, and contributing to the technical newsletter, Guillaume Paumier edited and published a series of essays on the Wikimedia Tech blog written by Google Code-in students, who shared their impressions, frustrations and surprises as they discovered the Wikimedia and MediaWiki technical community.

Volunteer coordination and outreach

The bulk of work to create community metrics around five Key Progress Indicators is completed, and now we are polishing help strings and usability details. The next step is to share the news with the community and start looking at bottlenecks and actions. Check:

A page about Upstream projects was drafted collaboratively in order to start mapping the key communities where Wikimedia should be active, either as a contributor or stakeholder, or by promoting our own tools. We helped select the participants sponsored to travel to the Zürich Hackathon 2014 in May.

Architecture and Requests for comment process

We held four RfC review meetings on IRC:

Analytics

Kraken

We reached a milestone in our ability to deploy Java applications at the Foundation this month when we stood up an Archiva build artifact repository. This enables us to consistently deploy Java libraries and applications and will be used in Hadoop and Search initially.

The first Analytics use case for this system will be Camus, LinkedIn’s open source application for loading Kafka data into Hadoop. Once this is productized, we’ll have the ability to regularly load log data from our servers into Hadoop for processing and analysis.

Wikimetrics

We did some significant architectural work on Wikimetrics this month to prepare it for its role as our recurrent report scheduling and generation system. The first use case for this system will be the Editor Engagement Vital Signs project, which will provide daily updates on key metrics around participation.

Kafka

We continue to investigate network issues between our data centers that are causing occasional delivery issues. As noted above, we are currently deploying Camus, our software for transferring data between Kafka and Hadoop.

Data Quality

We fixed a number of issues around data quality in Wikistats, Wikipedia Zero and Wikimetrics.

Research and Data

Video of the March session of the Research and Data monthly showcase.

This month we concluded the first stage of work on metrics standardization. We created an overview of the project with a timeline and a list of milestones and deliverables. We also gave an update on metrics standardization during the March session of the Research and Data monthly showcase. The showcase also hosted a presentation by Aaron Halfaker on his research on the impact of quality control mechanisms on the growth of Wikipedia.

We published an extensive report from a session we hosted at CSCW ’14 on Wikipedia research, discussing with academic researchers and students how to work with researchers at the Foundation.

We submitted 8 session proposals for Wikimania ’14, authored or co-authored by members of the research team.

We attended the Analytics team’s Q3 quarterly review during which we presented the work performed by the team in the past quarter and our goals for the upcoming quarter (April-June 2014).

We completed the handover of Fundraising analytics tools and knowledge transfer in preparation for a new full-time research position that we will be opening shortly to support the Fundraising team.

We continued to provide support to teams in focus areas (Growth and Mobile) with an analysis of the impact of the rollout of the new onboarding workflows across multiple wikis, an analysis of mobile browsing sessions, and ongoing analysis of mobile user acquisition tests. We also supported the Ops team in measuring the impact of the deployment of the ULSFO cluster, which provides caching for West USA and East Asia.

Kiwix

The Kiwix project is funded and executed by Wikimedia CH.

This month, we released a new version of Kiwix for Android that adds support for older versions of Android like Gingerbread; about 50% more devices than before are now supported.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

The team worked on making ranks more useful. From now on, by default the property parser function and Lua always return the values with the “preferred” rank or, when none is available, the ones with the “normal” rank. This makes it possible, for example, to exclude past mayors when asking Wikidata for the mayor of a city. Additionally, considerable speed improvements have been made; browsing Wikidata is now a lot faster. Diffs between versions of pages on Wikidata have also been improved to make it easier to see what changes were made to an item. Last but not least, research for the user interface redesign continued.
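The rank fallback described above can be sketched in a few lines. This is an illustration of the selection rule only, not Wikibase code: the `best_values` function and the dictionary shape used for statements are assumptions, and the mayor names are placeholders.

```python
from typing import Dict, List

# Hypothetical statement shape: {"value": ..., "rank": ...}
Statement = Dict[str, str]


def best_values(statements: List[Statement]) -> List[str]:
    """Return the values ranked "preferred"; if there are none, those ranked "normal"."""
    for rank in ("preferred", "normal"):
        values = [s["value"] for s in statements if s["rank"] == rank]
        if values:
            return values
    # Nothing with a usable rank (for instance, only "deprecated" statements).
    return []


mayors = [
    {"value": "Past Mayor A", "rank": "normal"},
    {"value": "Current Mayor", "rank": "preferred"},
    {"value": "Past Mayor B", "rank": "normal"},
]

print(best_values(mayors))
```

With the current mayor marked as “preferred”, the past mayors are filtered out; if no statement were marked “preferred”, all “normal” values would be returned instead.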

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.
