Wikimedia engineering report, August 2014

Major news in August includes:

Engineering metrics in August:

  • 160 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from around 1640 to about 1695.
  • About 22 shell requests were processed.

Technical Operations

Dallas data center

On August 21, our first connectivity to the new Dallas data center (codfw) came online, connecting the new site to the Wikimedia network. The following week, all network equipment was configured to prepare for server installations. The first essential infrastructure services (install server, DNS, monitoring etc.) were brought online in the days following August 25, and we are now working on deploying the first storage and database servers to start replication and backups from our other data centers.

Labs metrics in August:

  • Number of projects: 170
  • Number of instances: 480
  • Amount of RAM in use (in MBs): 2,116,096
  • Amount of allocated storage (in GBs): 22,600
  • Number of virtual CPUs in use: 1,038
  • Number of users: 3,718

Wikimedia Labs

Andrew fixed a few sudo policy UI bugs (68834, 61129). Marc improved the DNS cache settings and resolved some long-standing DNS instability (70076). He also set up a new storage server for wiki dumps. This should resolve some long-term storage space problems that led to out-of-date dumps.
Andrew laid the groundwork for wikitech to be updated via the standard WMF deployment system. We’re investigating the upstream OpenStack user interface, ‘horizon’.

Features Engineering

Editor retention: Editing tools

VisualEditor

In August, the team working on VisualEditor gave a presentation on VisualEditor at Wikimania 2014, worked with a number of volunteers at the hackathon, adjusted key workflows for template and citation editing, made major progress on Internet Explorer support, and fixed over 40 bugs and tickets.

Users of Internet Explorer 11, whom we were previously preventing from using VisualEditor due to some major bugs, will now be able to use VisualEditor. Support for earlier versions of Internet Explorer will be coming shortly. Similarly, tablet users browsing the site’s mobile mode now have the option of using a mobile-specific form of VisualEditor. More editing tools, and availability of VisualEditor on phones, are planned for the future.

Improvements and updates were made to a number of interface messages as part of our work with translators to improve the software for all users, and VisualEditor and MediaWiki were improved to support highlighting links to disambiguation pages where a wiki or user wishes to do so. Several performance improvements were made, especially to the system around re-using references and reference lists. We tweaked the link editor’s behaviour based on feedback from users and user testing. The deployed version of the code was updated three times in the regular release cycle (1.24-wmf17, 1.24-wmf18 and 1.24-wmf19).

Editing

In August, the Editing Team presented at Wikimania 2014 on better ways to develop and manage front-end software, improved the infrastructure of the key user interface libraries, and continued the planned adjustments to the MediaWiki skins system.

The TemplateData GUI editor was significantly improved, including being updated to use the new types, and recursive importing of parameters if needed, and deployed on Norwegian Bokmål Wikipedia. The volunteers working on the Math extension (for formulæ) moved closer to deploying the “Mathoid” server that will use MathJax to render clearer formulæ than with the current versions.

The Editing team as usual did a lot of work on improving libraries and infrastructure. The OOjs UI library was modified to make the isolation of dialogs using <iframe>s optional, and re-organise the theme system as part of implementing a new look-and-feel for OOUI, to make it consistent with the planned changes to the MediaWiki design, in collaboration with the Design team. The OOjs library was updated to fix a minor bug, with two new versions (v1.0.12 and then v1.1.0) released and pushed downstream into MediaWiki, VisualEditor and OOjs UI.

Parsoid

In August, we wrapped up our face-to-face off-site meetup in Mallorca and attended Wikimania in London, which was the first Wikimania event for us all. At the Wikimania hackathon, we co-presented (with the Services team) a workshop session about Parsoid and how to use it. We also had a talk at Wikimania about Parsoid.

The GSoC 2014 LintTrap project wrapped up and we hope to develop this further over the coming months, and go live with it later this year.

With an eye towards supporting Parsoid-driven page views, the Parsoid team worked on a few different tracks. We deployed the visual diff mass testing service, we added Tidy support to parser tests and updated tests, which now makes it easy for Parsoid to target the PHP Parser + Tidy combo found in production, and continued to make CSS and other fixes.

Services

Services and REST API

August was mostly a month of travel and vacation for the service team. We deployed a first prototype of the RESTBase storage and API service in Labs. We also presented on both Parsoid and RESTBase at Wikimania, which was well received. Later in August, computer science student Hardik Juneja joined the team as a part-time contractor. Working from Mumbai, he dived straight into complex secondary index update algorithms in the Cassandra back-end. At the end of the month, design work resumed, with the goal of making RESTBase easier to extend with additional entry points and bucket types.

Core Features

Flow

In August, the Flow team created a new read/unread state for Flow notifications, to help users keep track of the active discussion topics that they’re subscribed to. There are now two tabs in the Echo notification dropdown, split between Messages (Flow notifications) and Alerts (all of the other Echo notifications). Flow notifications stay unread until the user clicks on the item and visits the topic page, or marks the item as read in the notifications panel. The dropdown is also scrollable now, and holds the 25 most recent notifications. Last, subscribing to a Flow board gives the user a notification when a new topic is created on the board.
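
The notification behaviour described above can be modelled in a few lines. This is a minimal sketch, not Flow or Echo code: the class, field and function names are hypothetical, and only the two tab names, the read/unread rule and the 25-item cap come from the description above. It also assumes the cap applies to the dropdown as a whole rather than to each tab.

```python
from dataclasses import dataclass

# The dropdown holds the 25 most recent notifications (assumed: across both tabs).
DROPDOWN_LIMIT = 25


@dataclass
class Notification:
    """Hypothetical notification record; field names are illustrative only."""
    id: int
    timestamp: int
    is_flow: bool       # True for Flow notifications, False for other Echo ones
    read: bool = False  # notifications start out unread


def dropdown_tabs(notifications):
    """Split the most recent notifications into the two dropdown tabs."""
    recent = sorted(notifications, key=lambda n: n.timestamp, reverse=True)
    recent = recent[:DROPDOWN_LIMIT]
    return {
        "Messages": [n for n in recent if n.is_flow],
        "Alerts": [n for n in recent if not n.is_flow],
    }


def mark_read(notification):
    """Called when the user visits the topic page or marks the item as read."""
    notification.read = True
```

Keeping the read flag on the notification itself, rather than on the tab, means an item stays unread no matter how often the dropdown is opened, which matches the behaviour described for Flow notifications.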

Growth

Growth

In August, the Growth team vetted CirrusSearch as back-end for personalized suggestions and prepared its first A/B test of the new task recommendations system. This test will deliver recommendations to a random sample of newly-registered users on 12 Wikipedias: English, French, German, Spanish, Italian, Hebrew, Persian, Russian, Ukrainian, Swedish, and Chinese. Several Growth team members also attended Wikimania 2014 in London. At Wikimania, the team shared presentations on its work and conducted usability tests of the recommendations system. Last but not least, design work began on the third major iteration of the team’s anonymous editor acquisition project.

Mobile

Wikimedia Apps

In August, the Mobile Apps Team focussed on bug fixes for the recently released iOS app and for the Android app, as well as gathering user feedback from Wikimania. The team also had unstructured time during Wikimania, in which the engineers were free to work on whatever they fancied. This resulted in numerous code quality improvements on both iOS and Android. On iOS, the unstructured time also spawned a preliminary version of the feature “Nearby”, which lists articles about things that are near you, tells you how near they are to you, and points towards them. On Android, the unstructured time spawned a preliminary version of full text search, an improved searching experience which aims to present more relevant results.

Mobile web projects

This month the mobile web team, in partnership with the Editing team, launched a mobile-friendly opt-in VisualEditor for users of the mobile site on tablets. Tablet users can now choose to switch from the default editing experience (wikitext editor) to a lightweight version of VE featuring some common formatting tools (bold and italic text, the ability to add/edit links and references). We also began building a Wikidata contribution game in alpha that will allow users to add metadata to the Wikidata database (to start, occupations of people) directly from the Wikipedia article where the information is contained. We hope to graduate this feature to the beta site next month to get more quantitative feedback on its usage and the quality of contributions.

Wikipedia Zero & Partnerships

Wikipedia Zero page views held steady at around 70 million in August. We launched Wikipedia Zero with three operators: Smart and Sun in the Philippines (related companies) and Timor Telecom in East Timor. That brings our total numbers to 37 partners in 31 countries. Smart has been collaborating with Wikimedia Philippines for months, and they previously offered free access to Wikipedia on a trial basis. Just announced, Smart has now officially joined Wikipedia Zero and brought in their sister brand Sun, covering a combined 70 million subscribers in the Philippines. Timor Telecom launched Wikipedia Zero with a press event including the Vice Minister of Education and much promotion. Timor Telecom is keen to support growth in the Tetun Wikipedia by raising awareness in universities, with resources from the Wikipedia Education Program. In Latin America, we made progress toward app preloads by completing testing for the Qualcomm Reference Design (QRD) program. The Wikipedia Android app is now certified for preload on QRD. We made terrific connections with Global South community members at Wikimania, which will lead to more direct local collaboration between partners and Wikimedia communities. Smriti Gupta, partnerships manager for Asia, moved to India where she will work remotely. We’re recruiting our third partnerships manager to cover South East Asia and tech partnerships.

Language Engineering

Language tools

Niklas Laxström (outside his WMF job) completed most of the work needed in Translate to recover gracefully from session expiration, a known pain point for translators. The PageMigration feature (a GSoC project mentored by Niklas) was released. The team also worked on session expiry checking (to prevent errors in long translations), updated YAML handling, and deployed auto-translated screenshots for the VisualEditor user guide (a GSoC project mentored by Amir and done by Vikas Yaligar). They did internationalization testing of the new Android and iOS apps, as well as internationalization testing and bug fixes in VisualEditor, MobileFrontend and Flow.

Milkshake

Webfonts were enabled on the English Wikisource and Divehi wikis, following requests from the respective communities.

Language Engineering Communications and Outreach

The team was at Wikimania in London. Santhosh Thottingal and Amir Aharoni presented on Machine-aided machine translation, and Runa Bhattacharjee and Kartik Mistry on Testing multilingual applications. They conducted user testing for ContentTranslation in several languages (Catalan, Spanish, Kazakh, Russian, Bengali, Hebrew, Arabic), continued conversations with translators from Wikipedias in several languages, and published a retrospective on ContentTranslation and Wikimania.

Content translation

The machine translation abuse algorithm was redone. The team also worked on reference adaptation improvements, refactoring the front-end event architecture and rewriting the cxserver registry to support multiple machine translation engines.

Platform Engineering

MediaWiki Core

HHVM

We migrated test.wikipedia.org to HHVM in early August and saw very few issues. Giuseppe shared some promising benchmarks. Re-imaging an app server was surprisingly painful, in that Giuseppe and Ori had to perform a number of manual actions to get the server up-and-running, and this sequence of steps was poorly automated. Doing this much manual work per app server isn’t viable.

Mark submitted a series of patches to create a service IP and Varnish back-end for an HHVM app server pool, with Giuseppe and Brandon providing feedback and support. The patch routes requests tagged with a specific cookie to the HHVM back-ends. Tech-savvy editors were invited to opt-in to help with testing by setting the cookie explicitly. The next step after that will be to divert a fraction of general site traffic to those back-ends. The exact date will depend on how many bugs the next round of testing uncovers.
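
The two rollout stages described above (explicit opt-in by cookie, then a fraction of general traffic) can be illustrated with a short sketch. This is Python for readability, not the actual Varnish configuration; the cookie name, the function name and the hash-based sampling are all assumptions, since the report does not describe how the traffic fraction will be selected.

```python
import hashlib

# Hypothetical cookie name; the report does not say which cookie is used.
HHVM_COOKIE = "hhvm_opt_in"


def choose_backend(cookies, client_id, hhvm_fraction=0.0):
    """Return "hhvm" or "php" for one request.

    cookies: dict of cookie name -> value sent with the request.
    client_id: any stable per-client string (an assumption, used for sampling).
    hhvm_fraction: share of general traffic (0.0 to 1.0) to divert to HHVM.
    """
    # Stage 1: testers who set the cookie explicitly always get HHVM.
    if HHVM_COOKIE in cookies:
        return "hhvm"
    # Stage 2: divert a stable fraction of everyone else by hashing the
    # client id into one of 10,000 buckets.
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10000
    if bucket < hhvm_fraction * 10000:
        return "hhvm"
    return "php"
```

Hashing a stable identifier, rather than drawing a random number per request, keeps each client on the same back-end pool between requests, which makes bug reports easier to attribute.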

Tim is looking at modifying the profiling feature of LuaSandbox to work with HHVM; it is currently disabled.

Admin tools development

Most admin tools resources are currently directed towards SUL finalisation. There was a roundtable at Wikimania with developers and admins/tool users discussing some issues they’ve had, and feature requests they would like to see implemented. The GlobalCssJs extension was deployed to all public Wikimedia wikis, allowing for proper user global CSS and JS.

Search

We started deploying Cirrus as the primary search back-end to more of the remaining wikis and we found what looks like our biggest open performance bottleneck. Next month’s goal is to fix it and deploy to more wikis (probably not all). We’re also working on getting more hardware.

SUL finalisation

The SUL finalisation team continues to work on building tools to support the finalisation. There are four ongoing streams of work, and the team is on track to have the majority of the work completed by the end of September.

The ability to globally rename users was deployed a while ago, and is currently working excellently!

The ability to log in with old, pre-finalisation credentials has been developed so that users are not inadvertently locked out of their accounts. From an engineering standpoint, this form is now fully working in our test environment. Right now, the form uses placeholder text; that text needs to be ‘prettified’ so that the users who have been forcibly renamed get the appropriate information on how to proceed after their rename, and more rigorous testing should be done before deployment.

A form to globally merge users has been developed so that users can consolidate their accounts after the finalisation. From an engineering standpoint, this form is now fully working in our test environment. The form needs design improvements and further testing before it can be deployed.

A form to request a rename has been developed so that users who do not have global accounts can request a rename, and also so that the workload on the renamers is reduced. From an engineering standpoint, the form to request a rename has been implemented, and implementation has begun on the form that allows renamers to rename users. Once the end-to-end experience has been fully implemented and tested, the form will be ‘prettified’.

Security auditing and response

We carried out security reviews of the Graph, WikibaseQuery and WikibaseQueryEngine extensions. Initial work was done to enable regular dynamic security scanning.

Release Engineering

Quality Assurance

Having completed the migration of our Continuous Integration infrastructure from a third party host to Wikimedia’s own Jenkins instance, we are thinking about improvements and changes for future work. We aim to improve performance for Jenkins and also for beta labs. We are looking into creating other shared test environments along with beta labs to better support changes like we did this month with HHVM and with a security and performance test project. We also continue to improve the development experience with Vagrant and other virtual machine technologies.

Browser testing

This month, we continued to build out and adjust the new browser test builds on Jenkins. We saw updates to tests and issues identified for UploadWizard, VisualEditor, Echo, and MobileFrontend. New tests for GettingStarted pointed out a need to update our Redis storage on the beta cluster. We are currently monitoring an upstream problem with Selenium/Webdriver and IE11 on behalf of VisualEditor, as VE support for IE11 is coming soon.

Multimedia

Multimedia

Media Viewer’s new ‘minimal design’.

In August, the multimedia team had extensive discussions with community members about the various projects we are working on. We started with seven different roundtable discussions and presentations at Wikimania 2014 in London, including sessions on: Upload Wizard, Structured Data, Media Viewer, Multimedia, Community and Kindness. To address issues raised in recent Requests for Comments, we also hosted a one-week Media Viewer Consultation, inviting suggestions from community members across our sites.

The team also worked to make Media Viewer easier to use by readers and casual editors, our primary target users for this tool. To that end, we created a new ‘minimal design’ including a number of improvements such as a more prominent button linking to the File: page, an easier way to enlarge images and more informative captions. These new features were prototyped and carefully tested this month to validate their effectiveness. Testers easily completed most of the tasks we gave them, suggesting that the new features are now usable by target users, and ready for development in September.

This month, we prepared a first plan for the Structured Data project, in collaboration with many community members and the Wikidata team: we propose to gradually implement machine-readable data on Wikimedia Commons, starting with small experiments in the fall, followed by a wider deployment in 2015. We also continued our code refactoring for the UploadWizard, as well as fixed more bugs across our multimedia platform. To keep up with our work, join the multimedia mailing list.

Engineering Community Team

Bug management

Daniel made Bugzilla use ssl_ciphersuite to add HSTS and removed a superfluous STS header setting. Andre worked around a Bugzilla XML RPC API issue which created problems for exporting Bugzilla data for a Phabricator import. In Bugzilla’s taxonomy (components, descriptions, default CCs, etc.) some smaller changes took place.

Phabricator migration

The project is getting close to Day 1 of a Wikimedia Phabricator production instance. For better overview and tracking, the Wikimedia Phabricator Day 1 project was split into three projects: Day 1 of a Phabricator Production instance in use, Bugzilla migration, and RT migration. Furthermore, the overall schedule was clarified. In the last month, Security/permission related requirements got implemented (granular file permissions and upload defaults, enforcing that policy, making file data inaccessible and not only undiscoverable). In upstream, Mukunda added API to create projects and Chase added support for mailing lists as watching users. Chase worked on and tested the security and data migration logic. Mukunda continued to work on getting the MediaWiki OAuth provider merged into upstream. Chase and Mukunda also worked on the Project Policy Enforcer action for Herald, providing a user-friendly dropdown menu to restrict ticket access when creating the ticket. A separate domain for user content was purchased. Chase also worked on the scripts to export and import data between the systems and support for external users in Phabricator and the related mail setup. Chase and Chad also took a look at setting up Elasticsearch for Phabricator.

Mentorship programs

All Google Summer of Code and FOSS Outreach Program for Women projects were evaluated by their mentors as PASSED, although many were still waiting for completion, code reviews and merges. We hosted a wrap-up IRC meeting with the participation of all teams except one. We are still waiting for some final reports from the interns. In the meantime, you can check their weekly reports.

Technical communications

In August, Guillaume Paumier attended the Wikimania conference and the associated hackathon. He gave a talk about Tech News (video available on YouTube) and created a poster summarizing the talk. He also continued to write and distribute Tech News every week, and started to contribute to the Structured data project.

Volunteer coordination and outreach

We ran the Wikimania Hackathon in an unconference manner together with the Wikimania organizers. The event went well in a unique venue, and we are compiling a list of lessons learned to be applied in future events. Together with other former organizers of hackathons, we decided that the next Wikimedia Hackathon in Europe will be organized by Wikimedia France (details coming soon). Also at Wikimania, Quim Gil gave a talk about The Wikimedia Open Source Project and You (video, slides).

Analytics

Wikimetrics

Following the prototype built for Wikimania, the team identified many performance issues in Wikimetrics for backfilling Editor Engagement Vital Signs (EEVS) data. The team spent a sprint implementing some performance enhancements as well as properly managing sessions with the databases. Wikimetrics is now better at running recurring reports concurrently and managing replication lag in the slave DBs.

Data Processing

The team continued monitoring analytics systems and responding to issues when (non-critical) alarms went off. Packet losses and Kafka issues were diagnosed and handled.

Hadoop worker nodes now automatically set memory limits according to what is available. Previously all workers had the same fixed limit. This allows for better resource utilization.
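
To illustrate the difference between a fixed limit and one derived from each node's hardware, here is a minimal sketch. The report does not describe the actual rule, so the function name, the reserved headroom and the floor value are all assumptions chosen only for illustration.

```python
def worker_memory_limit_mb(total_ram_mb, reserved_mb=4096, minimum_mb=1024):
    """Memory (in MB) a worker node may hand out to Hadoop jobs.

    reserved_mb (headroom for the OS and daemons) and minimum_mb are
    illustrative assumptions, not values from the Wikimedia configuration.
    """
    return max(total_ram_mb - reserved_mb, minimum_mb)


# Hypothetical nodes with different amounts of RAM (in MB).
workers = {"small-node": 16384, "large-node": 65536}
limits = {name: worker_memory_limit_mb(ram) for name, ram in workers.items()}
```

With a single fixed limit, the larger node in this example would be capped at whatever the smallest node could safely offer; deriving the limit per node lets it contribute its extra memory.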

Logstash is now available at https://logstash.wikimedia.org (Wikitech account required). Logs from Hadoop are piped there for easier search and diagnosis of Hadoop jobs.

Some uses of udp2log were migrated to kafkatee. The latter is not prone to packet losses. In particular Webstatscollector was switched over and error rates were seen to drop drastically. Eventually, the “collecting” part of Webstatscollector will be implemented in Hadoop, a much more scalable environment to handle such work.

Editor Engagement Vital Signs

The team implemented the stack necessary to load EEVS in a browser and has a rough implementation of the UI according to Pau’s design. The team also made available to EEVS two metrics already implemented on Wikimetrics: number of pages created, and number of edits.

Research and Data

This month we hosted the WikiResearch hackathon, a dedicated research track of the Wikimania hackathon. 3 demos of research code libraries were broadcast during the event and several research ideas filed on Meta. Highlights from the hackathon include: Quarry (a web client to query Wikimedia’s slave databases on Labs); wpstubs (a social media bot broadcasting newly categorized stubs on the English Wikipedia); an algorithmic classification of articles due to be re-assessed from the English Wikipedia WikiProject Medicine’s stubs.

We gave or participated in 8 presentations during the main conference.

We published a report on mobile trends expanding the data presented at the July 2014 Monthly Metrics meeting. We started work on referral parsing from request log data to study trends in referred traffic over time.

We generated sample data of edit conflicts and worked on scripts for robust revert detection. We published traffic data for the Medicine Translation Taskforce, with a particular focus on traffic to articles related to Ebola.
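
The report does not describe the revert detection scripts themselves. As an illustration only, a common approach is checksum-based "identity revert" detection: a revision counts as a revert if it restores the page to exactly the state of an earlier revision. The sketch below shows that technique under stated assumptions; the function names and the search radius are hypothetical and not taken from the team's code.

```python
import hashlib


def checksum(text):
    """SHA-1 of a revision's text, used to compare page states cheaply."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def detect_reverts(revisions, radius=15):
    """Return (reverting_index, reverted_to_index) pairs.

    revisions: list of page texts in chronological order.
    radius: how many revisions back to search for an identical earlier
            state (an illustrative default).
    """
    found = []
    sums = [checksum(t) for t in revisions]
    for i, current in enumerate(sums):
        # Start two revisions back: a revision identical to its immediate
        # predecessor is a null edit, not a revert.
        for j in range(i - 2, max(i - radius, 0) - 1, -1):
            if sums[j] == current:
                found.append((i, j))
                break
    return found
```

For example, `detect_reverts(["A", "A vandal", "A"])` reports that the third revision restores the first, so the second revision was reverted.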

We wrote up a research proposal for task recommendations in support of the Growth team’s experiments on recommender systems. We analyzed qualitative data to assess the performance of Cirrus Search “morelike” feature for identifying articles in similar topic areas. We provided support for the experimental design of a first test of task recommendations. We performed an analysis of the result of the second experiment on anonymous editor acquisition run by the Growth team.

We hosted the August 2014 research showcase with a presentation by Oliver Keyes on circadian patterns in mobile readership and a guest talk by Morten Warncke-Wang on quality assessment and task recommendations in Wikipedia.

We also gave presentations on Wikimedia research at the Oxford Internet Institute, INRIA, Wikimedia Deutschland (slides) and at the Public Library of Science (slides). Aaron Halfaker presented at OpenSym 2014 a paper he co-authored on the impact of the Article for Creation workflow on newbies (slides, fulltext).

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

August was a very busy month for Wikidata. The main page was redesigned and is now much more inviting and useful. A lot of new features were finished and deployed. Among them are:

  • Redirects: allowing you to turn an item into a redirect.
  • Monolingual text datatype: allowing you to enter new kinds of data like the motto of a country.
  • Badges: allowing you to store badges for articles on Wikidata. This includes “featured article” and “good article”. More will be added soon.
  • In other projects sidebar as a beta feature: allowing you to show links to sister projects in the sidebar of any article.
  • Special:GoToLinkedPage: allowing you to go to a Wikipedia page based on its Wikidata Q-ID. This will be especially useful if you want to create links to articles that don’t change even if the article is moved.
  • Wikinews: Wikinews has been added as a supported sister project. Wikinews can now maintain their sitelinks on Wikidata. Access to the other data will follow in due time.
  • Wikidata: Sitelinks to pages on Wikidata itself can now also be stored on Wikidata. This is useful to connect for example its help pages with those on the other projects.
  • Change of the internal serialization format: The internal serialization format changed to be consistent with the serialization format that is returned by the API.
In addition, the team worked on a lot of under-the-hood changes towards the new user interface design and started the discussions around structured data support for Commons. The log of the IRC office hour is available.
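
As an illustration of the Special:GoToLinkedPage feature listed above, the sketch below builds a stable link from a site ID and a Wikidata item ID. It assumes the special page takes both as path segments (site first, then item); the helper name is hypothetical and nothing here comes from the Wikibase code base.

```python
import re
from urllib.parse import quote

# Assumed URL layout: Special:GoToLinkedPage/<site id>/<item id>
BASE_URL = "https://www.wikidata.org/wiki/Special:GoToLinkedPage"


def linked_page_url(site_id, item_id):
    """Build a link that follows an item's sitelink, e.g. ("enwiki", "Q42")."""
    if not re.fullmatch(r"Q[1-9][0-9]*", item_id):
        raise ValueError("item_id must look like Q42")
    return "%s/%s/%s" % (BASE_URL, quote(site_id, safe=""), item_id)
```

Because the link is keyed on the item ID rather than the article title, it keeps working when the article is moved, as long as the sitelink on Wikidata is updated.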

Future

The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.

This article was written collaboratively by Wikimedia engineers and managers. See revision history and associated status pages. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
