Wikimedia engineering August 2011 report

Major news in August includes:

  • Technical discussions at Wikimania and Developer Days;
  • Progress on HTTPS, and generally better processes in Operations;
  • The kick-off of the Internationalization and localization tools project;
  • New features in UploadWizard, including major work on customized campaigns for the Wiki Loves Monuments event;
  • MediaWiki 1.18 and the new Mobile platform approaching deployment readiness.

Hover your mouse over the green question marks ([?]) to see the description of a particular project.

Events

Recent events

  • Wikimania (August 2-7, Haifa, Israel) — A delegation of about a dozen Wikimedia engineers attended the Wikimania conference in early August, as well as the Developer Days and OpenZIM Developers Meeting. A special focus of the dev days was the recruitment and mentoring of new MediaWiki developers. Extensive notes were taken, with the hope of turning them into perennial on-wiki documentation.
  • Google Tech Talk (August 25, Mountain View, California) — Erik Möller, Rob Lanphier and Alolita Sharma presented a tech talk at Google to give an all-round update on Wikimedia’s engineering projects and to refresh the understanding of Googlers and other interested parties. The presentation slides are available, and the video will be made available through the GoogleTechTalks YouTube channel.

Upcoming events

  • New Orleans hackathon (14-16 October, New Orleans) — Ryan Lane and Sumana Harihareswara are organizing a coding event on the theme “The infrastructure of innovation”. The hackathon’s goal is to advance Wikimedia’s tools and infrastructure; a major focus will be Wikimedia Labs, starting with the dev-ops virtualization cluster. Other areas of work include gadgets/extensions/tools support, authorization/authentication strategy, and general training and hacking.
  • Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites.

Personnel

Job openings

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
The following positions opened in August:

New and open Requests for Proposals:

The following positions are still open: Product Manager (Analytics), QA Lead, Operations Engineer (Networking), Director of Features Engineering, Systems Engineer (Data Analytics), Software Developer (Rich Text Editing, Features), Software Developer (Front-end), Software Developer (Back-end), Product Manager (Mobile) and Software Developer (Mobile).

New hires

Operations

Training and Process Improvement

  • Operations staff meeting — The Operations team got together the week of the 22nd. The goals of the meeting were to improve and share site recovery knowledge (documentation and training); share knowledge of new project designs; review & prioritize operations projects; document and communicate our EQIAD data center buildup milestones; and develop the RT management process.

Site infrastructure

  • Tampa Data Center [?] — Mark Bergsma put the second router into production, which means we now have router redundancy in our Tampa network infrastructure. Follow-up work is underway to fully implement automatic (hot) router failover, which should be completed by mid-September. Mark also standardized our LVS implementation and puppetized the configuration. Because our data center contractor in Tampa left us, the installation of new application servers was delayed. Other highlights in August include a software upgrade to our Squid servers and the resolution of upload performance issues.
  • Virginia Data Center [?] — Asher Feldman deployed the new Mobile Varnish servers in our Eqiad data center. All six LVS servers are ready and two of them are in production now, load-balancing the mobile Varnish servers. Ben Hartshorne and Asher also created two new database servers for the Summer of Research interns, in addition to the original one created last month. Last, the team updated the backup procedures documentation to reflect Eqiad being our (current) key backup and recovery store.
  • HTTPS — Roan Kattouw and Ryan Lane have been fixing issues that surfaced during the internal testing period. New servers were ordered for AMS and TPA to handle SSL termination processing. In the meantime, Ryan has been setting up SSL servers in eqiad, enhancing Varnish to deal with X-Forwarded-For and X-Forwarded-Proto HTTP headers, and making necessary changes to Squid. On August 31st, HTTPS was enabled on wikimediafoundation.org and Wikimedia Commons.
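
To illustrate why the X-Forwarded-For and X-Forwarded-Proto headers mentioned in the HTTPS item matter: once SSL is terminated in front of the caches, the servers behind the terminator only see a plain HTTP connection coming from it, so the original client address and scheme have to travel in those headers. The following is a minimal Python sketch of how a backend might read them. It is not Wikimedia's Varnish or Squid configuration; the function name and parameters are hypothetical, and it assumes the headers were set by a trusted proxy.

```python
from typing import Mapping, Tuple


def original_request_info(headers: Mapping[str, str], peer_ip: str) -> Tuple[str, str]:
    """Return (client_ip, scheme) as they were before the SSL terminator.

    Hypothetical helper for illustration only. `peer_ip` is the address of
    the directly connected peer (typically the proxy itself).
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # X-Forwarded-For is a comma-separated list of hops; by convention the
    # leftmost entry is the original client.
    hops = [hop.strip() for hop in lowered.get("x-forwarded-for", "").split(",")]
    hops = [hop for hop in hops if hop]
    client_ip = hops[0] if hops else peer_ip

    # X-Forwarded-Proto carries the scheme the client used to reach the proxy.
    # Anything missing or unexpected is treated as plain HTTP (an assumption
    # made for this sketch).
    proto = lowered.get("x-forwarded-proto", "http").strip().lower()
    scheme = proto if proto in ("http", "https") else "http"

    return client_ip, scheme


if __name__ == "__main__":
    example = {
        "X-Forwarded-For": "203.0.113.7, 10.0.0.2",
        "X-Forwarded-Proto": "https",
    }
    print(original_request_info(example, "10.0.0.2"))
```

These headers can be forged by any client, so a real deployment would only honor them on requests arriving from its own proxies.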

Testing environment

  • Virtualization test cluster [?] — Work has resumed on this project. The puppet configuration used in the production sites has been split into public and private repositories, and all sensitive information has been moved to the private repository. Gerrit has been configured, and the public puppet configuration will soon be moved into a public repository there. Labs LDAP and SVN LDAP are currently being merged, so that SVN users will more easily have access to Labs.

Backups and data archives

  • Data Dumps [?] — August runs for English Wikipedia started on the 3rd of the month and completed on the 13th, after a couple of restarts of the history phase of the dumps. We also cleared out the backlog of earlier incomplete dumps for May, bringing us up to date. In the meantime, we did yet more tweaking of the filesystem to try to reduce file corruption and truncation issues. Finally, code to “checkpoint” the history dumps by writing a sequence of smaller files was announced and tested, and it will be used for the next production run in September.
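
The general idea of checkpointing a long export by writing a sequence of smaller files can be sketched as follows. This is a generic Python illustration, not the actual dump code: the function name, the file naming scheme and the per-file record limit are all assumptions made for the example.

```python
import os
import tempfile
from typing import Iterable, List


def write_checkpointed(records: Iterable[str], out_dir: str, prefix: str,
                       max_per_file: int) -> List[str]:
    """Write records to numbered files holding at most `max_per_file` each.

    Hypothetical helper for illustration only. Returns the paths written.
    """
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")

    paths: List[str] = []
    handle = None
    count = 0
    try:
        for record in records:
            # Start a new file for the first record and whenever the
            # current file is full; each closed file is a checkpoint.
            if handle is None or count >= max_per_file:
                if handle is not None:
                    handle.close()
                path = os.path.join(out_dir, f"{prefix}-part{len(paths) + 1:04d}.txt")
                handle = open(path, "w", encoding="utf-8")
                paths.append(path)
                count = 0
            handle.write(record + "\n")
            count += 1
    finally:
        if handle is not None:
            handle.close()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        written = write_checkpointed((f"rev{i}" for i in range(5)), out_dir, "history", 2)
        print([os.path.basename(p) for p in written])
```

The benefit of this layout is that an interrupted run loses only the file in progress: the files already closed remain usable, and a restart can resume after the last complete one rather than from the beginning.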

Features Engineering

Editing tools

Participation and editor retention

  • Article feedback [?] — Development was completed in July; August was mostly devoted to data analysis. Dario Taraborelli analyzed the volume of edits, and couldn’t yet find any statistically significant difference in edits before and after the activation of the feature on English Wikipedia articles. In order to clarify licensing and privacy policies regarding the data (and to facilitate its reuse by external tools & researchers), an explanation was published, stating that user feedback data (from Article feedback and MoodBar, for example) were considered public contributions just like any edit.
  • WikiLove [?] — Development was completed in July, with minor fixes in August. Dario Taraborelli analyzed data from the usage of the tool on the English Wikipedia, which showed that WikiLove messages were disproportionately sent by new editors. Data specifications were published in preparation for the release of data dumps; Summer of Research fellows also worked on an algorithm to automatically categorize WikiLove comments.
  • Feedback Dashboard [?] — Brandon Harris started to design a dashboard to surface and sort data from MoodBar comments. It could become a help center where experienced users can easily answer questions and concerns from new users.
  • GlobalProfile [?] — Brandon Harris presented this project at Wikimania, where it was well received and echoed other talks encouraging better tools for social interactions between users.
  • QuickComments [?] — Brandon Harris started to design this feature proposal after it appeared that many new users were using the WikiLove tool to send messages to other users, because they couldn’t find any other way. Initial designs add a new icon at the top of user pages, which opens a modal overlay to leave a new message.

Multimedia Tools

  • UploadWizard [?] — Ian Baker worked on the TitleBlacklist API, as well as bug fixes for UploadStash. Together with Neil Kandalgaonkar, he investigated video thumbnail issues. Ian also released Neil’s message string library (which leverages wikitext, jQuery, and internationalization tools) after packaging it into a MediaWiki extension. Contractor Jan Gerber added XHR FormData support and chunked uploads to UploadWizard. Jeroen De Dauw’s code to support customized campaigns was deployed to the Commons prototype wiki, then to production on Commons; Jeroen also added new features based on the feedback from Wiki Loves Monuments organizers. Neil reviewed Jeroen’s and Jan’s code, and generally prepared the code for deployment.

MediaWiki infrastructure

  • ResourceLoader [?] — Roan Kattouw and Timo Tijhof got together in late August to do back-end work on Global gadgets. They improved the format for defining gadgets, which will eventually be done via a user interface. Gadget internationalization is now also fully supported, with each message stored in its own MediaWiki: page instead of in a large blob in the gadget source.

Wikimedia Labs

  • TimedMediaHandler [?] — Michael Dale completed the fixes suggested in code review, and continued to prepare the extension for deployment. Jan Gerber fixed an ffmpeg seek issue and cleaned up transcode key names.

Mobile

  • Mobile Research [?] — Mani Pande and Parul Vora continued to synthesize the findings from field research in India and Brazil; a research page was also created on meta. They launched user experience research in the US with AnswerLab, and have started recruiting readers and editors for ethnographic research to be conducted in San Francisco, Dallas and Chicago. The mobile survey was prepared in LimeSurvey, and translations are ongoing.

Special projects

Fundraising support

Offline

  • Wikipedia version tools [?] — Yuvaraj Pandian successfully completed his Google Summer of Code project. He achieved feature parity with User:CBM’s WP 1.0 bot and added the ability to save selections of articles, manually modify/delete the contents of a selection, and export a CSV of article selections. Yuvaraj reached out to the community to review his code, and plans to continue work on the extension, which still requires extensive testing and bug fixing.

Platform Engineering

An overview of the Platform engineering team was published on the Wikimedia blog in August.

MediaWiki Core

Wikimedia analytics

  • Wikimedia Report Card 2.0 [?] — Nimish Gautam and Sam Reed worked on allowing content from CSV files and from Google Spreadsheets into the dashboard. Nimish also mined data to identify editors by geography, and worked on a page views tab, using the WURFL library to estimate mobile page views and device capabilities.

Technical Liaison; Developer Relations

  • Bug management [?] — Mark Hershberger held bug triage sessions on Mobile & PDF export/Collections. The bug triage page now lists past and upcoming triages, as well as notes and summaries when available.
  • Summer of Code 2011 [?] — In August, the GSoC students finished their projects, and both students and mentors turned in their final evaluations; all seven remaining students passed. The students started to write to the wikitech-l mailing list to summarize what they finished and what still needs to be done. For example, Salvatore Ingala wrote an integration howto to guide other MediaWiki developers in merging his code into trunk. Students are expected to upload representative tarballs of their code into the Google Code portfolio repository.
  • Engineering project documentation [?] — Guillaume Paumier continued to update project documentation pages and to write engineering reports.
  • Volunteer coordination and outreach [?] — Sumana Harihareswara has been following up on contacts made at the OSCON and Wikimania conferences. She has publicized the NOLA Hackathon and encouraged extension, gadget, script, tool, and template developers to attend. Additionally, she has been publicizing the work of the parser and visual editor team, encouraging code reviewers, and finding administrators and developers of other intensive MediaWiki installations to bring them into the larger MediaWiki ecology. In August, 9 developers were granted commit access: 6 volunteers and 3 Wikimedia Foundation employees.
  • MediaWiki architecture document [?] — Greg Wilson, editor of the Architecture of Open Source Applications book, contacted the engineering department of the Wikimedia Foundation to offer to include a chapter on MediaWiki in volume 2 of the book, which presents the architecture of large-scale open-source projects and the decisions that led to it. Since such a document would also help new developers dive into MediaWiki development, Guillaume Paumier and Sumana Harihareswara accepted the responsibility of leading the collaborative writing of the document by the MediaWiki community.

This article was written collaboratively by Wikimedia engineers and managers. See full revision history. A wiki version is also available.

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

5 Comments

Thanks for this monthly report. I really like to know what you tech guys are up to.
What bothers me is that I don’t know what are your priorities by reading this report. I assumed that a Visual Editor for our projects was top priority, but there are only five lines on that subject in the whole report! Is this project “on tracks”? Will the WYSIWYG system be available next year? or when?
Thxs in advance for your answer.
(sorry if I made some errors, I am not fluent in English)
–[[User:Edhral]]

Hi Edhral, The goal of the monthly report is to provide a brief status update on every engineering activity sponsored by the Wikimedia Foundation. Priorities have been outlined in the Strategic product whitepaper, where you can see that the Visual editor is indeed one of the main priorities (called “Great Movement Projects”). More details on a specific project are generally available on the project’s dedicated page, linked to from the report. If someone is looking for a piece of information that isn’t there (e.g. a timeline, and indication of progress), they should always feel free to e-mail the team (listed…

Great report, as always 🙂 I really enjoy reading these, but was surprised this month to see Liquid Threads does not appear at all. In previous months it at least says “on hold in favour of developing Mood Bar” but now it appears to have been dropped completely. Is this the case?

What is AMS and TPA in “AMS and TPA to handle SSL termination processing”?

Sebastian: AMS is the Amsterdam data center, and TPA is the Tampa one.