Wikimedia blog

News from inside the Wikimedia Foundation.org

Posts by Guillaume Paumier

Wikimedia engineering September 2011 report

Major news in September includes:

(more…)

Wikimedia engineering August 2011 report

Major news in August includes:

  • Technical discussions at Wikimania and Developer Days;
  • Progress on HTTPS, and generally better processes in Operations;
  • The kick-off of the Internationalization and localization tools project;
  • New features in UploadWizard, including major work on customized campaigns for the Wiki Loves Monuments event;
  • MediaWiki 1.18 and the new Mobile platform approaching deployment readiness.


Filter preventing abusive edits comes to all wikis

The AbuseFilter extension for MediaWiki, which helps prevent vandalism on wikis, will be globally enabled on all Wikimedia projects later today.

AbuseFilter was developed by Andrew Garrett with support from the Wikimedia Foundation; it was first enabled on the English Wikipedia in March 2009.

Since then, many local wiki communities have individually asked for AbuseFilter to be enabled on their wiki. As of July 2011, AbuseFilter was already enabled on 66 of the 843 wikis the Wikimedia Foundation hosts.

It recently became clear that it would be simpler to enable AbuseFilter by default on all wikis, rather than doing it on request.

When enabled, AbuseFilter comes with no built-in default filters, so no immediate change will be visible on wikis where it is enabled.

Unlike other anti-vandalism tools, AbuseFilter works by analyzing edits before they’re saved, rather than trying to identify (and revert) them after the fact.

Filters, or “rules”, can be added to AbuseFilter to identify certain kinds of edits matching a pattern. Actions can be taken for these edits, like tagging the edit, preventing the user from saving the page, or even automatically blocking the user. The AbuseFilter documentation provides the format in which filters must be written.
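As a rough illustration of the idea (checking an edit against rules before it is saved, then deciding on an action), here is a minimal Python sketch. It does not use AbuseFilter's real rule syntax, which is described in the extension's documentation; the `Edit` and `Rule` classes, the field names, and both example rules are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Edit:
    """A pending edit (hypothetical fields, for illustration only)."""
    user_edit_count: int
    page_title: str
    added_text: str


@dataclass
class Rule:
    """A named pattern plus the action to take when it matches."""
    name: str
    matches: Callable[[Edit], bool]
    action: str  # one of "tag", "disallow", "block"


def check_edit(edit: Edit, rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (rule name, action) for every rule the edit matches."""
    return [(rule.name, rule.action) for rule in rules if rule.matches(edit)]


def may_save(hits: List[Tuple[str, str]]) -> bool:
    """An edit is saved only if every matched rule merely tags it."""
    return all(action == "tag" for _, action in hits)


# Two made-up rules: one prevents saving, one only tags the edit.
RULES = [
    Rule(
        name="new user adds shouting",
        matches=lambda e: e.user_edit_count < 10
        and re.search(r"[A-Z]{20,}", e.added_text) is not None,
        action="disallow",
    ),
    Rule(
        name="large addition",
        matches=lambda e: len(e.added_text) > 5000,
        action="tag",
    ),
]

if __name__ == "__main__":
    pending = Edit(user_edit_count=2, page_title="Example", added_text="A" * 25)
    hits = check_edit(pending, RULES)
    print(hits, "saved" if may_save(hits) else "not saved")
```

The point of the sketch is the ordering: rules run while the edit is still pending, so a matching "disallow" rule stops the save instead of reverting it afterwards.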

A screenshot of the list of AbuseFilter rules on the English Wikipedia

AbuseFilter catches abusive edits matching defined patterns.

Because AbuseFilter has been in use on the English Wikipedia for more than two years, more details about how it works are available in that wiki’s documentation; instructions on how to create a filter are also available.

It is possible to export filters from a wiki, and to import them into another one.

AbuseFilter is an extremely powerful tool, with the potential of preventing edits, blocking users, and making a whole wiki unusable. Therefore, it must be used with extreme caution; filters should only be created and edited by administrators who understand their purpose and syntax.

AbuseFilter can also be used to identify edits that are not abusive, for tracking purposes. Tags can be automatically added to edits matching a certain pattern, thus giving editors and patrollers a heads-up about certain edits (see examples).

Because such tags can also be used to identify legitimate edits, AbuseFilter is sometimes referred to as “Edit filter”.

AbuseFilter offers the possibility for certain filters to be private, to prevent long-time abusers from knowing how their edits are being identified.

We hope this tool will prove useful to our community of editors and patrollers.

Guillaume Paumier
Technical communications manager

Wikimedia engineering July 2011 report

Major news in July includes:

  • Ongoing data replication from our primary Florida data center to our new Virginia data center;
  • The deployment of the Article Feedback feature to all articles on the English Wikipedia, and the deployment of MoodBar;
  • The successful implementation of a MySQL-based parser cache on Wikimedia wikis;
  • Mid-term evaluation of our Summer of Code projects.


Wikimedia engineering June 2011 report

Major news this month includes:

  • the network setup in our new datacenter, which opened the way to new server setup and backups;
  • progress on features to encourage and facilitate participation, like the Visual editor groundwork, and the WikiLove button;
  • productive community testing on our new mobile front-end and the Kiwix download manager;
  • the release of MediaWiki 1.17.0;
  • the first commits by our Summer of Code students;
  • major progress on our code review backlog.

Note: This month, we’re trying out a slightly modified format for the report. Hover your mouse over the green question marks ([?]) to see a description of a particular project.

Wikimedia engineering May 2011 report

Major news this month includes:

  • the Berlin Hackathon, where about 70 developers and engineers met to improve our technical infrastructure;
  • the deployment of the Upload Wizard as default uploader on Wikimedia Commons;
  • the continued development, deployment and roll-out of the Article feedback tool on the English Wikipedia;
  • major progress in reducing our code review backlog.

New interactive visualization shows global distribution of Wikipedia edits

Wikimedia Data Analyst Erik Zachte recently unveiled a new interactive visualization showing the global distribution of edits for various language editions of Wikipedia.

The animation shows a global map of edits made on May 10, 2011.


This first version allows users to see where edits are coming from for a given day. Right now, the day is fixed but fairly recent.

You can control the parameters of this interactive visualization by using keyboard shortcuts available in a “Help” menu (press ‘H’). For example, ‘E’ switches between different event markers.

Hit 'M' to switch to a black background, and 'E' to switch between different styles of event markers. Here, language codes are shown instead of bubbles.


The data behind these graphics comes from our Squid logs, which usually record about 400,000 edits a day. See Erik’s post to read more about how the visualizations were made.

By zooming in on a particular area (‘+’ or mouse scroll), or filtering the edits by language (‘N’ or space bar), interesting things can surface. For example, bubble maps and heat maps reflect densely populated areas with easy Internet access.

Hit 'N' or the Space bar to display a specific language. Here, edits to the English Wikipedia are shown on a bubble map ('2').


Three types of displays are available, each showing the spatial distribution of edits over time in a different way: an accelerated animation of edits over a day ('1'), a bubble map of the same edits over a day ('2'), and a heat map of edits over a day ('3').

The animation over the course of the day also shows how activity levels vary with the local time in each time zone. Compare for example the activity of the Spanish Wikipedia in Spain and Latin America over the course of the day.
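To give an idea of what a heat map of edits involves, here is a small Python sketch that counts edits per grid cell from latitude/longitude pairs. It is not how the actual visualization (written in JavaScript) was built; the function name, the 10-degree cell size, and the sample coordinates are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Tuple


def heat_map_cells(
    edits: Iterable[Tuple[float, float]], cell_degrees: float = 10.0
) -> Counter:
    """Count edits per (row, col) cell of a latitude/longitude grid.

    Rows are counted from the South Pole and columns from longitude -180,
    so every index is non-negative. Points exactly at latitude 90 or
    longitude 180 fall into one extra row or column past the regular grid.
    """
    counts: Counter = Counter()
    for lat, lon in edits:
        row = int((lat + 90) // cell_degrees)
        col = int((lon + 180) // cell_degrees)
        counts[(row, col)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample points: two near Berlin, one each near Paris and Madrid.
    sample = [(52.5, 13.4), (52.4, 13.1), (48.9, 2.3), (40.4, -3.7)]
    for cell, n in sorted(heat_map_cells(sample).items()):
        print(cell, n)
```

A renderer would then map each cell's count to a color intensity; a bubble map uses the same kind of aggregation but draws a circle whose size depends on the count.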

Hit '3' to switch to the heat map of edits combined on a single day. This heat map shows edits to the Spanish Wikipedia, mostly distributed in Spain and Latin America.


Similarly, the map shows that most edits to the Chinese Wikipedia are made from outside of mainland China (Hong Kong and Taiwan):

Zoom in using the + key, or the mouse scroll.


Open the visualization and play with it yourself!

In the tradition of free software that Wikimedia is attached to, this visualization was entirely created using HTML5 (canvas) and JavaScript, and no proprietary tool is necessary to view the animation.

The visualization works in the most recent browsers. If for some reason it doesn’t work for you, below is a short video to give you an overview of what it looks like when animated.

The video is also available on Wikimedia Commons, along with more screenshots.


Guillaume Paumier

Developers go home after productive Berlin hackathon

These people make Wikipedia and MediaWiki awesome.

Most MediaWiki developers who attended the Berlin hackathon this weekend have left the German capital and returned home, after three days of collaborative coding, group discussions, short presentations, and bug fixing.

A lot of work was already accomplished on Friday and Saturday, including presentations on test frameworks, coding of new features, discussions on wikitext parsers, and a usability testing session.

Things were a bit slower on Sunday, but lack of sleep didn’t stop developers from coding and smashing bugs. Brandon Harris gave a short talk about identity, editor retention and social features. Domas Mituzas talked about how to improve performance; Tim Starling followed by discussing adding HipHop support to MediaWiki, and its planned deployment to Wikimedia sites.

Mark Bergsma also gave an overview of the situation of the Wikimedia infrastructure regarding IPv6 (and our participation in IPv6 Day) and Mathias Schindler discussed WebP support. All the live notes taken yesterday are available.

The rest of the day was used to continue to code, discuss and smash bugs. Some groups explored the city before returning home. The day ended with participants hacking and socializing at the C-base.

If you couldn’t attend, the videos of all the talks are available for you to watch (or re-watch). Many pictures of the event are already on Wikimedia Commons, and more will follow. Presentation slides will be added to the hackathon page as they come in.

We hope the live video streaming, real-time note taking, and IRC and Twitter activity were useful for remote attendees; we’d love to hear what worked for you and what needs improving.

We’d like to thank everyone who was involved in making this event awesome, and particularly the participants, who came from all over the world to work together to improve our technical platform.

Many thanks to the team from Wikimedia Deutschland as well, who masterminded the whole event: Nicole Ebber, Daniel Kinzler, Cornelius Kibelka, and the rest of their team.

Participants agreed they were looking forward to more hackathons, in Berlin and elsewhere. We’ll see you there!


Guillaume Paumier

Photo from Wikimedia Commons by Tobias Schumann, under CC-by-sa 3.0 Germany.

Berlin hackathon continues with group coding, discussions and bug squashing

With tired eyes, and fueled by ridiculously large amounts of coffee, Wikimedia developers and engineers are now starting their third and last day of collaborative coding at the Berlin “hackathon”.

The event, organized by Wikimedia Deutschland, has been going on since Friday. About a hundred participants are enjoying our third day at coworking / hackspace Betahaus.

Yesterday, more coding happened, and even more bugs were smashed: about 65 since we started on Friday. There remains plenty to work on during this hackathon, though, if you’d like to help.

Saturday afternoon was also devoted to discussions about the possible evolution of the MediaWiki parser (see notes), a step towards a visual editor for Wikipedia and other MediaWiki-powered sites. (“Visual editor” seems to have reached consensus as a more social class-neutral replacement for “rich text editor”.)

Yesterday, the hackathon also hosted a usability testing session on the Kiwix offline app, led by Ryan Kaldari. The ops team is continuing its ongoing work on HTTPS & IPv6, and Victor Vasiliev partially implemented a long-awaited feature for Wikimedia wikis: a global watchlist.

The day ended with a party (with free beer and food) organized by our friends from Wikia.

You can take a look at all the live notes taken yesterday. People are also taking photos, and more will follow.

Some talks that were originally scheduled for Saturday are happening today, including Brandon Harris’ short presentation on “identity”, Mark Bergsma’s on IPv6, and the discussions on performance and HipHop, with Domas Mituzas and Tim Starling.

You can participate remotely in real time by watching the live video stream (all talks are recorded), and participating in our live note-taking in Etherpad.

You can also join us on IRC in #mwhack11 or #mediawiki on Freenode, and follow our activity using the #mwhack11 hashtag on Twitter and Identi.ca.

This year’s motto is “talk less, code more”. Happy coding!


Guillaume Paumier

Wikimedia developers start second day of Berlin hackathon

Typical traffic lights in Berlin

Green light: You can code now!

MediaWiki developers and Wikimedia engineers are starting their second day of coding, discussing and bug-smashing today in Berlin, Germany. This “hackathon”, organized by Wikimedia Deutschland, started yesterday, and will last until tomorrow, Sunday.

After a short introduction yesterday, participants quickly moved on to group discussions, short presentations and coding. The event is run as an unconference, and this format has proven to be quite effective so far.

Lightning talks yesterday included presentations about the new datacenter (by Mark Bergsma), Kiwix and offline (Emmanuel Engelhart), PhotoCommons (Hay Kranen), OpenStreetMap integration (Tim Adler), WikiLove (Ryan Kaldari), PHPUnit (Ashar Voultoiz), the new mobile gateway (Patrick Reilly), community-oriented testing (Ryan Lane), Narayam (Purodha Blissenbach) and distributed JavaScript testing (Timo Tijhof).

Several bugs were also fixed yesterday, but there remains quite a bit to smash during this hackathon.

A lot of group discussions (e.g. about HipHop, and the MediaWiki release plan) and actual coding happened during the afternoon and evening. You can take a look at all the notes taken yesterday in real time.

Today’s talks include discussions on “Identity” (Brandon Harris), performance, including plans to use HipHop for PHP (Domas Mituzas and Tim Starling), as well as many discussions and short talks about wikitext parsers.

To participate remotely in real time, you can still watch the live video stream (all talks are recorded), and participate in our live note-taking in Etherpad.

You can also join us on IRC in #mwhack11 or #mediawiki on Freenode, and follow our activity using the #mwhack11 hashtag on Twitter and Identi.ca.

Another way to participate is by testing some of the tools people are developing. For example, Purodha Blissenbach is looking for testers for Narayam (a keyboard mapping for Indic languages), and Hay Kranen would like people to test the PhotoCommons WordPress plugin. Please contact them if you want to get involved.

This year’s motto is “talk less, code more”. Happy coding!


Guillaume Paumier