Wikimedia blog

News from inside the Wikimedia Foundation.org

Deployments

Filter preventing abusive edits comes to all wikis

The AbuseFilter extension for MediaWiki, which helps prevent vandalism on wikis, will be globally enabled on all Wikimedia projects later today.

AbuseFilter was developed by Andrew Garrett with support from the Wikimedia Foundation; it was first enabled on the English Wikipedia in March 2009.

Since then, many local wiki communities have asked individually for AbuseFilter to be enabled on their wiki. As of July 2011, AbuseFilter was already enabled on 66 of the 843 wikis the Wikimedia Foundation hosts.

It recently became clear that it would be simpler to enable AbuseFilter by default on all wikis, rather than doing it on request.

When enabled, AbuseFilter comes with no built-in default filters, so no immediate change will be visible on wikis where it is enabled.

Unlike other anti-vandalism tools, AbuseFilter works by analyzing edits before they’re saved, rather than trying to identify (and revert) them after the fact.

Filters, or “rules”, can be added to AbuseFilter to identify certain kinds of edits matching a pattern. Actions can be taken for these edits, like tagging the edit, preventing the user from saving the page, or even automatically blocking the user. The AbuseFilter documentation provides the format in which filters must be written.
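To make the idea of rules and actions concrete, here is a minimal Python sketch of a pre-save check. It is not AbuseFilter's own rule syntax (see the AbuseFilter documentation for that); the `Edit` and `Rule` types, the two example rules and the action names are assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Edit:
    user: str
    page: str
    new_text: str


@dataclass
class Rule:
    name: str
    matches: Callable[[Edit], bool]  # the pattern this rule looks for
    action: str                      # hypothetical action names: "tag", "disallow", "block"


def evaluate(edit: Edit, rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (rule name, action) for every rule the edit matches."""
    return [(rule.name, rule.action) for rule in rules if rule.matches(edit)]


def may_save(edit: Edit, rules: List[Rule]) -> bool:
    """The edit is saved only if every triggered action is a plain tag."""
    return all(action == "tag" for _, action in evaluate(edit, rules))


# Two illustrative rules (assumptions, not real Wikimedia filters).
RULES = [
    Rule("shouting", lambda e: re.search(r"[A-Z]{20,}", e.new_text) is not None, "tag"),
    Rule("page blanking", lambda e: e.new_text.strip() == "", "disallow"),
]
```

The point of the sketch is the ordering: the check runs before the edit is stored, so a "disallow" match stops the save instead of triggering a later revert.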

A screenshot of the list of AbuseFilter rules on the English Wikipedia

AbuseFilter catches abusive edits matching defined patterns.

Because AbuseFilter has been in use on the English Wikipedia for more than two years, more details about how AbuseFilter works are available in that wiki’s documentation. Instructions on how to create a filter are also available.

It is possible to export filters from a wiki, and to import them into another one.

AbuseFilter is an extremely powerful tool, with the potential of preventing edits, blocking users, and making a whole wiki unusable. Therefore, it must be used with extreme caution; filters should only be created and edited by administrators who understand their purpose and syntax.

AbuseFilter can also be used to identify edits that are not abusive, for tracking purposes. Tags can be automatically added to edits matching a certain pattern, thus giving editors and patrollers a heads-up about certain edits (see examples).

Because such tags can also be used to identify legitimate edits, AbuseFilter is sometimes referred to as “Edit filter”.

AbuseFilter offers the possibility for certain filters to be private, to prevent long-time abusers from knowing how their edits are being identified.

We hope this tool will prove useful to our community of editors and patrollers.

Guillaume Paumier
Technical communications manager

Does your Wikipedia mobile App expect our full content layout?

If so, we have an upcoming change this week that you should be aware of. We’re in the final part of our new device detection testing, which will automatically redirect any mobile agent we recognize to its corresponding .m mobile gateway. This means that if your app declares a mobile UA as recognized by WURFL and connects directly to us, we will redirect that traffic to .m.wikipedia.org and NOT .wikipedia.org.
Apps that go through an intermediate gateway which doesn’t send a mobile user agent will not be affected. If, on the other hand, your app does all of its own logic, then you will need to explicitly identify your UA to us, or ensure that your UA contains “bot” to bypass redirection.
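For app developers who want to reason about which requests will be redirected, the behavior described above can be sketched roughly as follows. This is an illustration only: the real detection relies on WURFL, which is replaced here by a hypothetical list of tokens (`MOBILE_TOKENS`), and the function name and host handling are assumptions.

```python
# Hypothetical stand-in for WURFL-based detection; the real list is far larger.
MOBILE_TOKENS = ("iphone", "android", "blackberry")


def redirect_host(user_agent: str, host: str) -> str:
    """Return the host a request would end up on under the described rules."""
    ua = user_agent.lower()
    # A UA containing "bot" bypasses redirection entirely.
    if "bot" in ua:
        return host
    # A recognized mobile UA is sent to the .m gateway of the same wiki.
    if any(token in ua for token in MOBILE_TOKENS):
        language, rest = host.split(".", 1)
        if not rest.startswith("m."):
            return f"{language}.m.{rest}"
    return host
```

With these assumptions, `redirect_host("Mozilla/5.0 (iPhone; CPU iPhone OS 4_3 like Mac OS X)", "en.wikipedia.org")` yields `en.m.wikipedia.org`, while the same request with “bot” anywhere in the UA stays on `en.wikipedia.org`.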

If this is not the behavior that you want, then please let us know on meta or come find us on freenode #wikimedia-mobile.

Tomasz Finc

Director of Mobile and Special Projects

Expanded Use of Article Feedback Tool

Today on English Wikipedia we rolled out the Article Feedback Tool – previously featured on 3,000 English Wikipedia articles – to a larger set of 100,000 articles. This initial expansion is intended to further assess both the tool’s value and its performance characteristics, with an eye to a full deployment on Wikipedia and potentially other projects.

Some examples of articles that currently feature the tool (at bottom):

The intent of the tool is two-fold:

  • to gain aggregate quality assessments of Wikimedia content by readers and editors;
  • to use as an entry vector for other forms of engagement.

To assess its value in both categories, we’ve already undertaken a significant amount of qualitative and quantitative research. You can read an extensive summary of our work so far here.

The high-level summary, based on the data we’ve seen so far: we believe user ratings can be a valuable way to predict high and low quality content in Wikimedia, and we’re especially interested in engaging raters beyond the initial act of assessing an article. Through our trials to date, we’ve seen very good conversion rates on the calls-to-action that follow a rating, suggesting that this could be a powerful engagement tool as well.

Beyond continuing our own research and these engagement experiments, our goal is to regularly make available anonymized data from the tool, and to supply editors with a dashboard tool for surfacing trends in the rating data. We’re looking forward to sharing wider findings from the use of the tool soon.

Please use the talk page or comment below for feedback, questions and suggestions.

Erik Moeller, Deputy Director

A new way to share pictures, sounds and video

UploadWizard uploading multiple files

On April 15, Wikimedia Commons celebrated its 10 millionth media file. A new feature will help to increase that number even faster. The upload wizard, which entered public beta in late November and has been used to upload more than 10,000 files already, is now the default upload tool on Wikimedia Commons. Use it and tell us what you think, as we continue to improve it.

Here’s what’s different:

  • Instead of overloading the user interface with information about licensing and acceptable content, there’s a single comic strip tutorial explaining the licensing policy, which can be dismissed after the first time you see it.
  • You only see complexity when you need to see it. There are sensible defaults for licensing, automatic metadata extraction from the uploaded files, etc.
  • You can upload up to 10 files as a batch, instead of having to upload each file individually. You can see thumbnails of the files you’re uploading, and abort any individual upload.
  • Error cases should be handled in a clear and understandable fashion, and guide the user towards the most sensible action (e.g. when a file needs to be renamed, the upload shouldn’t fail: instead, the tool will prompt that a rename is necessary).
  • As a final step, the UploadWizard explains how to add uploaded files to pages in Wikimedia projects.

And here’s what some of our experienced users have said during the beta:

  • “The upload wizard provides a much less cluttered and confusing upload process.”
  • “Great performance from the upload wizard. A lot of the more tiresome details are filled out automatically”
  • “Fantastic wizard makes process clearer, but please keep the old form for more experienced users. Thanks!”
  • “I never thought the old uploading process was too hard, but this new upload wizard is amazingly simple. It actually makes me want to upload more.”
  • “Much improved method of uploading files. Multiple file uploads, auto filling of dates, user name, etc, simplifies license input, all help to reduce time required to upload. Great work.”

The UploadWizard requires JavaScript (if JavaScript is disabled, you’ll get a simplified upload form instead). It’s been fully translated into Dutch, French, Galician, German, Hebrew, Indonesian, Interlingua, Macedonian, Malayalam, Portuguese, Russian, Slovenian, Tagalog, and Vietnamese (call for translations). Tell us what you think, and remember: if it doesn’t work for you, you can always go back to the old form. In the coming weeks, we’ll examine not only the impact that this new tool has on the overall number of media uploads, but also whether it leads to a larger percentage of deleted content (due to lower quality uploads). We will continue to improve the tool as we learn more.

Big thanks to the UploadWizard team — Neil Kandalgaonkar, Ryan Kaldari, Guillaume Paumier, Alolita Sharma — and to all code reviewers, operations engineers, translators and testers for their work on this project so far. We hope that you’ll enjoy the new upload experience. If you have images, sound files or videos with educational value that you’re willing to donate to the world, now is a good time to do it.

Erik Moeller, Deputy Director

Account Creation Improvement Project Update

As you may know from Sue’s March 2011 update, the Wikimedia Foundation has made it one of our highest priorities to improve the experience of new editors, and we thought we’d start right at the beginning: the moment a potential new editor creates an account.

The Wikimedia Foundation’s Community Department has been studying how we can more effectively invite users who create new accounts to actually start editing. Since February, the Account Creation Improvement Project (ACIP) has been experimenting with different user interface messages and landing pages in the account creation flow (see their results and testing content to-date).

We didn’t have an A/B testing infrastructure that supported this work, so while ACIP has performed the first tests sequentially, we’ve now deployed a modification to our ClickTracking extension to English Wikipedia which will allow us to run multiple tests in parallel and record the results.

You’ll notice that the “Log in/create account” link on the English Wikipedia will send you to one of several randomized login screens, recognizable by the “ACP” identifier in the address. This comes from the newly created CustomUserSignup extension. Over the next few months, we’ll be varying the look and messaging of these screens to see what kind of impact that has on new editors, and sharing our findings. Our testing framework will allow us to bucket-test small tweaks to the interface and measure the number of accounts created and edits made by users (in aggregate or on a per-session basis) who have gone through different flows.

What data we are storing

We are storing a new cookie upon visiting the “Log in/create account” page, with a lifetime of three months.  This cookie will be used to track the following information:

  • Which account creation messaging group the user was placed in (identified as ACP1, ACP2 or ACP3 for now)
  • What version of the account creation campaign they received
  • Whether the particular user made it to the end of the account creation process, or whether they dropped off after reaching the login screen or the account creation screen
  • If (and only if) the user creates a new account, the number of edits or previews during the course of the trial

The information is associated with browser sessions (each of which has an individual unique identifier), not with an individual user or user account.

Anyone visiting the login page or the account creation page for English Wikipedia will have this cookie set. This is to make sure that we always provide the same wording to a particular visitor, so as not to invalidate our test. We will stop setting this cookie at the conclusion of this work, though we will likely perform other similar tests in the future.
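The mechanism described above (a random messaging group, remembered in a cookie tied to a session identifier rather than to a user) can be sketched as follows. This is not the ClickTracking or CustomUserSignup code; the cookie name, the value format and the 90-day approximation of “three months” are assumptions for illustration.

```python
import random
import uuid
from http.cookies import SimpleCookie

BUCKETS = ("ACP1", "ACP2", "ACP3")      # group names taken from the post
COOKIE_NAME = "acp_test"                # hypothetical cookie name
LIFETIME_SECONDS = 90 * 24 * 60 * 60    # roughly three months (assumption)


def get_or_assign(cookie_header: str, rng=random):
    """Return (session_id, bucket, set_cookie_value or None).

    A returning visitor keeps the bucket stored in the cookie, so they
    always see the same wording; a new visitor gets a random bucket.
    """
    cookie = SimpleCookie(cookie_header)
    if COOKIE_NAME in cookie:
        session_id, _, bucket = cookie[COOKIE_NAME].value.partition(".")
        if bucket in BUCKETS:
            return session_id, bucket, None  # nothing new to set
    # The identifier names a browser session, not a user account.
    session_id = uuid.uuid4().hex
    bucket = rng.choice(BUCKETS)
    out = SimpleCookie()
    out[COOKIE_NAME] = f"{session_id}.{bucket}"
    out[COOKIE_NAME]["max-age"] = LIFETIME_SECONDS
    out[COOKIE_NAME]["path"] = "/"
    return session_id, bucket, out[COOKIE_NAME].OutputString()
```

Keeping the assignment in the cookie, rather than re-rolling it on each visit, is what prevents one visitor from seeing several variants and muddying the comparison.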

Because of the privacy-sensitive nature of the system, we have a limit on the level of granularity of our findings. For example, we won’t be able to create a plot of users vs edits, because we don’t have user-level data.

We look forward to the findings of the Account Creation Improvement Project, which will ultimately help us create a better sign-up experience for all users. Independent of this project, the CustomUserSignup extension may also prove useful to other outreach projects, by making it possible to create customized sign-up forms (e.g. for student workshops or e-mail invitations).

Nimish Gautam

Article Feedback Pilot: Next Version

On March 14, we launched v2.0 of the Article Feedback Tool. Version 2.0 represents a continuation of the work we started last September. To quickly recap, the tool was originally launched as part of the Public Policy Initiative. In November, the feature was added to about 50-60 articles on the English Wikipedia, in addition to the Public Policy articles. The purpose of adding the tool to these additional pages was to give us more data on the quality of the ratings themselves: do these ratings represent a reasonable measurement of article quality?

Since then, we’ve been evaluating the tool using both qualitative and quantitative research.  We conducted user research on the Article Feedback tool both to see how users actually used the tool and to better understand the motivations behind rating an article.   Readers liked the interactivity of the feature, ease of use, and the ability to easily provide feedback on an article.  On the other hand, some of the labels (e.g., “neutral”) were difficult to understand.   A detailed summary of the user research has been posted here.

We also did some quantitative research on the ratings data.  Though the ratings do appear to show some correlation with changes in the content of the article, there is ample room for improvement (see discussion of GFAJ-1).  It also appears as though articles of different lengths show different ratings distributions.  For example, there appears to be a correlation between Well-Sourced and Completeness and length for articles under 50kb, but for articles over 50kb in length, the correlation becomes far weaker (see Factors Affecting Ratings).

Based in part on the results from the first version, v2.0 of this feature was designed with two main goals in mind.

  • First, we wanted to see if we could improve the correlation between ratings and change in article quality by segmenting ratings based on the rater’s knowledge of a topic.  We introduced a question which asks the user whether she is “highly knowledgeable” about the topic.  The answers to this question will enable us to compare ratings from users that self-identify as highly knowledgeable versus ones that don’t.
  • Second, we wanted to see if rating an article could lead to further participation — does rating an article provide an easy way to contribute, leading to additional participation like editing?  We wanted to test this hypothesis in light of the recent participation data.  We don’t know whether this will actually be the case, but we wanted to get some data.  In v2.0, there is a mechanism that shows a user a message (e.g., “Did you know you can edit this article?”) after they submit a rating.  We will measure how well these messages perform.  (These messages are dismissible by clicking a “Maybe later” link).

We also made some UI changes based on the feedback from the user study.  For example, “Neutral” was changed to “Objective” (as were some other labels) and the submit button has been made more visually obvious.  There are a number of other improvements which may be found on the design page.

Finally, in an effort to get a wider variety of articles to research, we increased the number of articles with the tool. We knew from our early analysis that articles in different length bands received different rating distributions, so we created length buckets (e.g., 25-50kb) and selected a random set of articles within each length bucket. User:Kaldari wrote a bot which takes the list of articles and places the tool on the articles in the list [10]. As of March 24, the tool is active on approximately 3,000 articles. We may expand this list if we can do so without impacting performance of the site.
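The selection step described above is a form of stratified random sampling. A minimal Python sketch follows; only the 25-50kb band is mentioned in this post, so the other band boundaries, the function names and the input format (title, size in kb) are assumptions, and this is not the bot's actual code.

```python
import random
from collections import defaultdict

# Only the 25-50kb band comes from the post; the other bounds are assumed.
BANDS_KB = [(0, 25), (25, 50), (50, 100), (100, None)]


def band_for(size_kb):
    """Return the (low, high) band an article of the given size falls in."""
    for low, high in BANDS_KB:
        if size_kb >= low and (high is None or size_kb < high):
            return (low, high)
    raise ValueError(f"no band for size {size_kb}")


def sample_per_band(articles, per_band, seed=0):
    """Pick up to `per_band` random titles from each length band.

    `articles` is an iterable of (title, size_kb) pairs.
    """
    rng = random.Random(seed)
    by_band = defaultdict(list)
    for title, size_kb in articles:
        by_band[band_for(size_kb)].append(title)
    return {
        band: rng.sample(titles, min(per_band, len(titles)))
        for band, titles in by_band.items()
    }
```

Sampling within each band, rather than across the whole article list, keeps long articles from being crowded out by the much larger number of short ones.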

We’ll be publishing analysis on v2.0 in the coming weeks.  In the meantime, please let us know what you think on the workgroup page.  Or better yet, join the workgroup to help develop this feature!

UploadWizard nearing 1.0, preview available for testing

I’m happy to announce that we’re getting close to a 1.0 release for UploadWizard, and we’re planning to deploy it to Wikimedia Commons by the end of this month.

UploadWizard is a step-by-step, multi-file uploader extension for MediaWiki that was developed as part of the Multimedia Usability Project. We launched a beta version in November 2010, and have been working on getting it to release quality ever since.

Recently, Ryan Kaldari joined the team, and he and I have been squashing bugs, testing functionality and readying the software for deployment. We’ve focused on achieving a pleasant interface that works in all browsers, orients users to Commons’ mission, and helps them make good contributions.

You’re invited to try the new version (you’ll need an account on the prototype) and report issues you encounter with it.

By the way, some people find the UploadWizard’s design a bit surprising: you can upload files before you set a license or describe them, which sounds a bit dangerous (but isn’t, given the way we’ve done it). We explain all that and more in the FAQ.

If you find a bug, you might want to check the list of open issues first. The following bugs are expected to be fixed before launch: 24692, 24696, 24703, 24758, 26053, 26063, 26076, 26179, 26182, 26591, 26592, and 28046. If your problem hasn’t been reported yet, please enter the issue directly in our tracker, or leave a note on the feedback page.

In the meantime, we will be periodically updating UploadWizard on the prototype server, fixing any (more) bugs you find as fast as we can.

And what else is left to do? Well, after this is deployed, we’re going to be watching things very closely to see how this affects Commons. Our goal is to increase the number of contributions, and the pool of contributors — without any downgrade in quality or burdening the community with spam. We have some plans about how to determine that, but we could always use more help there. If you have ideas about it, please let us know!

Thanks in advance!

Neil Kandalgaonkar
Software Engineer, Multimedia Projects
Wikimedia Foundation

Site fixes this week

We’re still in the middle of cleaning up some lingering issues from the 1.17 deployment, and despite our best efforts, you may see a little bit of quirkiness in the site:

  • Since the deployment, a problem with our job queue meant that emails that were supposed to be sent from the site weren’t. This backlog was cleared last night, and a lot of pent-up email was sent.
  • There were some HTML cache invalidations that caused parts of the site to get overloaded for a few minutes.
  • Yesterday, we started the deployment of the category sorting improvements. We deployed some modifications to the database today. This resulted in a few hiccups on the site that we’ve since mostly recovered from.

Category collation

One key set of improvements in the MediaWiki 1.17 release is the category sorting work spearheaded by Aryeh Gregor. This code will eventually improve the sorting of categories in different languages, allowing us to choose the most appropriate sort order for the language. For now, we’re at least switching over to a more sensible sorting algorithm (Unicode Collation Algorithm (UCA)), and have made other improvements to sorting.
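To see why plain code point order is a poor fit for category listings, consider the short Python comparison below. The `crude_sort_key` function is only a rough stand-in written for this illustration: it is not the Unicode Collation Algorithm and not MediaWiki's implementation, it merely strips accents and ignores case to show the kind of ordering a collation-aware sort aims for.

```python
import unicodedata


def crude_sort_key(title: str) -> str:
    """Very rough approximation of a collation key (NOT the real UCA)."""
    # Decompose accented letters into base letter + combining mark...
    decomposed = unicodedata.normalize("NFD", title)
    # ...drop the combining marks, then ignore case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


titles = ["Zebra", "apple", "\u00c9clair", "banana"]

by_code_point = sorted(titles)
# ['Zebra', 'apple', 'banana', 'Éclair']

by_crude_key = sorted(titles, key=crude_sort_key)
# ['apple', 'banana', 'Éclair', 'Zebra']
```

In code point order, every uppercase ASCII letter sorts before every lowercase one and accented letters land after “z”. A full UCA implementation goes much further than this sketch, with multi-level comparison and per-language tailoring.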

This set of changes required a modification of the database that we didn’t believe was risky, but was irreversible. Given how complicated the initial 1.17 deployment was, we decided to hold back on deploying this work.

There are still some maintenance scripts left to run before this work is fully deployed, but most parts of this are done.

Other fixes

We’re also aware of other problems with the job queue. We’re investigating them and hope to have them fixed soon.

Main deployment of MediaWiki 1.17 to Wikimedia sites complete

We have been running MediaWiki 1.17 on all Wikimedia wikis for almost a day now, and things seem to be in pretty good shape. We still have a lot of issues to fix, including a problem with disabling the enhanced toolbar in preferences and some issues with categories (see below). Many of the problems involve JavaScript and replacing code that isn’t compatible with ResourceLoader. We have a migration guide for developers of gadgets and other MediaWiki customizations, which we encourage anyone who is having problems with gadgets to refer to. Our developers are continuing to find and fix problems.

Based on early reports (albeit very subjective) ResourceLoader is already paying dividends, as navigating around the site seems much zippier in many cases.  We hope this is your experience as well.

We still have some deployment work left to do around this release.  In addition to the bugfixes, we also want to reintroduce the category improvements that Aryeh Gregor made last summer.  We had to temporarily remove these because they required schema changes that would make it difficult to do the type of deployment that we did.  Now that we’re confident we’re staying with MediaWiki 1.17, we should be able to deploy these improvements soon.  Some bugs with categories you see now may actually be related to this plan, so the good news is that those problems may be fixed by this coming update.  We also plan to update ArticleFeedback now that we’re on the newer codebase, and we’ll probably also update some other extensions, too.

If you are interested in the deployment, there’s much more below…

Another 1.17 maintenance window

Continuing with the work started last week, we plan to deploy 1.17 to more wikis in a couple of hours (Wednesday, February 16 at 6:00 UTC for 6 hours). We had hoped we would be able to figure out the performance issues in the past week, but unfortunately, the only practical way we have to see the load problems we witnessed last week is to put the software into production. We have put a lot of instrumentation in place to help us diagnose our load issues. We plan to start the upcoming deployment by rolling out to nl.wikipedia.org, and do some debugging (rolling back if necessary). If we’re able to diagnose and fix the problems quickly, we then plan to roll out 1.17 more widely. If we’re still stumped, we may still roll out to a few more low-traffic wikis, but leave the high-traffic sites until we figure this out.

We plan to have more updates and detailed information on the deployment page on mediawiki.org.  Thanks for your patience!

Update (2011-02-16 6:45 UTC): We’ve started the deploy, and it’s going better than we hoped.  We’ve deployed to several wikis now, including nl.wikipedia.org, de.wikipedia.org, fr.wikipedia.org, and ja.wikipedia.org.  More detailed updates will happen on the deployment page on mediawiki.org.

Update (2011-02-16 12:39 UTC): We have now pushed 1.17 to all wikis, and the deployment window is closed. Please see the deployment page on mediawiki.org for the best way to report any problems you might encounter.