Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Improving the accuracy of the active editors metric

We are making a change to our active editor metric to increase accuracy, by eliminating double-counting and including Wikimedia Commons in the total number of active editors. The active editors metric is a core metric for both the Wikimedia Foundation and the Wikimedia communities and is used to measure the overall health of the different communities. The total number of active editors is defined as:

the number of editors with the same registered username across different Wikimedia projects who made at least 5 edits in countable namespaces in a given month and are not registered as a bot user.

This is a conservative definition, but it helps us assess the size of the core community of contributors who update, add to and maintain Wikimedia’s projects.
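To make the definition concrete, here is a minimal sketch of how the threshold could be applied to one month of edits on a single project. It is an illustration only, not the code used to produce our statistics. The record layout (username, namespace), the function name `active_editors` and the choice of namespace 0 as the only countable namespace are assumptions made for this example.

```python
from collections import Counter

# Assumption for this sketch: namespace 0 is the only countable namespace.
COUNTABLE_NAMESPACES = {0}
MIN_EDITS = 5  # threshold from the definition above


def active_editors(edits, bots):
    """Return the usernames that qualify as active editors.

    edits: iterable of (username, namespace) pairs for one month.
    bots:  set of usernames registered as bot users.
    """
    counts = Counter(
        user
        for user, namespace in edits
        if namespace in COUNTABLE_NAMESPACES and user not in bots
    )
    return {user for user, n in counts.items() if n >= MIN_EDITS}
```

With this sketch, a user with four edits in a countable namespace and three edits elsewhere is not counted, and a bot account is excluded no matter how many edits it makes.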

The revision consists of two changes:

  1. The total active editor count now includes Wikimedia Commons (increasing the count).
  2. Editors with the same username on different projects are counted as a single editor (decreasing the count).

The net result of these two changes is a decrease in the total number of active editors, averaging 4.4% over the last three years.

De-duplication of the active editor count only affects our total number of active editors across the different Wikimedia projects; the counts within a single project are unaffected. We’ve also begun work on a data glossary as a canonical reference point for all key metrics used by the Wikimedia Foundation.

Background


The total number of active editors across all our projects gives us a good sense of the vibrancy and health of our overall community. As can be seen above, while the total number of editors shows a slight decline, it is relatively stable; there’s a more pronounced decline in mature projects like the English Wikipedia, which is somewhat offset by other nascent language communities.

The total number of active editors has historically been calculated simply by summing up the numbers from individual projects, like the various language versions of Wikipedia and our other projects. This means that an editor who is active in more than one language or project is double-counted.
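The double-counting in the historical method can be seen in a small sketch. The project names, usernames and the function name `summed_total` below are made up for illustration; the only thing taken from the description above is that the total was the sum of the per-project counts.

```python
def summed_total(active_by_project):
    """Historical method: add up the per-project active editor counts."""
    return sum(len(users) for users in active_by_project.values())


# Hypothetical data: "alice" is active on two language versions.
active_by_project = {
    "enwiki": {"alice", "bob"},
    "dewiki": {"alice", "carol"},
}

print(summed_total(active_by_project))  # 4, although only 3 distinct usernames exist
```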

To ensure that this double-counting doesn’t distort the numbers too much, in August 2010, we began excluding Wikimedia Commons from the total count. Wikimedia Commons is the media repository (photos, sounds, video, illustrations) used by all Wikimedia projects, and users typically contribute to it as part of their editing activity on another project.

This has begun to change with projects like Wiki Loves Monuments, which drive thousands of new contributors to Wikimedia Commons who may have never contributed to other Wikimedia projects.

In order to increase the accuracy of our total active editor estimate, our data analyst Erik Zachte has revised his code to treat all users with the same username as the same person (which is a sufficiently robust assumption for this purpose). Accordingly, we have revised the metric of “total active editors” to also include Commons, since double-counting should no longer occur.
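The revised approach can be sketched as a set union over usernames. This is not Erik's actual code, just an illustration under stated assumptions: the names and data are hypothetical, and an editor is treated as active overall if they pass the threshold on at least one project. Whether the threshold of 5 edits is applied per project or to edits pooled across projects is not spelled out in this post, so treat that choice as an assumption of the sketch.

```python
def deduplicated_total(active_by_project):
    """Revised method: count each username once across all projects."""
    distinct_users = set()
    for users in active_by_project.values():
        distinct_users |= users  # the same username is treated as the same person
    return len(distinct_users)


# Hypothetical data, now including Wikimedia Commons.
active_by_project = {
    "enwiki": {"alice", "bob", "carol"},
    "dewiki": {"alice", "bob"},
    "commonswiki": {"alice", "dave"},
}

print(deduplicated_total(active_by_project))  # 4
```

In this made-up example, summing the two Wikipedia counts without Commons would give 5. The revised count is 4: "dave", who only contributes to Commons, is added, while "alice" and "bob" are no longer counted twice.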

The result is an overall decrease in the measured number of total active editors of approximately 4.4% on average for any given month during the past three years (see comparison of new and old numbers below).

[Figure: ActiveEditorsOldAndNewTotals, comparing the old and new total active editor counts]

The new numbers are now published in the following locations:

In addition, the project-level summary statistics for active editors (columns >5 and >100) are now also de-duplicated (e.g. active editors on Wikipedia).

If you find a metric missing in our data glossary or you have any other data- or analytics-related question, please let us know! Or consider joining the Analytics mailing list or #wikimedia-analytics on Freenode (IRC). And of course you’re also very welcome to send me an email directly.

 

Best regards,

Diederik van Liere, Product Manager Analytics

2012-08-31: Edited to fix broken anchor link

6 Responses to “Improving the accuracy of the active editors metric”

  1. NaBUru38 says:

    This updated measure seems very reasonable.

  2. Erik Zachte says:

    Harry, you are correct. Five years ago Single User Login [1] came into effect. From that moment new users could not choose any name already existing on any wiki. And most existing accounts were unified via a multi-step procedure that resolved name conflicts. Some users may have chosen to skip this process, despite its clear benefits. Then again, users who left the project before mid-2008 missed this opportunity anyway to collect their contribution trail under a unified account. So this is an approximation indeed, but good enough, it seems, given the above. Further assessment (via the central authorisation database) is not exactly trivial, and will not resolve all ambiguities anyway, particularly for our earlier contribution history.

    [1] http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2008-05-26/Single_User_Login

  3. Erik Zachte says:

    Andre, you raise the $64,000 question. This has been noticed before and discussed time and again. As for the suddenness of the change, I have no clue. In general it is very tricky to try to explain complex social phenomena, where many factors play a role. Two aspects I would suggest taking into the equation: saturation of awareness among internet users [1], and evening out of expectations after the novelty effect wears off [2]. Other factors are important for assessing how well Wikipedia did in recent years, in a world where many people come online every year while more and more social sites compete for attention, but that is beyond the issue raised here.

    [1] http://en.wikipedia.org/wiki/Market_saturation
    [2] http://en.wikipedia.org/wiki/Hype_cycle

  4. Harry Burt says:

    Never mind, apparently I can’t read since that information is clearly given in the post.

  5. Harry Burt says:

    Just to confirm, if editor A and editor B share a username, this is a sufficient condition for them to be assumed to be the same?

  6. Andre Engels says:

    What I find remarkable about this graph is the suddenness with which one behaviour changed into the other. Until the first months of 2007 the number of editors grows rapidly, possibly exponentially. After that, there is a slow decrease with a seasonal effect (more in the Northern hemisphere Winter than in its Summer). There is no period of decreasing growth between them (the growth does seem to be a bit slower in the second half of 2007 than before, but the change from growth to no growth is still a very sudden one). To me this does seem to indicate that there is some fundamental change that is at work here – if the change was caused by some factor like ‘running out of subjects to write about’, I would expect a more gradual change.