Voting wall at metrics brainstorming session, Berlin 2014.

What do metrics not tell us?

As part of the Wikimedia Conference in Berlin, on Thursday, April 10, members of the WMF Grantmaking department’s Learning and Evaluation team held a brainstorming session on metrics with chapter representatives from around the world. The aim was to start a conversation about what the evaluation metrics piloted in the (beta) Evaluation Reports tell us about the impact of our current programs, and what they do not tell us.

Sharing evaluation information across the movement helps program leaders all over the world benefit from each other’s know-how and strategies for program design. Evaluation metrics are important tools for making decisions such as: how much time and how many resources should I invest? Every program has at least one purpose or goal behind it, and a systematic way to measure progress toward those goals helps program leaders tell the story of their programs: what worked, what didn’t, and why.

During the brainstorming session, we split into two groups, one focused on image-upload programs and the other on text-centered programs, to begin answering three big questions:

  • What outcomes and story do the pilot metrics bring forward?
  • Where are there gaps in the story, or what outcomes do the pilot metrics not measure?
  • How else might we measure the outcomes that are not yet included in the story?

Measuring quality: a common demand

When it comes to evaluating a program, brainstorm participants worked hard to reach a shared understanding of quality, and of how to measure it. During the session, participants in both groups drew the important distinction between measures of quality and qualitative measures, and identified many distinct aspects of “quality” that need to be considered separately. For instance, use of a single image on several Wikimedia projects is one aspect of quality. We might label this aspect “popularity” or “demand.” But aspects of quality such as encyclopedic value, composition, or rarity would need different measures. In the same way, edit counts do not tell us whether those edits were made to a single article, whether they were minor or major corrections, or whether they were significant contributions that add deeper meaning and quality to wikitext.

The brainstorm also stressed the importance of qualitative measures, such as capturing participants’ impressions of program activities through open-ended questions or observation. For example, at the end of an edit-a-thon, a program leader might want to know what volunteers think editing Wikipedia is about. Besides open-ended questions in a survey or poll, observational coding of a conversation about Wikipedia and other qualitative strategies are also useful for answering this kind of question.

Defining impact together

Women in Science Edit-a-thon, in France. Thematic editing marathons are becoming more and more common.

More broadly, several core questions beyond quality arose around other, less tangible variables, mainly:

  • Are we building a community with our actions?
  • How do you measure if a program has social impact?
  • Are new editors on the wiki projects the only variable at stake?

Effective programs share a common need: a shared understanding of evaluation and of strategies for measuring impact. When time, money, and dedication are put into a program, did it turn out the way we thought? Did we reach our goals? To keep innovating and expanding, Wikimedia program leaders and evaluators need to understand the “what and why” of successful programs. This insight will allow them to weigh (and sometimes reconsider) program designs and choices, with the aim of investing in programs and activities that demonstrate an ability to reach desired outcomes.

This post has more questions than statements for a reason. Building on the brainstorm, and using the conference session as a starting point, the Program Evaluation & Design Team has opened a Request for Comments to gather community feedback on the evaluation initiative. To get a full picture, we would like to have an open conversation with program leaders about:

  • How we evaluate and report on the program evaluation and design capacity-building initiative.
  • The evaluation learning opportunities and resources made available.
  • The program metrics piloted in the beta reports.
  • How we should assess grantmaking to groups and individuals.

Join the conversation and help us build metrics that are meaningful to all of us. You can also contribute to the metrics brainstorm page directly, or via the link in the full discussion.

María Cruz
Community Coordinator of Program Evaluation & Design