Diederik van Liere

  1. What are readers looking for? Wikipedia search data now available

    (Update 9/20 17:40 PDT) It appeared that a small percentage of queries contained information unintentionally inserted by users. For example, some users may have pasted unintended information from their clipboards into the search box, causing the information to be displayed in the datasets. This prompted us to withdraw the files. We are looking into the feasibility of publishing search logs at an...

  2. Improving the accuracy of the active editors metric

    We are changing our active editors metric to increase its accuracy by eliminating double-counting and including Wikimedia Commons in the total number of active editors. The active editors metric is a core metric for both the Wikimedia Foundation and the Wikimedia communities and is used to measure the overall health of the different communities. The total number of active editors is defined ...

  3. Meet the Analytics Team

    Over the past few months, the Wikimedia Foundation has been gearing up a variety of new initiatives, and measuring success has been on our minds. It should come as no surprise that we’ve been building an Analytics Team at the same time. We are excited to finally introduce ourselves and talk about our plans. The team is currently a pair of awesome engineers, David Schoonover and Andrew Otto, ...

  4. Do It Yourself Analytics with Wikipedia

    As you probably know, we regularly publish backups of the different Wikimedia projects, containing their complete editing history. As time progresses, these backups grow larger and larger and become increasingly hard to analyze. To help the community, researchers and other interested people, we have developed a number of analytic tools to assist you in analyzing these large datasets. To...

  5. Announcing the WikiChallenge Winners

    Over the past couple of months, the Wikimedia Foundation, Kaggle and ICDM organized a data competition. We asked data scientists around the world to use Wikipedia editor data to develop an algorithm that predicts the number of future edits and, in particular, correctly predicts who will stop editing and who will continue to edit. The response has been great! We had 96 teams compete, comprising in ...