Wikimedia blog

News from inside the Wikimedia Foundation.org

Archive for October, 2011

Data analytics at Wikimedia Foundation

This post is a follow-on to my previous post “What is Platform Engineering?”.  In this post, I’ll describe the history of our analytics work, talk about how we derive and distribute our statistics, and ask you to join us in building our platform.  Summary:  we’re hiring, and we want to tell you what a great opportunity this is.

Our Data Analytics team is responsible for building out our logging and data mining infrastructure, and for making Wikimedia-related statistics useful to other parts of the Foundation and the movement.  Until fairly recently, Erik Zachte was the main analytics person for Wikimedia (with support from many generalists here), working first as a volunteer building stats.wikimedia.org, then on behalf of Wikimedia Foundation starting in 2008.  The site started off as a large number of detailed page view and editor statistics about all Wikimedia wikis, large and small, and has since been augmented to include various summary formats and visualizations.  As the movement has grown, it has played an increasingly important role in helping guide our investments.

Tech meetup moves Wikimedia infrastructure forward

Earlier this month, about thirty MediaWiki developers and interested technologists gathered in New Orleans to learn and to work on Wikimedia’s technical infrastructure.  We made broad progress on the infrastructure of innovation at Wikimedia (notes).  Specifically:

Tim Starling and DJ Bauch driving towards greater media file storage system independence and robustness

  • We are now much closer to officially opening the doors to Wikimedia Labs and giving far more people the ability to contribute to MediaWiki without having to set up and maintain their own development environments at home.  Wikimedia Labs will provide hosted, virtualized test and development sandboxes for new and experienced programmers and systems administrators.  Many developers got beta Labs accounts, we tested at a larger scale, and we fixed several bugs.
  • Developers agreed to create a file backend abstraction layer to enable large-scale MediaWiki installations to use one of several storage systems to contain big collections of big media files.  (Wikimedia plans on using Swift, which is open source.) Microsoft’s Ben Lobaugh and SAIC’s DJ Bauch collaborated towards improving MediaWiki’s performance on Microsoft technologies as well.  Developers made architectural decisions, refactored some existing code, and improved documentation and tests for the SwiftMedia extension to MediaWiki.
  • We now have a continuous integration server up and running.  This will continuously run tests against the latest features and bugfixes that developers write, resulting in fewer bugs and faster development. Developers will need to write tests to reap the benefits, so Chad Horohoe taught a test-writing workshop.

    Chad Horohoe teaching developers unit testing

  • Max Semenik finished and demonstrated the first version of his API Query Sandbox.  This allows software developers anywhere to experiment with ways to automatically get data from Wikipedia or other sites that run MediaWiki, thus enabling wider and deeper reuse of Wikimedia content.
  • Operations folks continued the Puppetization of our infrastructure: they completely reworked Varnish management in Puppet, and worked on Puppet configurations for SwiftMedia testing. This configuration management work will ensure that ops can move faster and more confidently in building and maintaining Wikimedia infrastructure. And Canonical’s Mark Mims and Kapil Thangavelu worked on improving methods for Wikimedia developers “to spin up stacks of services within the labs environment” using Juju (more details).
  • Since the engineering department is planning a switch from Subversion to Git in the next few months, Brion Vibber taught nearly everyone there how Git works (slides, audio), and how we’ll be using Git in the future. This change in our source code repository and workflow will, we hope, enable more speed and flexibility in development, both for WMF developers and community contributors.

    Brion Vibber leading developers into the "glorious Git future"
  • We prioritized and addressed several open requests for the operations team and defect reports about the latest version of MediaWiki, 1.18, which had just been deployed across WMF sites.
  • Roan found and fixed an issue that was spouting symbolic link errors into our Apache logs, so now it’ll be easier for us to see more dangerous errors in those logs.
  • Google Summer of Code students Salvatore Ingala and Kevin Brown made progress on integrating their summers’ work into MediaWiki as used and deployed by others; Salvatore and WMF developer Roan Kattouw have a plan for getting his user scripts improvements reviewed and deployed, so they can benefit Wikimedia readers and editors.
  • A volunteer came in on Friday night knowing nothing about developing for MediaWiki, and by the end of the weekend had a working development environment on her laptop and had some ideas about how to contribute.
  • We had substantive conversations about the summer internship program and about third-party collaboration that will affect how we work in the future.
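
To give a feel for the kind of request the API Query Sandbox mentioned above lets developers experiment with, here is a minimal sketch in Python. It only builds the request URL and does not contact any server. The endpoint and the helper name build_query_url are assumptions chosen for illustration; action=query, prop=info, format=json, and pipe-separated titles are standard MediaWiki API parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration: the api.php entry point on English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles, prop="info", endpoint=API_ENDPOINT):
    """Return the URL of an action=query request about the given page titles.

    build_query_url is a hypothetical helper, not part of MediaWiki itself.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": prop,
        # The API accepts several page titles separated by "|".
        "titles": "|".join(titles),
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(["Nose", "New Orleans"]))
```

Pasting the printed URL into a browser, or trying the same parameters in the sandbox, is an easy way to see what the API returns before writing any client code.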

Launch Pad New Orleans, a great venue

We also ate dinner together, walked Bourbon Street, and generally got to know colleagues we’d never met before.  I expect these relationships will bear fruit for years to come.

Thanks to Ryan Lane and Dana Isokawa for organizing the event with me, and thanks to Launch Pad New Orleans for providing the venue!

Our next developers’ event is a hackathon in Mumbai November 18-20 concentrating on internationalization, localization, and mobile work.  To find out about other upcoming Wikimedia technical events, check the meetings wiki page, and follow @MediaWikiMeet on Identi.ca or Twitter.

Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation

Watch how Wikipedia newbies feel about editing

For the past few months, the “Moodbar” feature has invited new users on the English Wikipedia to indicate how they feel about their first editing experiences, and why. Today we launched the Feedback Dashboard, an experimental live stream displaying these comments. Take a look to find out what makes new editors Happy, Sad, or Confused!


Follow this series of brief news on enhancing Wikipedia participation via RSS, on Tumblr, or on Wikipedia.

Wikimedia mobile grows up, offers opt-in beta features

The Wikimedia mobile project has reached a really exciting point. When we launched our mobile extension last month, we were replacing a really complicated and critical piece of the mobile infrastructure with a much simpler system. I’m happy to say that it’s worked out so well that we’re taking off the beta logo for the production Wikipedia mobile site.

What does this mean for our everyday user? It means that our mobile site has stabilized as a piece of our infrastructure, and that we’re keeping it. But also…

Since we had a lot of users who really enjoyed being part of the mobile testing community, we’ve retained the beta concept for new, pre-release features. All you have to do going forward is opt in to the beta program.

By setting this option, you’ll have the ability to test out new features on our mobile site before anyone else. This is great if you want to help steer our mobile projects and don’t mind a little instability here and there. We’ll keep track of how many users opt in and out and individual feature usage so that we can know what’s worth keeping.

If you’re ready to continue as a tester then join the beta and send us feedback. We have two new features available for testers:

  • Search suggestions
    • Typing on mobile devices can be a pain, so we’re showing you search results as fast as we can
    • Huge thanks to Ross Bender for doing the initial work on this
  • Interwiki links
    • Now any of our multilingual readers can just tap the W to switch languages
    • This was our most requested feature after launch

We’re really eager to get your feedback so let us know how well the features do on your phones. You can also send us an email or tweet @WikimediaMobile with your feedback.

Come help make Wikimedia projects on Mobile better for everyone.

Tomasz Finc
Director of Mobile & Special projects

Wikipedia seeks global operator partners to enable free access

Probably the most repeated words around the Wikimedia movement are Jimmy Wales’ “Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s what we’re doing.” The Wikipedia community are the ones creating that world, and the ubiquity of mobile internet is what may actually enable it. With mobile internet users expected to surpass desktop users by 2014, mobile is fast becoming the primary medium by which people around the world can access knowledge. In the Global South particularly, many new mobile internet users are part of a generation whose first and only access to the internet is on mobile. This presents both an opportunity and a challenge to Wikipedia – how do we let these users know that the sum of all knowledge exists in their pocket, and how do we make it free? On the desktop, many readers discovered Wikipedia through search, but on mobile, sessions and queries originate differently. With this in mind, we need the help of partners – namely mobile operators and handset manufacturers – to help ensure the distribution of knowledge.  This is why we’re setting out with a global mobile partnership program.

We are looking for operator partners, particularly in the Global South, to join us in this mission. We want to work with them to help promote the availability of Wikipedia on phones, and not just on smartphones but across the range of data and feature phone users. This would include links through bookmarks, decks, and portals as well as marketing messages driving awareness towards the accessibility of free knowledge on mobile. Additionally, we are currently exploring ways to develop feature phone access to Wikipedia through SMS and USSD, and operator partnerships will be core to that initiative as well.

At the center of this whole strategy will be the launch of Wikipedia Zero – a lightweight, text-only version of our mobile site optimized for slower connections. The “zero” part means zero-rated, or rather zero cost to the user. Operator partners would “zero-rate” the custom site, meaning the user would not get charged data fees (nor be required to have a data plan) to access it. This will be a great asset to many mobile users in the Global South, who, although they may have an internet-ready phone, are deterred by data fees. This, to us, is in pursuit of truly enabling the “free” in “freely share in the sum of all knowledge.”

We are working to enlist new global partners now, particularly for Wikipedia Zero.  Mobile partnerships have long been seen as an important priority, but we haven’t had enough manpower to execute them on a fully global scale until now.  I joined the foundation three months ago as part of the global development mobile team (led by Kul Wadhwa) with enlisting and managing these partnerships as my priority. Kul and I have begun to talk with new partners already, and we hope to announce some soon.  Given that we have a lot of ground to cover, we have to be systematic, so we are focusing first on India and East Asia in Q4 of this year, followed by the Middle East and Africa in Q1 2012, and Latin America in Q2 2012. This coincides in part with the global development programs including India Catalyst, Arabic Catalyst, and Brazil Catalyst.  Of course, we expect there will be some deviations from this sequence.

We’re also working very closely with the mobile dev/product teams and community to ensure that all the innovations and enhancements they are bringing (including the forthcoming Android release) are accessible throughout the world through these partnerships. We look forward to sharing the progress, learnings, and discoveries here.

Amit Kapoor
Senior Manager, Mobile Partnerships

Google drives traffic to Wikipedia, but half of readers look for Wikipedia content

Search and Wikipedia

Search is central to the Wikipedia experience – both as a way of reaching the website as well as discovering content on Wikipedia.  Several questions on the Readers Survey 2011 were aimed at understanding the search experiences across nations, languages, and devices. Here are some of the key insights about search from the study:

(more…)

Announcing the WikiChallenge Winners

Wikipedia Participation Challenge

Over the past couple of months, the Wikimedia Foundation, Kaggle and ICDM organized a data competition. We asked data scientists around the world to use Wikipedia editor data and develop an algorithm that predicts the number of future edits, and in particular predicts correctly who will stop editing and who will continue to edit.

The response has been great! We had 96 teams compete, comprising in total 193 people who jointly submitted 1029 entries. You can have a look for yourself at the leaderboard.

We are very happy to announce that the brothers Ben and Fridolin Roth (team prognoZit) developed the winning algorithm. It is elegant, fast and accurate. Using Python and Octave they developed a linear regression algorithm. They used 13 features (2 are based on reverts and 11 are based on past editing behavior) to predict future editing activity. Both the source code and the wiki description of their algorithm are available. Congratulations to Ben and Fridolin!
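
To illustrate the general technique only, here is a minimal Python sketch of fitting a linear regression that predicts future edits from past behaviour. It is not the winning team's code: it uses two invented features instead of their 13, entirely synthetic data, and hypothetical function names (fit_linear_regression, predict). The fit is plain ordinary least squares, solved through the normal equations.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Each row gets a leading 1.0 so that the first weight is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Apply the fitted weights to one editor's feature row."""
    return weights[0] + sum(w * float(v) for w, v in zip(weights[1:], row))


# Synthetic, made-up editors: (edits in a recent period, times reverted).
features = [(10, 0), (40, 2), (5, 3), (80, 1), (0, 0), (25, 5)]
# Made-up targets, generated as 0.5 * edits - 1.0 * reverts + 2.0,
# so the fit should recover roughly those three numbers.
future_edits = [0.5 * e - 1.0 * r + 2.0 for e, r in features]

weights = fit_linear_regression(features, future_edits)
```

Because the targets here are generated from a known formula, the fitted weights should come out close to an intercept of 2.0, 0.5 per past edit, and -1.0 per revert. Real editor data is far noisier, which is where the choice of features matters.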

Second place goes to Keith Herring. Submitting only 3 entries, he developed a highly accurate model, using random forests, and utilizing a total of 206 features. His model shows that a randomly selected Wikipedia editor who has been active in the past year has approximately an 85 percent probability of being inactive (no new edits) in the next 5 months. The most informative features captured both the edit timing and volume of an editor. Asked for his reasons for entering the challenge, Keith cited his fascination with datasets and added:

“I have a lot of respect for what Wikipedia has done for the accessibility of information. Any small contribution I can make to that cause is in my opinion time well spent.”

We also have two Honourable Mentions for participants who only used open source software. The first Honourable Mention is for Dell Zang (team zeditor), who used a machine learning technique called gradient boosting. His model mainly uses recent past editor activity.

The second Honourable Mention is for Roopesh Ranjan and Kalpit Desai (team Aardvarks). Using Python and R, they developed a random forest model as well. Their model used 113 features, mainly based on the number of reverts and past editor activity; see the wiki page describing their model.

All the documentation and source code have been made available; the main entry page is WikiChallenge on Meta.

What the four winning models have in common is that past activity and how often an editor is reverted are the strongest predictors of future editing behavior. This confirms our intuitions, and the fact that all four models are quite similar in terms of what data they used is a testament to the importance of these factors.

We want to congratulate all winners, as they have shown us in a quantitative way which factors are important in predicting editor retention. We also hope that people will continue to investigate the training dataset and keep refining their models so we get an even better understanding of the long-term dynamics of the Wikipedia community.

We are looking forward to using the algorithms of Ben & Fridolin and Keith in a production environment, and particularly to seeing whether we can forecast the cumulative number of edits.

Finally, we want to thank the Kaggle people for helping in organizing this competition and our anonymous donor who has generously donated the prizes.

Diederik van Liere
External Consultant, Wikimedia Foundation

Howie Fung
Senior Product Manager, Wikimedia Foundation

2011-10-26: Edited to correct description of the winning algorithm

Setswana Wikipedia Challenge kicks off

The first article has been submitted in Google’s challenge to translate articles for the Setswana Wikipedia:

http://tn.wikipedia.org/wiki/Nko (“Nose”)

Around 130 participants have been trained to take part in the competition in Botswana. The grand prize, sponsored by the Wikimedia Foundation, is an all-expenses-paid trip to attend Wikimania 2012 in Washington, D.C.!

Asaf Bartov, Head of Global South Relationships



Wikipedia goes Android and needs developers

Over the last month, we’ve been hard at work building a Wikipedia Android app for our users and partners who prefer apps over the mobile web. It’s built with the PhoneGap framework and makes use of HTML5, CSS3, and JavaScript to power all of its features. Many thanks to our partners at Nitobi for getting us here.

We’re getting close to our first market release and we’d love to get more developers to hack on the code with us. For those just wanting to get involved:

Right now the code is sitting in GitHub but the plan is to move it into our own git repo, alongside MediaWiki.

How can you help?

  • Fork the code and help us with open bug requests
  • Critique the code and suggest cleanup
  • Port the app to iOS, Symbian, BlackBerry, Windows, WebOS, & Bada
    • first get it running, then customize to each platform’s look and feel
  • Hack on new features like image uploads, starting a new article, openZIM support, etc. …
  • Localization for the user interface
  • … and whatever else you can come up with!

Don’t worry if you don’t know Objective-C, Java, etc. All you need to know is HTML5, CSS3, and JS. It’s really simple to write these apps.

If you’re not sure where to start, then come join our bug triage on 10/26 @ 09-10 AM PDT where we’ll be discussing easy open issues for volunteers to pick up. If you can’t make it, then send us a message at mobile-l@lists.wikimedia.org (public mailing list) or leave us a tweet @WikimediaMobile. And if you really like working on this type of development work, send us your resume as we are actively hiring.

Fork the code and join us on IRC in the #wikimedia-mobile channel on the freenode network to help us get this to the market by November.

Tomasz Finc
Director of Mobile and Special Projects

Arabic Wikipedia Convening

Yesterday was the last day of our first-ever Arabic Wikipedia Convening, which was held in Doha and kindly hosted by QCRI. For three days, Arabic Wikipedians, academics, and technical specialists shared their thoughts on improving the quality of articles, increasing the number of contributors, and the different models of engaging Wikipedia in education.

This is probably the first time that Arabic Wikipedians, who are scattered across the Middle East, got a chance to meet in person. It was our pleasure to meet Ciphers, Abanima, Ahmad, and OsamaK, as well as Rami Tarawneh, who is among the early founders of Arabic Wikipedia. On the first day, after brief introductions, Rami told us the story of how Arabic Wikipedia started, what challenges the community faced during the early days, and how Arabic Wikipedia policies changed over time. For the rest of the day and the following couple of days, the discussions revolved around three main topics: machine translation, education, and outreach. We listened to the lessons learned from a machine translation project that was carried out in 2009 on Arabic Wikipedia, and Bala Jeyaraman gave us a detailed and impressive talk about a similar project that was finished last March on Tamil Wikipedia. Naren Datha, from the WikiBhasha team, also gave a short talk about how their tool works. Beyond machine translation, Frank Schulenburg gave a brief introduction to how our global education program operates in different countries, and then we listened to a success story from the coordinator of the WikiArabi project. Our last day included discussions of possible online and offline outreach strategies that could grow both the content and the number of contributors of Arabic Wikipedia. We were also introduced to the Arabic Web Day initiative.

The discussion helped the community communicate on a personal level, and present its culture and aesthetic to enthusiasts who are considering using Wikipedia as a platform for enhancing Arabic web content, and to the QCRI team who are currently helping our Global Development department render a number of solid projects on the ground across MENA.

The global development team will leave the 80°F/27°C Doha in a couple of hours, heading to Amman for a one day visit to The University of Jordan, before we go to Egypt, for meetings with professors at Cairo University, and with the Arabic Wikipedia Community.

A year ago, Arabic Wikipedia had nearly 120k articles, with a community striving to start action on the ground by applying a chapter model in different locations across the region. Our MENA catalyst project is now bringing new possibilities, growing a more solid vision, with feasible funding and a work-in-progress action plan.

We shall keep you posted on our next steps and research findings. Meanwhile, wish us luck in our endeavors in MENA, a region that is hot in many different ways.

Salaam!
Moushira Elamrawy
Global Development Team