Wikimedia blog

News from inside the Wikimedia Foundation.org

Posts Tagged ‘open-source’

Wikimedia engineering moving from Subversion to Git

Hello, MediaWiki developers and users! You may already be aware of this: our community is embarking on a journey to leave Subversion behind and migrate to Git for our source code repositories, starting on March 3rd. This is not an easy task. Here I’ll outline our rationale for this move, as well as our planned process.

What is Git?

Git is a distributed version control system originally developed by Linus Torvalds and others to manage the Linux kernel. In the past couple of years, it has taken off as a very robust and well-supported version control tool. “Distributed” means that no central copy of the repository is required. With Subversion, Wikimedia’s servers host the repository and users commit their changes to it. In contrast, with Git, once you’ve cloned the repository, you have a fully functioning copy of the source code and its history, with all the branches and tagged releases at your disposal.

Why switch?

Three major reasons:

To encourage participation: Since Git is distributed, it allows people to contribute with a much lower barrier to entry. Anyone will be able to clone the repository, make their own changes, and keep track of them locally. And if you’ve got an account in our code review tool (Gerrit), you’ll be able to push changes for the wider community to review.

To fix our technical process: Subversion has technical flaws that make life difficult for developers. Notably, the implementation of branching is not very easy to use, and makes it hard to use “feature branches”. Our community is very distributed, with many parallel efforts, and needs to integrate many different features, so we’d like to use feature branches more. Git branches are very easy to work with and merge between, which should make things easier for our development community. (Several other large projects, such as Drupal and PostgreSQL, have made the same switch for similar reasons, and we’ve done our best to learn from their experiences.)

Some quotes from our community:

“I love git just because it allows me to commit locally (and offline).” – Guillaume Paumier

“[Y]ou can create commits locally and push them to the server later (great for working without wifi), you can tell it ‘save my work so I can go do something else now’ in one command, and it’ll allow us to review changes before they go into “trunk” (master)…. without human intervention in merging things into trunk. Gerrit automates this process.” – Roan Kattouw

And finally, to get improvements to users faster: with better branching and a more granular code review workflow that suits our needs better, plus our ongoing improvements to our automated testing infrastructure, we won’t have to wait months before deploying already-written features and bugfixes to Wikimedia sites.

We had years of discussion before we finally decided to switch, but now we can look forward to more flexibility and power in our engineering processes.
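The local-first workflow described in the quotes above can be sketched as a small dry-run helper that only builds and prints the Git commands involved, without running them. This is an illustration rather than the official Wikimedia workflow: the remote name `origin`, the branch name `master`, and the function name `local_first_workflow` are assumptions, and `refs/for/<branch>` is Gerrit's conventional ref for submitting changes for review.

```python
import shlex
from typing import Dict, List


def local_first_workflow(
    message: str, remote: str = "origin", branch: str = "master"
) -> Dict[str, List[str]]:
    """Return the Git commands for each step as argument lists (nothing is executed)."""
    return {
        # Record changes to tracked files in the local repository; no network needed.
        "commit_offline": ["git", "commit", "-a", "-m", message],
        # Shelve uncommitted work so you can switch to another task.
        "shelve_work": ["git", "stash"],
        # Later, when online, send local commits to Gerrit for review
        # (assumed remote and branch names).
        "send_for_review": ["git", "push", remote, f"HEAD:refs/for/{branch}"],
    }


if __name__ == "__main__":
    for step, command in local_first_workflow("Fix typo in README").items():
        print(f"{step}: {shlex.join(command)}")
```

Only the last command needs a network connection; the first two work entirely against your local clone.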

What are we doing?

We’ve now done almost all the back-end work of preparing our repository for the move and are in the final steps of preparation (details). We’ve also written explanations of the new workflow, the migration schedule, issues yet to be addressed, and other related topics. Right now, we’re asking people to stop creating any new extensions in Subversion, and to watch the wikitech-l mailing list for more updates.

What are the next steps?

Over the next two and a half weeks, the Git repository that contains MediaWiki core and extensions will be brought in step with Subversion, and at first it will be read-only (no one will be able to push changes). This will allow developers to start cloning it to their local machines and getting used to things.

For MediaWiki core and for extensions that the Wikimedia Foundation deploys on its wikis, the switchover is pencilled in for the weekend of March 3rd. We’ll do core first, and then extensions after, but hopefully all in the same weekend. After the successful migration, the Subversion repository (for the directories that have moved to Git, such as /trunk/phase3/) will be made read-only.

See the full schedule.

I develop for a Wikimedia project. Do I have to switch to Git?

Only two projects are affected immediately: the core of MediaWiki and the extensions that get deployed on Wikimedia Foundation projects.

So, if you work on an extension that the Wikimedia Foundation does not use, or on a non-MediaWiki project hosted at svn.wikimedia.org, you have more time to decide. Talk it over with your community and decide whether you would like to move to Git immediately, move to Git sometime over the next several months, or move to another hosting provider sometime before mid-2013. We would like to gradually migrate all projects currently on Wikimedia’s Subversion repository so that we can make all of svn.wikimedia.org read-only by the middle of 2013, and thus only have to support one source control infrastructure.

More details.

Will training and documentation be available? When?

Yes, we will provide training and documentation to help you use the new workflow. Check our Git page and its links now, and watch that space! There will be more documentation as well as some interactive training sessions before the big switchover in early March.

If you have any questions, please ask in #mediawiki on Freenode or on wikitech-l.  Thank you!

Chad Horohoe
Git migration lead
Platform Engineering department
Wikimedia Foundation

Sumana Harihareswara
Volunteer Development Coordinator
Platform Engineering department
Wikimedia Foundation

Free software community shares lessons learned in “Open Advice” book

The "Open Advice" book is available for free download, or purchase as print from lulu.com.

The Open Advice book, a collection of essays, stories and lessons learned by members of the Free Software community, is out!

The book was just announced at FOSDEM, the Free and Open Source Software Developers’ European Meeting, in Brussels over the weekend.

About 50 authors from many different projects of the free software community were brought together by Lydia Pintscher, the book’s editor, who started the project in early 2011.

A year and 380 pages later, the book is now available, and tries to provide an answer to the question: What’s the key thing you would have liked to know when you started contributing?

Authors answer that question for many topics, ranging from “Writing patches” to “Documentation for Novices”, to business models, conferences, translation, design, and more.

I contributed “Learn from your users”, a chapter on user experience and usability testing. You’ll also recognize other names from the Wikimedia community, like Evan Prodromou, Markus Krötzsch and Felipe Ortega.

You can learn more about the book and the authors on the book’s website.

All the content of the book is released under the same license as Wikipedia, the Creative Commons Attribution Share-Alike license.

Check it out! You can download the book for free as a PDF file, order a print from lulu.com if you prefer paper books, or fork the text on GitHub.

I hope you’ll like the book, and it’ll prove useful, whether you’re new to the world of software, or you’re a seasoned contributor already.

Guillaume Paumier
Technical Communications Manager

Announcing the WikiChallenge Winners

Wikipedia Participation Challenge

Over the past couple of months, the Wikimedia Foundation, Kaggle and ICDM organized a data competition. We asked data scientists around the world to use Wikipedia editor data and develop an algorithm that predicts the number of future edits, and in particular predicts correctly who will stop editing and who will continue to edit.

The response has been great! We had 96 teams compete, comprising in total 193 people who jointly submitted 1029 entries. You can have a look for yourself at the leaderboard.

We are very happy to announce that the brothers Ben and Fridolin Roth (team prognoZit) developed the winning algorithm. It is elegant, fast and accurate. Using Python and Octave they developed a linear regression algorithm. They used 13 features (2 are based on reverts and 11 are based on past editing behavior) to predict future editing activity. Both the source code and the wiki description of their algorithm are available. Congratulations to Ben and Fridolin!
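To give a feel for the general technique, here is a minimal sketch of linear regression on editor features. It is not the winning team's code: it uses only two hypothetical features (past edits and number of reverts) instead of their 13, the toy numbers are made up for illustration, and the function names are our own.

```python
from typing import List, Sequence


def fit_linear_regression(
    rows: Sequence[Sequence[float]], targets: Sequence[float]
) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns [intercept, w_1, ..., w_k]. Meant for tiny illustrative inputs;
    it will fail with a division by zero if the features are collinear.
    """
    # Prepend a constant 1.0 so the first weight acts as the intercept.
    X = [[1.0, *row] for row in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Predicted future edits for one editor's feature row."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


if __name__ == "__main__":
    # Made-up toy data: (past_edits, times_reverted) -> future_edits.
    features = [(10, 1), (20, 0), (4, 2), (30, 5), (0, 0)]
    future_edits = [6, 12, 2, 12, 2]
    weights = fit_linear_regression(features, future_edits)
    print("weights:", [round(w, 3) for w in weights])
    print("prediction for (8, 1):", round(predict(weights, (8, 1)), 3))
```

In this toy data, more past edits push the prediction up and more reverts push it down, which is the kind of relationship a linear model can capture.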

Second place goes to Keith Herring. Submitting only 3 entries, he developed a highly accurate model, using random forests, and utilizing a total of 206 features. His model shows that a randomly selected Wikipedia editor who has been active in the past year has approximately an 85 percent probability of being inactive (no new edits) in the next 5 months. The most informative features captured both the edit timing and volume of an editor. Asked why he entered the challenge, Keith cited his fascination with datasets and added:

“I have a lot of respect for what Wikipedia has done for the accessibility of information. Any small contribution I can make to that cause is in my opinion time well spent.”

We also have two Honourable Mentions for participants who only used open source software. The first Honourable Mention is for Dell Zang (team zeditor), who used a machine learning technique called gradient boosting. His model mainly uses recent past editor activity.

The second Honourable Mention is for Roopesh Ranjan and Kalpit Desai (team Aardvarks). Using Python and R, they developed a random forest model as well. Their model used 113 features, mainly based on the number of reverts and past editor activity; see the wikipage describing their model.

All the documentation and source code has been made available; the main entry page is WikiChallenge on Meta.

What the four winning models have in common is that past activity and how often an editor is reverted are the strongest predictors of future editing behavior. This confirms our intuitions, and the fact that the winning models are quite similar in terms of what data they used is a testament to the importance of these factors.

We want to congratulate all winners, as they have shown us in a quantitative way important factors in predicting editor retention. We also hope that people will continue to investigate the training dataset and keep refining their models so we get an even better understanding of the long-term dynamics of the Wikipedia community.

We are looking forward to using the algorithms of Ben & Fridolin and Keith in a production environment, and particularly to seeing if we can forecast the cumulative number of edits.

Finally, we want to thank the Kaggle people for helping in organizing this competition and our anonymous donor who has generously donated the prizes.

Diederik van Liere
External Consultant, Wikimedia Foundation

Howie Fung
Senior Product Manager, Wikimedia Foundation

2011-10-26: Edited to correct description of the winning algorithm

Google Summer of Code students reach project milestones

Congratulations to the seven Google Summer of Code students who made it through the summer of 2011! They all accomplished a great deal, but want to continue contributing to ensure their work maximally benefits Wikimedia.

MediaWiki participated in Google Summer of Code 2011.

Yuvi Panda’s assessment parsing/aggregating extension aims “to make it easier to select and export article selections for various offline collections.” Yuvi needs some code review and suggestions on how to improve it to meet the Foundation’s quality standards for deployability, as he wrote to the developers’ mailing list.

Salvatore Ingala worked on making gadgets customizable. As he elaborated, that means:

  • “allowing gadgets to easily declare the list of configuration
    variables they have;
  • allowing users to easily change those settings, with an easy-to-use
    UI integrated to the Special:Preferences page.”

The next step is merging his code into trunk, which Salvatore’s planning with other MediaWiki developers.

Kevin Brown created the ArchiveLinks project to address the problem of linkrot on Wikipedia:

In articles we often cite or link to external URLs, but anything could happen to content on other sites — if they move, change, or simply vanish, the value of the citation is lost. ArchiveLinks rewrites external links in Wikipedia articles, so there is a ‘[cached]’ link immediately afterwards which points to the web archiving service of your choice. This can even preserve the exact time that the link was added, so for sites which archive multiple versions of content (such as the Internet Archive) it will even link to a copy of the page that was made around the time the article was written.
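The idea of a timestamped ‘[cached]’ link can be sketched in a few lines. This is not ArchiveLinks code (the extension itself is written for MediaWiki in PHP); the function names and the wikitext-style output are illustrative assumptions. The sketch relies on the Internet Archive's Wayback Machine URL scheme, where a 14-digit UTC timestamp in the path selects the capture closest to that time.

```python
from datetime import datetime, timezone

# Wayback Machine URLs take the form <prefix>/<YYYYMMDDhhmmss>/<original URL>.
WAYBACK_PREFIX = "https://web.archive.org/web"


def cached_link(url: str, added_at: datetime) -> str:
    """Build an archive URL for the capture nearest to when the link was added.

    `added_at` must be timezone-aware; it is converted to UTC before formatting.
    """
    if added_at.tzinfo is None:
        raise ValueError("added_at must be timezone-aware")
    stamp = added_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


def with_cached_link(url: str, added_at: datetime) -> str:
    """Return the external link followed by a 'cached' link (hypothetical output format)."""
    return f"[{url}] [{cached_link(url, added_at)} cached]"


if __name__ == "__main__":
    added = datetime(2011, 8, 1, 12, 30, 0, tzinfo=timezone.utc)
    print(with_cached_link("http://example.org/page", added))
```

Keeping the time the link was added, rather than the time the reader clicks, is what lets the archive return a copy close to what the article's author actually saw.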

Kevin’s next steps: getting a security review of his code, getting a starter feed set up so that the Internet Archive can start archiving it, and campaigning to interest Wikimedians and thus eventually get consensus to turn it on. At least one Wikimedian has already praised Kevin for his work.

Akshay Agarwal wrote a MediaWiki extension, SignupAPI, that makes it easier for a new user to create an account. “This extension creates a special page that cleans up SpecialUserLogin from signup related stuff, adds an API for signup, adds sourcetracking for account creation & provides Ajax-ified validation for signup form.” Akshay’s waiting for code review and discussion before the project can move forward further and benefit Wikimedia users.

Seven students contributed to various parts of MediaWiki, the wiki software that supports WMF sites.

Yuvi, Salvatore, Kevin, and Akshay all worked on features that they aim to get into Wikimedia Foundation-run wikis, such as Wikipedia, Wikisource, Wikinews, etc., sooner rather than later. In contrast, three students worked on extensions that will primarily benefit the larger MediaWiki community. For example, Yevhenii Vlasenko’s project was a “UserStatus” feature for SocialProfile. The SocialProfile extension is not currently deployed on any WMF wikis, but will benefit several other MediaWiki administrators and users. Zhenya finished his work but would like to continue by integrating better with social networks.

And two students worked on Semantic MediaWiki, which is also not currently deployed on any Wikimedia Foundation sites. Devayon Das made a “QueryCreator” and other improvements, and hopes to simplify its layout, make its interface easier to use, and add some features. And Ankit Garg worked on “Semantic Schemas”.

Congratulations to the students and their mentors.  Here’s hoping they’re all here to help out when next year’s interns roll in! :-)  And I’m looking forward to meeting Kevin and Salvatore, and introducing them to other Wikimedia & MediaWiki developers, at the New Orleans developers’ meetup next month.

Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation

MediaWiki’s Google Summer of Code students halfway through projects

MediaWiki’s Google Summer of Code students have been busy! We’re more than halfway through the summer, so here’s what they’re up to:

MediaWiki is participating in Google Summer of Code 2011.

  • Akshay Agarwal’s “Account Creation, Login Screens and AJAX-ification of everything” (mentor: Brandon Harris). Code, project status.
    The last task I accomplished: “Added source tracking functionality in the account creation API that I am building.”
    Something I’ve learned: “True learning can happen only in an open environment & with a highly supportive community.”
  • Kevin Brown’s “Working Archival for Web References/Citations,” “to facilitate the archival of external links used as references in the English Wikipedia” (mentor: Neil Kandalgaonkar). Code, project notes.
    The last task I accomplished: “Adding support for wget local archival, currently working on feed for external archival services.”
    Something I’ve learned: “Where do I start? A lot. I think the biggest thing is probably managing a large project and time management, which I still have a lot to learn on.”
  • Devayon Das’s “Improving Semantic Search/Semantic Query usability issues in SMW” (mentor: Markus Krötzsch). Code, project notes.
    The last task I accomplished: “Added RSS links to the results generated by the Query Creator interface I’m building.”
    Something I’ve learned: “A 30 second chat with a community member can save you 30 minutes of scratching your head in frustration.”
  • Ankit Garg’s “Semantic Schemas extension” (mentor: Yaron Koren). Code.
    The last task I accomplished: “I finished adding the inheritance support to the PageSchema XML structure.”
    Something I’ve learned: “I have a learned a great deal of PHP; also how to manage a huge project.”
  • Salvatore Ingala’s “AMICUS: Awesome Monolithic Infrastructure for Customization of User Scripts” (mentors: Max Semenik and Brion Vibber). Code, project notes.
    The last task I accomplished: “I made a prototypal user interface for editing preferences of an existing gadget, HotCat.”
    Something I’ve learned: “Unit testing is boooooring, but ends up saving you a lot of time!”
  • Yuvi Panda’s “Making Offline Wikipedia Article Selection Easier with Mediawiki Extensions” (mentor: Arthur Richards). Code, project.
    The last task I accomplished: “Filter articles based on name, quality and importance.”
    Something I’ve learned: “That spending time talking to everyone involved in the process from start to finish (devs, community maintainers, etc.) saves a truckload of time later on.”
  • Zhenya Vlasyenko’s “MediaWiki Extension: SocialProfile – UserStatus feature” (mentor: Jack Phoenix). Code.
    The last task I accomplished: “Internalization of the UserStatus feature with the help of the MakeGlobalVariablesScript hook.”
    Something I’ve learned: “I’ve found out for myself a new ways of data interaction between PHP and Javascript… Convinced that knowing some tricks and hooks can greatly save time.”

Aigerim Karabekova, who was working on extension release management, ran into several delays (including medical issues) and the project has been dropped. We’re glad she made the attempt and wish her the best.

Continued best wishes to Zhenya, Yuvi, Salvatore, Ankit, Devayon, Kevin, and Akshay as they work to make MediaWiki, and the Wikimedia experience, better.  We’re glad to be helping young developers learn how to contribute to our community.

Sumana Harihareswara
Wikimedia Foundation, Volunteer Development Coordinator

Open source hackfest benefits WMF, community

On May 24th and 25th, the Wikimedia Foundation hosted a CiviCRM coding sprint in our San Francisco office. CiviCRM is the premier open source constituent relationship manager; WMF uses it to store donor and contribution information. Our CiviCRM database contains more than a million contact records and a million contribution records.

CiviCRM, The Free and Open Source Solution for the Civic Sector

The sprint was a terrific success. The eight participants squashed many CiviCRM bugs — and the Foundation directly benefited, as they improved CiviCRM contact/contribution search performance by 15 to 25 times! Formerly, it could take more than two minutes for someone to search among the contribution records. The developers’ tweaks, hacks and patches whittled that down to about 4-6 seconds per search. This will save innumerable hours for WMF administrators and fundraisers.

The Foundation’s Arthur Richards, a fundraising engineer, enthused: “Any software tool, open source or not, comes with headaches; the beauty of tools like CiviCRM is that we can solve our own problems. Thanks to having some great hackers in one place, we managed to mitigate one of our biggest CiviCRM pain points in a matter of hours.”

You can read more details about the sprint on Donald Lobo’s CiviCRM blog.

Richards was especially excited to “highlight how awesome it is working with other open source projects and using other open source tools. We get to scratch each other’s backs, which helps support a sustainable, healthy ecosystem of software/communities. Also, using open source tools like CiviCRM – while not without their (often big) pain points – is great because we can fix the software ourselves. While the tools are free to use, with a little bit of elbow grease and some resources, they can be molded and fixed to meet our needs much easier (and likely much cheaper) than relying on proprietary tools. Plus, the CiviCRM community has been instrumental in helping us troubleshoot, solve problems and add new features to meet our usage requirements.”

The CiviCRM community is planning to run another code sprint in the fall in Northern California; please contact them if you’d like to participate or even host it. In the meantime, Wikimedia and thousands of other nonprofits will enjoy the CiviCRM improvements developed in May.

-Sumana Harihareswara
Volunteer Development Coordinator, Wikimedia Foundation

GLAMCamp NYC leads to work on software, outreach, and more

While GLAMCamp NYC finished on Sunday (Signpost coverage), the work initiated there will continue throughout the GLAM community.  Representatives from cultural institutions and Wikimedia chapters, as well as individuals, are working on several projects.  The projects concerning web badges for free culture allies, a metadata standard for use in the mass uploader/data ingestion tool, and the web analytics proposal are in particular seeking contributors and project managers; please comment at the coordination page to signal your interest.

Also available: the collaborative notes from Friday, Saturday, and Sunday, and specifically for discussion of the Ambassadors program, the Point Of Entry project, the data ingestion tool, and the metrics/analytics proposal.

Thanks to the organizers and participants for a productive and illuminating weekend.

-Sumana Harihareswara
Volunteer Development Coordinator, Wikimedia Foundation

GLAMCampNYC: help us make mass uploads easier

Today, several Wikimedians and representatives from galleries, libraries, archives and museums (GLAM institutions) met in New York City to kick off GLAMCampNYC.  New York City’s public Science, Industry, and Business Library is hosting the event.

Liam Wyatt, the Wikimedia Foundation’s Cultural Partnerships Fellow (aka GLAM fellow), introduced two keynoters: Meg Bellinger, discussing open access at Yale, and Maarten Zeinstra, presenting the Europeana public domain calculator.  The conference continues through Sunday.  Participants are discussing and building the GLAM outreach wiki, writing documentation, sharing best practices, and building tools.

Developers at GLAMCamp are developing a data-munging tool, based on pywikipediabot, to aid in mass uploads (more details). According to Wyatt, the most common requests from GLAM institutions are (1) mass upload of audiovisual media and (2) metrics, “easily exportable statistics based on analytics on a GLAM’s relationship with Wikimedia.” The data-munging or data ingestion tool will aid in the import of metadata from large sets of files, thus speeding the difficult part of mass uploads. Attendees will be hacking on it in sprints this weekend, starting 3pm-4:30pm UTC tomorrow, Saturday the 21st. Join them in person (11am local time), or in #glamwiki on Freenode.

See notes from today’s general talks and discussion and from the discussion of the GLAM Ambassadors program, or follow #glamwiki and #glamcamp on Twitter and Identi.ca.

-Sumana Harihareswara
Volunteer Development Coordinator, Wikimedia Foundation

MediaWiki selects eight students for Google Summer of Code 2011

We received more than 25 proposals for this year’s Google Summer of Code, and several mentors put many hours into evaluating project ideas, discussing them with applicants, and making the tough decisions.  Our final choices, the Google Summer of Code students for MediaWiki for 2011:

  • Akshay Agarwal’s “Account Creation, Login Screens and AJAX-ification of everything” (mentor: Brandon Harris)
  • Kevin Brown’s “Working Archival for Web References/Citations,” “to facilitate the archival of external links used as references in the English Wikipedia” (mentor: Neil Kandalgaonkar)
  • Devayon Das’s “Improving Semantic Search/Semantic Query usability issues in SMW” (mentor: Markus Krötzsch)
  • Ankit Garg’s “Semantic Schemas extension” (mentor: Yaron Koren)
  • Salvatore Ingala’s “AMICUS: Awesome Monolithic Infrastructure for Customization of User Scripts” (mentors: Brion Vibber and Max Semenik)
  • Aigerim Karabekova’s “Extension Release Management” (mentors: Sam Reed, Priyanka Dhanda, and Chad Horohoe)
  • Yuvi Panda’s “Making Offline Wikipedia Article Selection Easier with Mediawiki Extensions” (mentor: Arthur Richards)
  • Zhenya Vlasyenko’s “MediaWiki Extension: SocialProfile – UserStatus feature” (mentor: Jack Phoenix)

You’ll be hearing more about each of these projects in the next few weeks!

Congratulations to this year’s students, and thanks to all the applicants, as well as MediaWiki’s many mentors, developers who evaluated applications, and Google’s Open Source Programs Office.  The accepted students now have a month to ramp up on MediaWiki’s processes and get to know their mentors (the Community Bonding Period) and will start coding their summer projects on or before May 23rd.  As organizational administrator for MediaWiki’s GSoC participation, I’ll be keeping an eye on all eight students and helping them out.

Good luck!

Project ideas, students, and mentors wanted for Google Summer of Code

For the sixth year in a row, Wikimedia is participating in the Google Summer of Code program. Google Summer of Code (GSoC) is a program where Google pays students USD 5000 each to hack on open source projects during the summer (read more).

Over time, MediaWiki has benefited from GSoC students and their projects. For example, Samuel Lampa’s 2010 RDF import/export extension in Semantic MediaWiki is in use. And Jeroen De Dauw, GSoC student in 2009 and 2010, is now a persistently contributing member of the MediaWiki community, as is Brian Wolff, 2010 GSoC student.

In the past, the administrative and management challenges of GSoC have been an extra task that took engineers’ time and too often fell through the cracks. So this year, Rob Lanphier asked me to act as organizational administrator for MediaWiki’s involvement, via the Wikimedia Foundation.

I’m recruiting students to apply, getting project ideas, and managing the application process overall. Once we choose the students and they start ramping up and working, I will also help mentors manage their students and keep communication going, to make sure that every GSoC student’s project gets delivered and gets used!

We hope 2011’s students will develop useful chunks of MediaWiki (core, extensions, gadgets, scripts, or utilities), help us get their code shipped, and stay in the MediaWiki community afterwards.

This year’s ideas include writing and implementing cite templates in a PHP extension, improving the ImageTagging extension, XML dump work, pre-commit checks in our code repositories, and more. And of course we want to hear your own ideas, too! Interested?

University, community college, and graduate students around the world are eligible to apply to Google Summer of Code. You don’t need to be a computer science or IT major, and you can work from home.

We are looking for students who already know PHP. It’s also great if you have some experience with LAMP, MAMP, LAPP, or one of those kinds of stacks, and with the Subversion version control system. If you haven’t contributed to MediaWiki before, How to become a MediaWiki hacker is a good place to start.

If you’d like to participate, check out the timeline. Make sure you are available full-time from 23 May till 22 August this summer, and have a little free time from 25 April till 23 May for ramp-up.

If you’re interested, please sign up on our wiki page and start talking with us on IRC in #mediawiki on Freenode about a possible project! Then you can submit your proposal via the official GSoC website. The deadline for you to submit a project proposal is April 8th, but we encourage you to start early and talk with us about your idea first.

And, to repeat what Brion once said:

If you’re an experienced MediaWiki developer and would like to help out with selecting and mentoring student projects, please give us a shout! We’ll take you even if you live in the southern hemisphere. ;) We need folks who’ll be available online fairly regularly over the summer and are knowledgeable about MediaWiki — not necessarily knowing every piece of it, but knowing where to look so you can help the students help themselves.

We’re looking forward to hacking with you!

Sumana Harihareswara
MediaWiki Coordinator, GSoC 2011