Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘announcement’

Adding guided tours to Wikipedia

One of the great strengths of Wikipedia is that community members can employ the same tool used to write the encyclopedia – a wiki – for collaborating on documentation about the project. The downside of this approach is that these pages, written by encyclopedists, tend to be broad and extremely detailed. New contributors to Wikipedia face a daunting list of thousands of help pages, policies, guidelines, and best practices that have developed over our 12-year history.

Today, we’re happy to announce interactive guided tours, a new software feature that will enable Wikipedia editors and readers to learn about the project in a way that is much easier to digest. Wiki-based documentation can now be complemented by concise, step-by-step instructions presented via tooltips. Instead of simply describing a process, we can show you how to complete it yourself, and when you’ve seen enough, you can dismiss a tour instantaneously.

A step from our new tour introducing first time contributors to the mechanics of editing Wikipedia.

Our team, editor engagement experiments, has built two Wikipedia tours to go along with today’s launch: first, a meta-tour to show community members how tours operate, and second, a tour associated with our experimental “getting started” workflow, which helps people who’ve just registered make their first edit to the encyclopedia. You can see the test tour for yourself, right now.

We’ll be adding more tours soon, including in languages other than English, but Wikipedians won’t need to wait for us. Using simple JavaScript, community members can build their own tours, empowering each Wikipedia to create content that fits its particular use cases. For Wikipedians and developers interested in creating guided tours, be sure to check out our project page. Our implementation of guided tours uses a version of the open source Guiders.js library, and we’re happy to say that we’ve contributed back upstream as we’ve adapted Guiders to our needs.

Today’s release is just the first step toward experimenting with guided tours. We hope that in the weeks to come, we can find more areas of the encyclopedia where these tours can provide simplicity and clarity to the process of learning how to contribute to Wikipedia.

Steven Walling, Associate Product Manager

Google Summer of Code students reach project milestones

This year, the MediaWiki community again participated in Google Summer of Code, in which we selected nine students to work on new features or specific improvements to the software. They were sponsored by Google and mentored by experienced developers, who helped them become part of the development community and guided their code development.

Congratulations to the eight students who have made it through the summer of 2012 (our seventh year participating in GSoC)! They all accomplished a great deal, and many of them are working to improve their projects to benefit the Wikimedia community even more.

Google Summer of Code 2012

Eight students passed MediaWiki’s GSoC program in 2012.

  • Ankur Anand worked on integrating Flickr upload and geolocation into UploadWizard. WMF engineer Ryan Kaldari mentored Ankur as they made it easier for Wikimedia contributors to contribute media files and metadata. Read his wrapup and anticipate the merge of his code into the main UploadWizard codebase.
  • Harry Burt worked on TranslateSvg (“Bringing the translation revolution to Wikimedia Commons”). When his work is complete and deployed, we will more easily be able to use a single picture or animation in different language wikis. See this image of the anatomy of a human kidney, for example; it has a description in eight languages, so it benefits multiple language Wikipedias (e.g., Spanish and Russian). Harry aims to allow contributors to localize the text embedded within vector files (SVGs), and you can watch a demo video, try out the test site, or just read Harry’s wrapup post. WMF engineer Max Semenik mentored this project.
  • Akshay Chugh worked on a convention/conference extension for MediaWiki. Wikimedia conferences like Wikimania often use MediaWiki to help organize their conferences, but it takes a lot of custom programming. Under the mentorship of volunteer developer Jure Kajzer, Akshay created the beta of an extension that a webmaster could install to provide conference-related features automatically. See his wrapup post.
  • Ashish Dubey worked on realtime collaboration in the upcoming VisualEditor (you may have seen “real-time collaborative editing” in tools like Etherpad and Google Docs). Ashish (with WMF engineer Trevor Parscal as mentor) has implemented a collaboration server and other features (see his wrapup post) and has achieved real-time “spectation,” in which readers can see an editor’s changes in realtime. Wikimedia Foundation engineers plan to integrate Ashish’s work into VisualEditor around April to June 2013.

    Flow diagram for Ashish Dubey’s project: Ashish is working on the architecture that will support real-time collaboration.

  • Nischay Nahata optimized the performance of the Semantic MediaWiki extension. In wikis with unusually large amounts of content, Semantic MediaWiki experiences performance degradation. With the mentorship of head Semantic MediaWiki developer Markus Krötzsch (a volunteer) and Wikidata developer Jeroen De Dauw, Nischay found and fixed many of these issues.  This also reduces SMW’s energy consumption, making it greener. Nischay’s work will be in Semantic MediaWiki 1.8.0, which is currently in beta and due to be released soon. Wikimedia Labs uses Semantic MediaWiki and will benefit from the performance improvements.
  • Aaron Pramana worked on watchlist grouping and workflow improvements. Aaron wants to make it easier for wiki editors and readers to use watchlists, and to create and use groups of watched items to focus on or share. Aaron worked with volunteer developer Alex Emsenhuber. The back end of the system is done, but Aaron wants your input about the user interface. Folks on the English Wikipedia’s Village Pump have started discussing it.

    Proposal to redesign the MediaWiki watchlist: Arun Ganesh illustrated the watchlist redesign proposal with this mockup.

  • Robin Pepermans worked on Incubator improvements and language support, mentored by WMF engineer Niklas Laxström. If you’ve ever thought of using Wikimedia’s Incubator for new projects, it’s now easier to get started. Read Robin’s wrapup post for more.
  • Platonides worked on a desktop application for mass-uploading files to Wikimedia Commons. The application will eventually make it much easier for participants in upload campaigns like Wiki Loves Monuments to upload their photos (and it’ll work on Windows, Linux, and Mac OS). I mentored Platonides, who delivered a beta version.

As further progress happens, we’ll update our page about past GSoC students. Congratulations again to the students and their mentors. And thanks to volunteer Greg Varnum, who helped me administer this year’s GSoC, and to all the staffers and volunteers who helped students learn our ways.

Sumana Harihareswara, Engineering Community Manager

Improving the accuracy of the active editors metric

We are making a change to our active editor metric to increase accuracy, by eliminating double-counting and including Wikimedia Commons in the total number of active editors. The active editors metric is a core metric for both the Wikimedia Foundation and the Wikimedia communities and is used to measure the overall health of the different communities. The total number of active editors is defined as:

the number of editors with the same registered username across different Wikimedia projects who made at least 5 edits in countable namespaces in a given month and are not registered as a bot user.

This is a conservative definition, but helps us to assess the size of the core community of contributors who update, add to and maintain Wikimedia’s projects.

The update consists of two changes:

  1. The total active editor count now includes Wikimedia Commons (increasing the count).
  2. Editors with the same username on different projects are counted as a single editor (decreasing the count).
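As a rough illustration of how the de-duplicated total is computed, here is a minimal sketch in Python. The records, threshold logic, and `active_editors` helper are invented for this example; the real computation runs over the full edit databases and excludes bot accounts.

```python
# Hypothetical sketch of the de-duplicated active editor count.
# Each record is (username, project, edits_in_countable_namespaces);
# bot accounts are assumed to be filtered out already.
from collections import defaultdict

def active_editors(records, threshold=5):
    """Count editors with >= threshold edits on some project, merging
    identical usernames across projects into a single editor."""
    per_project = defaultdict(int)          # (username, project) -> edits
    for username, project, edits in records:
        per_project[(username, project)] += edits

    # Collect usernames that reach the threshold on at least one project;
    # using a set de-duplicates the same username across projects.
    active = {user for (user, _), edits in per_project.items()
              if edits >= threshold}
    return len(active)

records = [
    ("Alice", "enwiki", 7),      # active on the English Wikipedia
    ("Alice", "commons", 12),    # same person on Commons: counted once
    ("Bob", "dewiki", 3),        # below the 5-edit threshold
]
print(active_editors(records))   # -> 1
```

Without the de-duplication, Alice would have been counted twice; without including Commons, her Commons activity would have been invisible to the total.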

The net result of these two changes is a decrease in the total number of active editors, averaging 4.4 percent over the last three years.

De-duplication of the active editor count only affects our total number of active editors across the different Wikimedia projects; the counts within a single project are unaffected. We’ve also begun work on a data glossary as a canonical reference point for all key metrics used by the Wikimedia Foundation.

Background

(more…)

Wikimedia Foundation selects nine students for summer software projects

We received 63 proposals for this year’s Google Summer of Code, and several mentors put many hours into evaluating project ideas, discussing them with applicants and making the tough decisions. We’re happy to announce our final choices, the Google Summer of Code students for 2012:

MediaWiki logo

All nine of these students are working on MediaWiki, the software that powers Wikimedia sites.

Congratulations to this year’s students, and thanks to all the applicants, as well as MediaWiki’s many mentors, developers who evaluated applications, and Google’s Open Source Programs Office. The accepted students now have a month to ramp up on MediaWiki’s processes and get to know their mentors (the Community Bonding Period) and will start coding their summer projects on or before May 21st. As the organizational administrator for MediaWiki’s GSoC participation, I’ll be keeping an eye on all nine students and helping them out.

Good luck!

Sumana Harihareswara, Volunteer Development Coordinator

Google Summer of Code 2012

Project ideas, students, and mentors wanted to improve Wikimedia tech this summer

Google Summer of Code 2012

For the seventh year in a row, the Wikimedia Foundation is participating in the Google Summer of Code program. Google Summer of Code (GSoC) is a program in which Google pays students USD 5000 each to code for open source projects for three months (read more).

We hope 2012’s students will develop useful chunks of MediaWiki, help us get their code shipped, and fall in love with our community so that they stay with us for years to come.

This year’s project ideas include improvements to CentralNotice, taxobox editing, search, translation tools, and more.  Interested?

University, community college, and graduate students around the world are eligible to apply to Google Summer of Code. You don’t need to be a computer science or IT major, and you can work from home.

MediaWiki logo

MediaWiki is the Wikimedia Foundation's key open source project, powering Wikipedia and our other sites.

We are looking for students who already know some PHP. We also strongly prefer for you to have some experience working with Linux, Apache, and MySQL environments, and with the Git version control system. If you haven’t contributed to MediaWiki before, How to become a MediaWiki hacker is a good place to start; we will strongly prefer candidates who submit patches before the April 6th GSoC application deadline.

If you’d like to participate, check out the timeline. Make sure you are available full-time from 21 May till 20 August 2012, and have a little free time from 23 April till 20 May for ramp-up. Please read our wiki page and start talking with us on IRC in #mediawiki on Freenode about a possible project.  Then you’ll write a proposal and submit it via the official GSoC website. The deadline for you to submit a project proposal is April 6th, but we encourage you to start early and talk with us about your idea first.

We’re also seeking experienced MediaWiki developers anywhere in the world to help select and mentor student projects. We’ll take you even if you live in the southern hemisphere and it’s not summer for you. :-) You’ll need to be available online consistently so you can respond to student questions between now and late August. As Brion Vibber put it, if you “are knowledgeable about MediaWiki — not necessarily knowing every piece of it, but knowing where to look so you can help the students help themselves” then please consider helping out.

I’m administering our participation in GSoC. So I am encouraging students to apply, getting project ideas, and managing the application process overall. I look forward to seeing students discover the joy of collaborative work that improves the Wikimedia experience for millions of users. Help us spread the word.

Sumana Harihareswara
Volunteer Development Coordinator, Wikimedia Foundation
MediaWiki Coordinator, GSoC 2012

US Cultural Partnerships Coordinator: Lori Byrd Phillips

Lori Phillips (CC-by-sa by Lori Phillips)

The Wikimedia Foundation is pleased to announce Lori Byrd Phillips as the United States Cultural Partnerships Coordinator for 2012. Through this new position within the Global Development department, the US Cultural Partnerships Coordinator will take the lead in building the infrastructure needed to support the growing interest in Wikimedia partnerships among cultural institutions in the United States, ultimately working to make cultural partnerships in the US self-sustaining starting in 2013.

Thanks to the efforts of the global GLAM-Wiki initiative over the past two years, much inspired and aided by Liam Wyatt’s Wikimedia GLAM Fellowship, now coming to its scheduled end, professionals from galleries, libraries, archives and museums (GLAMs) have begun to seriously discuss partnership with Wikimedia as a means to increase accessibility to cultural resources and to draw new audiences to their collections. Significant press about partnerships at respected institutions such as the British Museum [NY Times], the National Archives and Records Administration [Yahoo!], and the Smithsonian Institution [Chronicle of Philanthropy] has led cultural professionals to consider Wikimedia partnerships a cutting-edge trend. This has resulted in demand from museums and other institutions to establish relationships with Wikimedia through Wikipedians in Residence and other projects. In the US, however, this growing interest from cultural institutions is quickly outpacing the capacity of the present volunteer community to support it.

Interest is continuing to explode in the US, with plans for grant projects and for Wikimedia-museum partnerships to be featured in a number of upcoming conferences, most significantly a dedicated panel discussion at the American Association of Museums annual conference and Museum Expo.

While there is much interest among US Wikimedians to assist with cultural partnerships, a systematic structure is needed to connect these volunteers with cultural institutions and to provide the resources needed to establish successful partnerships. In order to accomplish this, the priorities of the Coordinator’s one-year project include: (more…)

Do It Yourself Analytics with Wikipedia

As you probably know, we regularly publish backups of the different Wikimedia projects, containing their complete editing history. As time progresses, these backups grow larger and larger and become increasingly hard to analyze. To help the community, researchers and other interested people, we have developed a number of analytic tools to assist you in analyzing these large datasets. Today, we want to tell you about these new tools, what they do and where you can find them. Please remember that they are all still in development:

  • Wikihadoop
  • Diffdb
  • WikiPride

Wikihadoop

Wikihadoop makes it possible to run Hadoop MapReduce jobs on the compressed XML dump files. This means that processing our XML files becomes embarrassingly parallel, so we no longer have to wait days or weeks for a job to finish.
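To make the MapReduce idea concrete, here is a tiny, self-contained Python sketch of the kind of job Wikihadoop enables: counting revisions per page. The function names, record fields, and data are invented for illustration; real jobs run on Hadoop over the MediaWiki XML dumps.

```python
# Illustrative word-count-style MapReduce pair: count revisions per page.
# The single-process run_job driver below only simulates what Hadoop
# would do in parallel across many machines.
from collections import defaultdict

def mapper(revision):
    # Emit one (page_title, 1) pair per revision record.
    yield revision["page"], 1

def reducer(key, values):
    # Sum the revision counts emitted for each page.
    return key, sum(values)

def run_job(revisions):
    # Shuffle phase: group mapper output by key.
    grouped = defaultdict(list)
    for rev in revisions:
        for key, value in mapper(rev):
            grouped[key].append(value)
    # Reduce phase: one call per key.
    return dict(reducer(k, v) for k, v in grouped.items())

revisions = [{"page": "Earth"}, {"page": "Earth"}, {"page": "Moon"}]
print(run_job(revisions))   # -> {'Earth': 2, 'Moon': 1}
```

Because each mapper call touches only one revision, the input can be split across any number of workers, which is exactly why jobs over the full dumps finish so much faster.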

We used Wikihadoop to create the diffs for all edits from the English XML dump that was generated in April of this year.

DiffDB

DiffIndexer and DiffSearcher are the two components of the DiffDB. The DiffIndexer takes as raw input the diffs generated by Wikihadoop and creates a Lucene-based index. The DiffSearcher allows you to query the index so you can answer questions such as:

  • Who has added template X in the last month?
  • Who added more than 2000 characters to user talk pages in 2008?
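As a toy illustration of the second question, here is an in-memory stand-in for such a query. DiffSearcher itself queries the Lucene index; the record fields and values below are invented for the example.

```python
# Invented diff records standing in for DiffDB's indexed diffs.
diffs = [
    {"user": "Alice", "namespace": "User talk", "year": 2008, "chars_added": 2500},
    {"user": "Bob",   "namespace": "User talk", "year": 2008, "chars_added": 120},
    {"user": "Carol", "namespace": "Article",   "year": 2008, "chars_added": 4000},
]

def users_adding_to_user_talk(diffs, min_chars, year):
    """Who added more than min_chars characters to user talk pages in a year?"""
    return sorted({d["user"] for d in diffs
                   if d["namespace"] == "User talk"
                   and d["year"] == year
                   and d["chars_added"] > min_chars})

print(users_adding_to_user_talk(diffs, 2000, 2008))   # -> ['Alice']
```

The real index lets you run this kind of filter over every diff in the dump without rescanning the raw XML.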

WikiPride

Volume of contributions by registered users on the English Wikipedia until December 2010, colored by account age

Finally, WikiPride allows you to visualize the breakdown of a Wikipedia community by age of account and by the volume of contributed content. You need a Toolserver account to run this, but you will be able to generate cool charts.

If you are having trouble getting Wikihadoop to run, then please contact me at dvanliere at wikimedia dot org and I am happy to point you in the right direction! Let the data crunching begin!

Diederik van Liere, Analytics Team

New comparative study to re-examine the quality and accuracy of Wikipedia

Much of Wikipedians’ effort is devoted to ensuring the quality of the encyclopedia they are producing collaboratively – the community is constantly working to improve it. The effectiveness of this work has been recognized many times, perhaps most notably in a study published in 2005 by the scientific journal Nature, which compared entries in the English Wikipedia with those in the online edition of Encyclopaedia Britannica. Nature reported four errors per Wikipedia entry and three per Encyclopaedia Britannica entry, a result that is still widely cited today even though Wikipedia is now more than twice as old and has matured in many ways.

The Wikimedia Foundation has commissioned a new small-scale study to examine the quality and accuracy of Wikipedia articles. This study, currently being undertaken by Epic, a UK-based e-learning company, and Oxford University, employs greater rigor than the Nature study, involves academics and scholars, and will examine more than just English language entries, and subjects other than solely science. Our hope is that the study’s findings will inspire and inform more extensive, independently funded research related to the quality of information found in Wikipedia and other free knowledge projects.

This project will explore methods to define a baseline for the quality of Wikipedia entries and to help the community identify shortcomings, as well as strategies to address them. Wikipedia has several advantages over commercially available online encyclopedias – it is freely accessible to hundreds of millions of users worldwide, it is available in over 270 languages, and it is updated at remarkable speed, relying on the ability of a vast number of non-paid contributors rather than the academic credentials of a few paid experts. However, errors do exist and concerns have been raised that articles may be colored by contributors’ personal opinions or misunderstandings. A comparative analysis of the quality of Wikipedia’s articles and other popular alternatives is crucial to identifying avenues for improvement.

Dario Taraborelli, Senior Research Analyst, Strategy

Tilman Bayer, Movement Communications

Announcing the WikiChallenge Winners

Wikipedia Participation Challenge

Over the past couple of months, the Wikimedia Foundation, Kaggle and ICDM organized a data competition. We asked data scientists around the world to use Wikipedia editor data to develop an algorithm that predicts the number of future edits, and in particular correctly predicts who will stop editing and who will continue to edit.

The response has been great! We had 96 teams compete, comprising in total 193 people who jointly submitted 1029 entries. You can have a look for yourself at the leaderboard.

We are very happy to announce that the brothers Ben and Fridolin Roth (team prognoZit) developed the winning algorithm. It is elegant, fast and accurate. Using Python and Octave, they developed a linear regression algorithm that uses 13 features (2 based on reverts and 11 based on past editing behavior) to predict future editing activity. Both the source code and the wiki description of their algorithm are available. Congratulations to Ben and Fridolin!
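For readers curious about the underlying technique, here is a minimal, single-feature sketch of linear regression by least squares in plain Python. The winning model used 13 features and is linked above; the toy data below is invented for illustration.

```python
# Ordinary least squares for a single feature, from the closed-form
# formulas: slope = cov(x, y) / var(x), intercept = mean_y - slope * mean_x.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data: edits in the last five months -> edits in the next five months.
past   = [0, 10, 20, 40]
future = [0, 5, 10, 20]
slope, intercept = fit_line(past, future)

def predict(x):
    return slope * x + intercept

print(round(predict(30), 1))   # -> 15.0
```

A real model would fit one coefficient per feature (revert counts, recent activity, and so on), but the fitting principle is the same.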

Second place goes to Keith Herring. Submitting only three entries, he developed a highly accurate model using random forests and a total of 206 features. His model shows that a randomly selected Wikipedia editor who has been active in the past year has approximately an 85 percent probability of being inactive (no new edits) in the next 5 months. The most informative features captured both the timing and the volume of an editor’s edits. Asked why he entered the challenge, Keith cited his fascination with datasets, adding that

“I have a lot of respect for what Wikipedia has done for the accessibility of information. Any small contribution I can make to that cause is in my opinion time well spent.”

We also have two Honorable Mentions for participants who used only open source software. The first Honorable Mention goes to Dell Zang (team zeditor), who used a machine learning technique called gradient boosting. His model mainly uses recent past editor activity.

The second Honorable Mention goes to Roopesh Ranjan and Kalpit Desai (team Aardvarks). Using Python and R, they also developed a random forest model. Their model used 113 features, mainly based on the number of reverts and past editor activity; see the wikipage describing their model.

All the documentation and source code has been made available; the main entry page is WikiChallenge on Meta.

What the winning models have in common is that past activity and how often an editor is reverted are the strongest predictors of future editing behavior. This confirms our intuitions, but the fact that the winning models are quite similar in terms of what data they used is a testament to the importance of these factors.

We want to congratulate all the winners, who have shown us in a quantitative way which factors are important in predicting editor retention. We also hope that people will continue to investigate the training dataset and keep refining their models, so we get an even better understanding of the long-term dynamics of the Wikipedia community.

We are looking forward to using the algorithms of Ben & Fridolin and Keith in a production environment, and particularly to seeing whether we can forecast the cumulative number of edits.

Finally, we want to thank the Kaggle people for helping in organizing this competition and our anonymous donor who has generously donated the prizes.

Diederik van Liere
External Consultant, Wikimedia Foundation

Howie Fung
Senior Product Manager, Wikimedia Foundation

2011-10-26: Edited to correct description of the winning algorithm

Google Summer of Code students reach project milestones

Congratulations to the seven Google Summer of Code students who made it through the summer of 2011! They all accomplished a great deal, but want to continue contributing to ensure their work maximally benefits Wikimedia.

Google Summer of Code logo 2011

MediaWiki participated in Google Summer of Code 2011.

Yuvi Panda’s assessment parsing/aggregating extension aims “to make it easier to select and export article selections for various offline collections.” Yuvi needs some code review and suggestions on how to improve it to meet the Foundation’s quality standards for deployability, as he wrote to the developers’ mailing list.

Salvatore Ingala worked on making gadgets customizable. As he elaborated, that means:

  • “allowing gadgets to easily declare the list of configuration
    variables they have;
  • allowing users to easily change those settings, with an easy-to-use
    UI integrated to the Special:Preferences page.”

The next step is merging his code into trunk, which Salvatore’s planning with other MediaWiki developers.

Kevin Brown created the ArchiveLinks project to address the problem of linkrot on Wikipedia:

In articles we often cite or link to external URLs, but anything could happen to content on other sites — if they move, change, or simply vanish, the value of the citation is lost. ArchiveLinks rewrites external links in Wikipedia articles, so there is a ‘[cached]’ link immediately afterwards which points to the web archiving service of your choice. This can even preserve the exact time that the link was added, so for sites which archive multiple versions of content (such as the Internet Archive) it will even link to a copy of the page that was made around the time the article was written.
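As a rough sketch of the kind of rewrite described (this is not the extension’s actual code; the archiving URL prefix and the naive link pattern are simplifying assumptions for illustration):

```python
# Append a '[cached]' link, pointing at an archiving service, after each
# external link in a fragment of rendered HTML.
import re

ARCHIVE = "https://web.archive.org/web/"  # one possible archiving service

def add_cached_links(html):
    def repl(match):
        url = match.group(1)
        cached = f' [<a href="{ARCHIVE}{url}">cached</a>]'
        return match.group(0) + cached
    # Naive pattern for illustration only; real HTML needs a proper parser.
    return re.sub(r'<a href="(https?://[^"]+)">[^<]*</a>', repl, html)

html = '<a href="http://example.com/paper">A cited paper</a>'
print(add_cached_links(html))
```

The real extension also records when the link was added, so the cached copy can match the revision that cited it.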

Kevin’s next step: getting a security review of his code, getting a starter feed set up so that the Internet Archive can start archiving it, and campaigning to interest Wikimedians and thus eventually get consensus to turn it on. At least one Wikimedian has already praised Kevin for his work.

Akshay Agarwal wrote a MediaWiki extension, SignupAPI, that makes it easier for a new user to create an account. “This extension creates a special page that cleans up SpecialUserLogin from signup related stuff, adds an API for signup, adds sourcetracking for account creation & provides Ajax-ified validation for signup form.” Akshay’s waiting for code review and discussion before the project can move forward further and benefit Wikimedia users.

MediaWiki logo

Seven students contributed to various parts of MediaWiki, the wiki software that supports WMF sites.

Yuvi, Salvatore, Kevin, and Akshay all worked on features that they aim to get into Wikimedia Foundation-run wikis, such as Wikipedia, Wikisource, Wikinews, etc., sooner rather than later. In contrast, three students worked on extensions that will primarily benefit the larger MediaWiki community. For example, Yevhenii Vlasenko‘s project was a “UserStatus” feature for SocialProfile. The SocialProfile extension is not currently deployed on any WMF wikis, but will benefit several other MediaWiki administrators and users. Zhenya finished his work but would like to continue by integrating better with social networks.

And two students worked on Semantic MediaWiki, which is also not currently deployed on any Wikimedia Foundation sites. Devayon Das made a “QueryCreator” and other improvements, and hopes to simplify its layout, make its interface easier to use, and add some features. And Ankit Garg worked on “Semantic Schemas”.

Congratulations to the students and their mentors.  Here’s hoping they’re all here to help out when next year’s interns roll in! :-)  And I’m looking forward to meeting Kevin and Salvatore, and introducing them to other Wikimedia & MediaWiki developers, at the New Orleans developers’ meetup next month.

Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation