Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Corporate

News about the Wikimedia Foundation Board of Trustees and staff.

Wikimedia Highlights, May 2013

For versions in other languages, please check the wiki version of this report, or add your own translation there!

Highlights from the Wikimedia Foundation Report and the Wikimedia engineering report for May 2013, with a selection of other important events from the Wikimedia movement

Wikimedia Foundation highlights

From the Fundraising report: The “facts” banner (listing some basic facts about Wikipedia) was tested in many different versions and eventually performed better than all previous fundraising banners

Fundraising report released

The Wikimedia Foundation’s fundraising team published a report from the 2012-2013 fundraiser. The report reviews the evolution of banner design and includes data about the 2012 year-end English campaign and the 2013 multilingual campaign, which raised a total of $35 million USD from over 2 million donors.

Community invited to discuss trademark practices

The Legal and Community Advocacy (LCA) team published a statement on trademark practices, which requests community feedback on the Wikimedia trademark policy, procedure, and other questions. The objective is to balance the interest in licensing the brand for mission-aligned activities, with the necessity of preventing misuse and “naked licensing” (licensing without quality control). This is the opportunity to provide ideas as the team considers updating the trademark policy and practices.

Wikipedia Zero launches in Pakistan

Wikipedia Zero, the program to give people around the world mobile access to Wikipedia free of data charges, is now available in Pakistan, in partnership with Mobilink (Vimpelcom). The company’s user base of over 32 million people makes this the second largest Wikipedia Zero launch to date.

The “Nearby” feature in Vatican City. The camera icon (bottom) indicates an article that lacks images, inviting users to contribute one.

“Nearby” feature shows Wikipedia articles in the reader’s vicinity

On location-aware devices (such as smartphones with GPS), a new “Nearby” page lists articles close to the reader’s current location. The feature is designed for mobile devices, but also works on the desktop version of Wikipedia.

Presentation slides with the Tool Labs logo

New hosting environment for community-developed tools

Tool Labs, an environment for community developers to provide external software tools supporting work on Wikimedia projects, is now operating. With the support of the German Wikimedia chapter, many existing tools have already migrated from the Wikimedia Toolserver to Tool Labs.

Search for new Executive Director begins

The job opening for the Wikimedia Foundation’s new Executive Director has been posted. This starts the search for a successor for Sue Gardner, who will step down later this year. Board of Trustees chair Kat Walsh asked Wikimedians for help in finding the best possible candidate, by spreading the news in their networks.


Wikimedia Foundation Report, May 2013

You are more than welcome to edit the wiki version of this report for the purposes of usefulness, presentation, etc., and to add translations of the “Highlights” excerpts.

Global unique visitors for April:

517 million (-0.17% compared with March; +9.16% compared with the previous year)
(comScore data for all Wikimedia Foundation projects; comScore will release May data later in June)

Page requests for May:

21.0 billion (+0.8% compared with April; +16.5% compared with the previous year)
(Server log data, all Wikimedia Foundation projects including mobile access)

Active Registered Editors for April 2013 (>= 5 mainspace edits/month, excluding bots):

82,553 (+0.86% compared with March / +4.38% compared with the previous year)
(Database data, all Wikimedia Foundation projects.)

Report Card (integrating various statistical data and trends about WMF projects):

http://reportcard.wmflabs.org/

(Definitions)

Financials

Wikimedia Foundation YTD Revenue and Expenses vs Plan as of April 30, 2013

Wikimedia Foundation YTD Expenses by Functions as of April 30, 2013

(Financial information is only available through April 2013 at the time of this report.)

All financial information presented is for the Month-To-Date and Year-To-Date April 30, 2013.

Revenue $50,441,664
Expenses:
Engineering Group $11,909,113
Fundraising Group $3,085,352
Grantmaking & Programs Group $7,894,416
Governance Group $630,123
Legal/Community Advocacy/Communications Group $2,560,446
Finance/HR/Admin Group $4,757,347
Total Expenses $30,836,797
Total surplus $19,604,867
  • Revenue for the month of April is $8.87MM versus plan of $9.78MM, approximately $908K or 9% under plan.
  • Year-to-date revenue is $50.44MM versus plan of $45.52MM, approximately $4.92MM or 11% over plan.
  • Expenses for the month of April are $5.78MM versus plan of $4.10MM, approximately $1.68MM or 41% over plan, primarily due to higher grant expenses (timing of FDC grants), legal fees, and personal property tax expenses, partially offset by lower personnel expenses, internet hosting, and bank fees.
  • Year-to-date expenses are $30.84MM versus plan of $34.07MM, approximately $3.24MM or 9% under plan, primarily due to lower personnel expenses, capital expenses, internet hosting, and travel expenses, partially offset by higher legal expenses, bank fees, outside contract services, and personal property tax expenses.
  • Cash position is $45.6MM as of April 30, 2013.
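
The totals and the over/under-plan figures above can be rechecked from the table itself. The following is a minimal sketch in Python; the helper name `variance_vs_plan` and the shortened group labels are illustrative and do not come from the report.

```python
# Figures copied from the April 30, 2013 year-to-date table above.
revenue = 50_441_664
expenses = {
    "Engineering": 11_909_113,
    "Fundraising": 3_085_352,
    "Grantmaking & Programs": 7_894_416,
    "Governance": 630_123,
    "Legal/Community Advocacy/Communications": 2_560_446,
    "Finance/HR/Admin": 4_757_347,
}

total_expenses = sum(expenses.values())
surplus = revenue - total_expenses


def variance_vs_plan(actual_mm, plan_mm):
    """Return (difference in $MM, percent of plan); positive means over plan."""
    diff = actual_mm - plan_mm
    return round(diff, 2), round(100 * diff / plan_mm)


print(total_expenses)                  # 30836797
print(surplus)                         # 19604867
print(variance_vs_plan(50.44, 45.52))  # year-to-date revenue: (4.92, 11)
print(variance_vs_plan(5.78, 4.10))    # April expenses: (1.68, 41)
```

Because the bullets quote amounts already rounded to two decimals, a recomputed difference can be off by $0.01MM from the published one (for example, year-to-date expenses).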


Developing Distributedly, Part 1: Tools for Remote Collaboration

The mobile web engineering team at the Wikimedia Foundation confronts a significant question every day: How does a team stay synchronized and productive when teammates are scattered around the globe? Our team is highly distributed and effective communication is a challenge. We focus on some of the highest-priority engineering areas for the WMF, and at any point our team may include members in California, Colorado, Arizona, Russia, India, Poland, and the United Kingdom. To help us clear our geographic and temporal hurdles, we embrace a culture of continuous improvement in the ways we engage with one another and our work.

For many, having a collocated team is critical for success. We, on the other hand, see big advantages in our geographic distribution. The diversity of opinions, world experiences, and cultural knowledge that we can bring to the table through global distribution helps us better cope with some of the challenges we face in developing easy-to-use and accessible tools for a global audience. Of course, we might achieve this by generally hiring people from around the world, but the pool of exceptional recruits is much bigger when you don’t have a relocation requirement.

Similarly, having team members in other locales gives us greater exposure to people using our products on devices you might not commonly find in the US. Our geographic distribution also gives us closer to around-the-clock coverage in case of emergencies. Finally, building support for working remotely into the team gives us all a huge amount of freedom; any of us can travel (for business, pleasure, emergencies, etc.) and not have to worry about falling out of sync with the rest of the team.

We make it all work with a multitude of communication tools, organizational practices, and discipline. While sometimes it can feel like there is no real substitute for face-to-face communication, our approach to these challenges has given us great strength, freedom, and resilience. The list of tools we use is long, and many of them are useful even for collocated teams. I will cover a few of the most crucial in detail below.

Video conferencing

The mobile web team using Google Hangouts for a regular meeting where nearly every participant is in a different location.

Due in part to its simplicity, we rely on Google Hangouts for all of our regular meetings and for the occasional ad-hoc meeting where asynchronous communication will not do. In order to be effective, it’s important for video chat tools to stay out of the way so we can focus on communicating rather than wrestling with technical complications. Of the many tools we use, only one closely approximates in-person, face-to-face communication: group video chat. It’s the highest-bandwidth (yeah, pun intended) tool at our disposal and when used right, it can make all of the distance between us melt away.

Hangouts does this very well: it highlights the video of the person speaking, mutes your microphone when you’re typing, and makes screen-sharing very simple. Audio and video quality adjusts automatically based on your bandwidth, and toggling your camera or mic on/off is trivial. The ‘effects’ feature lets you dress up participants in things like pirate hats and mustaches, making meetings more fun and helping the miles between participants disappear. Oh, and it’s free if you use Gmail or Google Apps.

We have been iterating to find the best hardware setup for supporting Google Hangouts in the office in as unobtrusive a way as possible. While the mobile web team and other distributed teams have been working hard for improvements, the IT department has been doing an outstanding job helping to find and implement what works. We’ve dedicated one conference room to be our test subject and we are very close to finding an excellent setup. It’s unclear whether this will work well in our other conference rooms (do not underestimate how challenging room acoustics can be!), but we have come a long way from wasting half of our meeting time trying to get the technology to work for remote teammates.


Wikimedia Highlights, April 2013

For versions in other languages, please check the wiki version of this report, or add your own translation there!

Highlights from the Wikimedia Foundation Report and the Wikimedia engineering report for April 2013, with a selection of other important events from the Wikimedia movement

Wikimedia Foundation highlights

Screenshot: This user has received four new notifications

New notifications system launches on the English Wikipedia

On April 30, the Foundation’s Editor Engagement Team activated the Notifications system (also known as Echo) on the English Wikipedia. It informs editors about new activity on the wiki that affects them, for example when they receive a message on their user page, when one of their edits is reverted, or when another user thanks them for an edit. The Notifications system is still being tested and modified based on community feedback. Later, it will be made available on other language versions of Wikipedia, and on other Wikimedia projects.

Family tree of the composer Johann Sebastian Bach, generated from Wikidata by the “GeneaWiki” tool

Wikidata’s content can now be used on all Wikipedias

Wikidata has now begun to serve all language versions of Wikipedia as a common source of structured data that can be used in any Wikipedia article, e.g. in infoboxes. Wikidata’s machine-readable knowledge database already contains over 11 million items. They can also be queried, evaluated and edited with the help of a growing collection of tools.

Wikimedia Commons app announced for iOS and Android

The Wikimedia Commons mobile app was officially announced in April, allowing quick and easy image uploads directly from mobile devices. It is available for both iOS and Android devices. Mobile users who don’t have the app installed can still upload images from the mobile web version of all Wikimedia projects.

Funds Dissemination Committee (FDC) publishes its recommendations about $1.2 million

Launched in 2012, the Funds Dissemination Committee is a group of Wikimedia volunteers tasked with helping to decide about the use of movement funds. On April 28, after a period of evaluation by supporting staff and public review by the community, it published its recommendations to the Wikimedia Board of Trustees about the second round of funding requests, by four organizations: Wikimedia Czech Republic (requested: $14,084.50, recommended: $0), Wikimedia Hong Kong (requested: $211,660.26, recommended: $0), Wikimedia Norway (requested: $235,715, recommended: $140,000), and Wikimedia France (requested: $747,259, recommended: $525,000).

New conflict of interest guidelines

With an immediate effective date, the Board of Trustees approved Wikimedia guidelines on the disclosure of potential and actual conflicts of interest in requesting and allocating movement resources (such as financial grants). Input provided by the community during a six-week consultation period greatly improved the original version.


Wikimedia Foundation Report, April 2013

You are more than welcome to edit the wiki version of this report for the purposes of usefulness, presentation, etc., and to add translations of the “Highlights” excerpts.

Global unique visitors for March:

517 million (+7.17% compared with February; +5.76% compared with the previous year)
(comScore data for all Wikimedia Foundation projects; comScore will release April data later in May)

Page requests for April:

20.8 billion (-3.4% compared with March; +20.1% compared with the previous year)
(Server log data, all Wikimedia Foundation projects including mobile access)

Active Registered Editors for March 2013 (>= 5 mainspace edits/month, excluding bots):

82,105 (+5.67% compared with February / +2.15% compared with the previous year)
(Database data, all Wikimedia Foundation projects.)

Report Card (integrating various statistical data and trends about WMF projects) for March 2013:

http://reportcard.wmflabs.org/

(Definitions)

Financials

Wikimedia Foundation YTD Revenue and Expenses vs Plan as of March 31, 2013

Wikimedia Foundation YTD Expenses by Functions as of March 31, 2013

(Financial information is only available through March 2013 at the time of this report.)

All financial information presented is for the Month-To-Date and Year-To-Date March 31, 2013.

Revenue $41,573,672
Expenses:
Engineering Group $10,569,516
Fundraising Group $2,915,969
Grantmaking & Programs Group $4,565,386
Governance Group $555,937
Legal/Community Advocacy/Communications Group $2,263,477
Finance/HR/Admin Group $4,187,115
Total Expenses $25,057,400
Total surplus $16,516,272
  • Revenue for the month of March is $5.92MM versus plan of $5.28MM, approximately $647K or 12% over plan.
  • Year-to-date revenue is $41.57MM versus plan of $35.74MM, approximately $5.83MM or 16% over plan.
  • Expenses for the month of March are $3.09MM versus plan of $4.04MM, approximately $943K or 23% under plan, primarily due to lower personnel expenses, capital expenses, internet hosting, and grant expenses, offset by higher bank fees.
  • Year-to-date expenses are $25.06MM versus plan of $29.97MM, approximately $4.92MM or 16% under plan, primarily due to lower personnel expenses, capital expenses, internet hosting, FDC grants executed, WMF project grants, and travel expenses, partially offset by higher legal expenses and bank fees.
  • Cash position is $41.02MM as of March 31, 2013.


Kicking off the search for our next Executive Director

Today we launch our search for the next Executive Director of the Wikimedia Foundation.

About six weeks ago, the Wikimedia Foundation’s Executive Director Sue Gardner told us she will be stepping down from her role. Happily, she is staying on until we find her successor, and we are now launching that search.

It will be a challenge to find someone who is able to fill Sue’s shoes, but I am glad to say that the Board of Trustees, Sue and the senior staff of the Wikimedia Foundation are aligned in our quest for a successor who will build on Sue’s considerable accomplishments, and steer the Wikimedia Foundation toward even greater success in the future.

The Wikimedia Foundation is the internationally-active San Francisco-based non-profit organization that operates Wikipedia, the free encyclopedia. It supports a global community of tens of thousands of volunteers in collecting, developing, and making the sum of all the world’s knowledge freely available. Over half a billion people use Wikipedia and its sister projects every month. We are the fifth most popular website in the world, and the only donor-supported site in the top 100. We’re widely recognized as the most influential and important organization in the free knowledge movement.

Our Executive Director reports to the Board of Trustees and acts in partnership with the global volunteer community, providing the leadership and setting the strategy for the Wikimedia Foundation, while managing its day-to-day operations and activities. The Executive Director is responsible for modernizing the user experience and nurturing, growing and diversifying the community of people who write our projects. He or she also ensures our grantmaking supports innovation across the Wikimedia movement and enables contributor growth in underrepresented demographics and geographies.

Our Executive Director needs to understand and advance the Wikimedia movement’s core values. They need to have proven management skills in technology and product development in order to effectively lead a high-traffic website, and have experience designing and implementing planning processes with a high built-in assumption of fast and iterative change. He or she will need to have exceptional communication skills, and possess both a drive to achieve transformative results and a deep respect for collaborative processes. The Executive Director’s ability to effect change in partnership with Wikimedia’s community will be decisive not just to their success, but to Wikimedia’s lasting impact.

It’s impossible to know where our next Executive Director will come from: there is no career path that makes running the Wikimedia Foundation somebody’s obvious next step. The right person might or might not currently work at a big web site. They might or might not be in the non-profit sector. They could have a background in education, or product development, or media, or community development, or something entirely different. They may live in the United States, or outside it. In this search, we want to cast a wide net for candidates, so that we can find the person with the rare mix of skills, experiences and values needed for this important role.

If you’re reading this post you know how much the work of the Wikimedia Foundation matters. I’m asking you for your help in spreading the news of this unique opportunity. Please share this post widely in your networks.

For more information, to suggest potential candidates or to put yourself forward, please write to info@moppenheim.com.

Some details on the recruitment:

  • We have retained the search firm m/Oppenheim Associates to assist in finding and screening candidates. We’ve worked successfully with m/Oppenheim in the past to fill senior roles at the Foundation. They know us well, and we trust they’ll do a great job with this hire.
  • The full position description is available on the Wikimedia Foundation site, hosted at jobs.wikimedia.org.
  • The hiring process will unfold over the next three to six months; we hope to have a new Executive Director in place by October. That said, we’re going to take the time we need to find the best possible candidate. We are glad to restate that our current Executive Director, Sue Gardner, will stay with us throughout the recruitment process until we have a new Executive Director in place.
  • Following initial screening of the candidates, a shortlist of applicants will be interviewed by Board members and members of the senior staff, and we will encourage them to get involved with the Wikimedia community (if they aren’t already) to learn more about our movement. (We would also encourage anyone interested in the role to take a look at our guiding principles, or to pick up one of the books documenting and describing the Wikimedia movement.)
  • We’ve set up some pages on Meta-Wiki, the central collaboration wiki, where Wikimedia community members can find more information and also get involved in a public discussion about the role and the recruitment process.

Thanks in advance for helping spread the word about this rare and important opportunity.

Kat Walsh
Chair, Board of Trustees, Wikimedia Foundation

What’s missing from the media discussions of Wikipedia categories and sexism

Last week the New York Times published an Op-Ed from author Amanda Filipacchi headlined Wikipedia’s Sexism Toward Female Novelists, in which she criticized Wikipedia for moving some authors from the “American novelists” category into a sub-category called “American women novelists.” Because there is no subcategory for “American male novelists,” Filipacchi saw the change as reflecting a sexist double standard, in which ‘male’ is positioned as the ungendered norm, with ‘female’ as a variant.

I completely understand why Filipacchi was outraged. She saw herself, and Harper Lee, Harriet Beecher Stowe, Judy Blume, Louisa May Alcott, Mary Higgins Clark, and many others, seemingly downgraded in the public record and relegated to a subcategory that she assumed would get less readership than the main one. She saw this as a loss for American women novelists who might otherwise be visible when people went to Wikipedia looking for ideas about who to hire, to honor, or to read.

In the days following, other publications picked up the story, and Filipacchi wrote two follow-up pieces: one describing edits made to her own biography on Wikipedia following her first op-ed, and another rebutting media stories that had positioned the original categorization changes as the work of a lone editor.

For me, as a feminist Wikipedian, reading the coverage has been extremely interesting. I agree with many of the criticisms that have been raised (as I think many Wikipedians do), and yet there are important points that I think have been missing from the media discussions so far.

In Wikipedia, like any large-scale human endeavor, practice often falls short of intent.

Individuals make mistakes, but that doesn’t and shouldn’t call into question the usefulness or motivations of the endeavor as a whole. Since 2011, Wikipedia has officially discouraged the creation of gender-specific subcategories, except when gender is relevant to the category topic. (One of the authors of the guideline specifically noted that it is clear that any situation in which women get a gendered subcategory while men are left in the ungendered parent category is unacceptable.) In other words, the very situation Filipacchi decries in her op-ed has been extensively discussed and explicitly discouraged on Wikipedia.

Wikipedia is a continual work-in-progress. It’s never done.

In her original op-ed, Filipacchi seems to assume that Wikipedians are planning to move all the women out of the American Novelists category, leaving all the men. But that’s not the case. There’s a continuous effort on Wikipedia to refine and revise categories with large populations, and moving out the women from American Novelists would surely have been followed by moving out the satirical novelists, or the New York novelists, or the Young Adult novelists. I’d argue it’s still an inappropriate thing to do, because women are 50 percent of the population, not a variant to the male norm. Nevertheless the move needs to be understood not as an attack on women, but rather, in the context of continuous efforts to refine and revise all categories.

Wikipedia is a reflection of the society that produces it.

Wikipedia is the encyclopedia anyone can edit, and as such it reflects the cultural biases and attitudes of the general society. It’s important to say that the people who write Wikipedia are a far larger and vastly more diverse group than the staff of any newsroom or library or archive, past or present. That’s why Wikipedia is bigger, more comprehensive, up-to-date and nuanced, compared with any other reference work. But with fewer than one in five contributors being female, gender is definitely Wikipedia’s weak spot, and it shouldn’t surprise anyone that it would fall victim to the same gender-related errors and biases as the society that produces it.

Are there misogynists on Wikipedia? Given that anyone with internet access can edit it, and that there are roughly 80,000 active editors (those who make at least 5 edits per month on Wikimedia projects), it would be absurd to claim that Wikipedia is free of misogyny. Are there well-intentioned people on Wikipedia accidentally behaving in ways that perpetuate sexism? Of course. It would be far more surprising if Wikipedia were somehow free of sexism, rather than the reverse.

Which brings me to my final point.

It’s not always the case, but in this instance the system worked. Filipacchi saw something on Wikipedia that she thought was wrong. She drew attention to it. Now it’s being discussed and fixed. That’s how Wikipedia works.

The answer to bad speech is more speech. Many eyes make all bugs shallow. If you see something on Wikipedia that irks you, fix it. If you can’t do it yourself, the next best thing is to do what Filipacchi did — talk about it, and try to persuade other people there’s a problem. Wikipedia belongs to its readers, and it’s up to all of us to make it as good as it possibly can be.

Sue Gardner, Executive Director, Wikimedia Foundation

The Wikidata revolution is here: enabling structured data on Wikipedia

The logo of Wikidata

A year after its announcement as the first new Wikimedia project since 2006, Wikidata has now begun to serve the over 280 language versions of Wikipedia as a common source of structured data that can be used in more than 25 million articles of the free encyclopedia.

By providing Wikipedia editors with a central venue for their efforts to collect and vet such data, Wikidata leads to a higher level of consistency and quality in Wikipedia articles across the many language editions of the encyclopedia. Beyond Wikipedia, Wikidata’s universal, machine-readable knowledge database will be freely reusable by anyone, enabling numerous external applications.

“Wikidata is a powerful tool for keeping information in Wikipedia current across all language versions,” said Wikimedia Foundation Executive Director Sue Gardner. “Before Wikidata, Wikipedians needed to manually update hundreds of Wikipedia language versions every time a famous person died or a country’s leader changed. With Wikidata, such new information, entered once, can automatically appear across all Wikipedia language versions. That makes life easier for editors and makes it easier for Wikipedia to stay current.”

The Wikidata entry on Johann Sebastian Bach (as displayed in the “Reasonator” tool), containing among other data the composer’s places of birth and death, family relations, entries in various bibliographic authority control databases, a list of compositions, and public monuments depicting him

The dream of a wiki-based, collaboratively edited repository of structured data that could be reused in Wikipedia infoboxes goes back to at least 2004, when Wikimedian Erik Möller (now the deputy director of the Wikimedia Foundation) posted a detailed proposal for such a project. The following years saw work on related efforts like the Semantic MediaWiki extension, and discussions of how to implement a central data repository for Wikimedia intensified in 2010 and 2011.

The development of Wikidata began in March 2012, led by Wikimedia Deutschland, the German chapter of the Wikimedia movement. Since Wikidata.org went live on 30 October 2012, a growing community of around 3,000 active contributors has been building its database of ‘items’ (e.g. things, people or concepts), first by collecting topics that are already the subject of Wikipedia articles in several languages. An item’s central page on Wikidata replaces the complex web of language links that previously connected these articles about the same topic in different Wikipedia versions.

Wikidata’s collection of these items now numbers over 10 million. The community also began to enrich Wikidata’s database with factual statements about these topics (data like the mayor of a city, the ISBN of a book, the languages spoken in a country, etc.). This information has now become available for use on Wikipedia itself, and Wikipedians on many language Wikipedias have already started to add it to articles, or discuss how to make best use of it.

“It is the goal of Wikidata to collect the world’s complex knowledge in a structured manner so that anybody can benefit from it,” said Wikidata project director Denny Vrandečić. “Whether that’s readers of Wikipedia who are able to be up to date about certain facts or engineers who can use this data to create new products that improve the way we access knowledge.”

The next phase of Wikidata will allow for the automatic creation of lists and charts based on the data in Wikidata. Wikimedia Deutschland will continue to support the project with an engineering team that is dedicated to Wikidata’s second year of development and maintenance.

Wikidata is operated by the Wikimedia Foundation and its fact database is published under a Creative Commons 0 public domain dedication. Funding of Wikidata’s initial development was provided by the Allen Institute for Artificial Intelligence [AI]², the Gordon and Betty Moore Foundation and Google, Inc.

Tilman Bayer, Senior Operations Analyst, Wikimedia Foundation

More information available here:

Some of the first applications demonstrating the potential of Wikidata:

  • http://simia.net/treeoflife/ – a (still very incomplete) “tree of life” drawn from relations among biological species in Wikidata’s database
  • “GeneaWiki” generates a graph showing a person’s family relations as recorded in Wikidata, example: Bach family

Wikipedia Adopts MariaDB

This past Wednesday marked a milestone in the evolution of Wikimedia’s database infrastructure: the completion of the migration of the English and German Wikipedias, as well as Wikidata, to MariaDB 5.5.

For the last several years, we’ve been operating the Facebook fork of MySQL 5.1 with most of our production environment running a build of r3753. We’ve been pleased with its performance; Facebook’s MySQL team contains some of the finest database engineers in the industry and they’ve done much to advance the open source MySQL ecosystem.

That said, MariaDB’s optimizer enhancements, the feature set of Percona’s XtraDB (many features overlap with the Facebook patch, but I particularly like add-ons such as the ability to save the buffer pool LRU list, avoiding costly warmups on new servers), and that of Oracle’s MySQL 5.5 provide compelling reasons to consider upgrading. Equally important, as supporters of the free culture movement, the Wikimedia Foundation strongly prefers free software projects; that includes a preference for projects without bifurcated code bases between differently licensed free and enterprise editions. We welcome and support the MariaDB Foundation as a not-for-profit steward of the free and open MySQL-related database community.

Preparing For Change

Major version upgrades of a production database are not to be made lightly. In fact, as late as 2011, some Wikipedia languages were still running a heavily patched version of MySQL 4.0; the migration to 5.1 required both schema changes and direct modifications of data dumps to alter the padding of binary-typed columns. MySQL 5.5 contains a variety of incompatibilities with prior versions, thanks in part to better compliance with SQL standards. Changes to the query optimizer between versions may also change the execution plan for common queries, sometimes for the better but, historically, sometimes not. SQL behavior changes may result in replication breakage or data consistency issues, while performance regressions, whether from query plan or other changes, can cause site outages. This calls for a lot of testing.

Compatibility testing was accomplished by running MariaDB replicas outside of production, watching for replication errors, replaying production read queries and validating results. After identifying and fixing a couple of MediaWiki issues that surfaced as replication errors (along the lines of trying to set unsigned integer types to negative values, which previously caused a wrap-around instead of an error), we replayed production read queries using pt-upgrade from Percona Toolkit. The pt-upgrade tool replays a query log against two servers and compares the responses for variances or errors. Scripts originally developed for our recent datacenter migration, which simultaneously warm up many standby databases from current production read traffic, helped with rough load testing and benchmarking. Along the way, a pair of bugs in MariaDB 5.5.28 and 5.5.29 were identified, one of which was a rare but potentially severe performance regression related to a new query optimizer feature. The MariaDB team was very responsive and quick to offer solutions, complete with test cases.
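The replay-and-compare idea can be sketched in a few lines of Python. This is only an illustration of the technique, not pt-upgrade itself: the function `replay_and_compare`, the helper `make_server`, and the toy `page` table are all hypothetical, and two in-memory SQLite databases stand in for the old and new database servers.

```python
import sqlite3

def replay_and_compare(queries, conn_a, conn_b):
    """Run each read query on both connections and collect any differences."""
    differences = []
    for sql in queries:
        outcomes = []
        for conn in (conn_a, conn_b):
            try:
                outcomes.append(("rows", conn.execute(sql).fetchall()))
            except sqlite3.Error as exc:
                # An error on one side only is also a difference worth reporting.
                outcomes.append(("error", type(exc).__name__))
        if outcomes[0] != outcomes[1]:
            differences.append((sql, outcomes[0], outcomes[1]))
    return differences

def make_server(rows):
    """Hypothetical stand-in for a database server: an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    conn.executemany("INSERT INTO page VALUES (?, ?)", rows)
    return conn

# The second "server" deliberately holds one differing row.
old_server = make_server([(1, "Main_Page"), (2, "MariaDB")])
new_server = make_server([(1, "Main_Page"), (2, "MariaDB_5.5")])

query_log = [
    "SELECT page_title FROM page WHERE page_id = 1",
    "SELECT page_title FROM page WHERE page_id = 2",
]
mismatches = replay_and_compare(query_log, old_server, new_server)
for sql, old, new in mismatches:
    print(sql, old, new)
```

A real comparison would also need to account for queries without an ORDER BY clause, whose rows may legitimately come back in a different order on each server.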

Performance Testing In Production

As a read-heavy site, Wikipedia aggressively uses edge caching. Approximately 90% of pageviews are served entirely from the edge, while at the application layer we utilize both memcached and redis in addition to MySQL. Despite that, the MySQL databases serving English Wikipedia alone reach a daily peak of ~50k queries/second. Most are read queries served by load-balanced slaves, depending on consistency requirements. 80% of the English Wikipedia query load (up to 40k qps) is typically handled by just two database servers at any given time. Our most common query type (40% of all) has a median execution time of ~0.2ms and a 95th percentile time of ~50ms. To successfully use MariaDB in production, we need it to keep up with the level of performance obtained from Facebook’s MySQL fork, and to behave consistently as traffic patterns change.

Ishmael views of pt-query-digest data collected via tcpdump for the most common Wikipedia read queries (pdf). The first page of a query shows data from db1042, running mysql-facebook-r3753, the second from db1043 over the same time period, running MariaDB 5.5.30.

Once confident that application compatibility issues were solved and comfortable with performance obtained under benchmark conditions, it was time to test in production. One of the production read slaves from the English Wikipedia shard was taken out of rotation, upgraded to MariaDB 5.5.30, and then returned for warmup. The load balancer weight was then gradually increased until it and a server still running MySQL 5.1-facebook-r3753 were equally weighted and receiving most of the query load.
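To make the gradual weight increase concrete, here is a minimal sketch of weighted server selection with a stepped ramp. It assumes a simple proportional-weight scheme; the names `pick_server` and `ramp_schedule`, the server labels, and the four-step schedule are hypothetical and do not describe the actual load balancer configuration.

```python
import random

def pick_server(weights, rng):
    """Pick a server name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def ramp_schedule(target, steps):
    """Return evenly spaced weights from target/steps up to target."""
    return [target * i / steps for i in range(1, steps + 1)]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
weights = {"old_replica": 100.0, "upgraded_replica": 0.0}

for w in ramp_schedule(100.0, 4):
    weights["upgraded_replica"] = w
    picks = [pick_server(weights, rng) for _ in range(10_000)]
    share = picks.count("upgraded_replica") / len(picks)
    print(f"weight={w:5.1f} observed share={share:.2f}")
```

With proportional weights, the upgraded server's expected share of queries at each step is w / (100 + w), so it reaches an even split only at the final step, when both weights are equal.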

Also from the Percona Toolkit, we use pt-query-digest across all database servers to collect query performance data, which is then stored in a centralized database. Query data is collected from two sources per server and stored in separate buckets: from the slow query log, which only captures queries exceeding 450ms, and from periodic brief sampling of all queries obtained by tcpdump. Ishmael provides a convenient way to visualize and inspect query digest data over time. Using it, along with direct analysis of the raw data, allowed us to validate that every query continued to perform within acceptable bounds.
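The kind of summary a query digest produces can be approximated as follows. This is a simplified sketch, not pt-query-digest's actual algorithm: the `fingerprint` function here only replaces quoted strings and integer literals, the percentile uses the nearest-rank method, and the sample queries and timings are invented for illustration. Only the 450ms threshold comes from the text above.

```python
import math
import re
from collections import defaultdict

SLOW_THRESHOLD_MS = 450  # slow-log threshold quoted in the text

def fingerprint(sql):
    """Collapse literals so queries differing only in values group together."""
    sql = sql.strip().lower()
    sql = re.sub(r"'[^']*'", "?", sql)  # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)  # standalone integer literals
    return re.sub(r"\s+", " ", sql)

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def digest(samples):
    """Group (sql, milliseconds) samples by fingerprint and summarize each group."""
    groups = defaultdict(list)
    for sql, ms in samples:
        groups[fingerprint(sql)].append(ms)
    return {
        fp: {
            "count": len(times),
            "median_ms": percentile(times, 50),
            "p95_ms": percentile(times, 95),
            "slow": sum(t > SLOW_THRESHOLD_MS for t in times),
        }
        for fp, times in groups.items()
    }

# Invented sample data: three lookups of one query shape, one slow outlier.
samples = [
    ("SELECT * FROM page WHERE page_id = 1", 0.2),
    ("SELECT * FROM page WHERE page_id = 22", 0.3),
    ("SELECT * FROM page WHERE page_id = 333", 48.0),
    ("SELECT * FROM user WHERE user_name = 'Alice'", 500.0),
]
report = digest(samples)
for fp, stats in report.items():
    print(fp, stats)
```

Grouping by fingerprint is what makes it possible to talk about a "query type" having a median and a 95th percentile, rather than tracking every literal query string separately.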

For our most common query type, 95th percentile times over an 8-hour period dropped from 56ms to 43ms and the average from 15.4ms to 12.7ms. 50th percentile times remained a bit better with the 5.1-facebook build over the sample period, 0.185ms vs. 0.194ms. Many query types were 4-15% faster with MariaDB 5.5.30 under production load, a few were 5% slower, and nothing appeared aberrant beyond those bounds.
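For readers who want the relative changes behind those figures, the arithmetic is a one-liner. The numbers below are the ones quoted above; the helper name `percent_change` and the dictionary keys are just labels chosen for this sketch.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100

# (5.1-facebook build, MariaDB 5.5.30) timings in milliseconds, from the text.
measurements = {
    "p95_ms": (56.0, 43.0),
    "mean_ms": (15.4, 12.7),
    "p50_ms": (0.185, 0.194),
}

changes = {name: percent_change(b, a) for name, (b, a) in measurements.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

This works out to roughly a 23% drop at the 95th percentile and about 18% in the average, against a median that is about 5% slower, which is consistent with the ranges reported for other query types.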

From there, we upgraded the remaining slaves one by one, before finally rotating in a newer upgraded class of servers to act as masters. The switch was seamless and performance continues to look good. We’ll be completing the migration of shards covering the rest of our projects over the next month. Beyond that, we’re looking forward to the future release of MariaDB 10 (global transaction IDs!), and are continually assessing ways to improve our data storage infrastructure. If you’re interested in helping, the Wikimedia Foundation is hiring!

Asher Feldman, Site Architect

Wikimedia Highlights, March 2013

For versions in other languages, please check the wiki version of this report, or add your own translation there!

Wikimedia Foundation highlights

Lua speeds up pages and empowers Wikimedia’s technical contributors

On March 13, Lua was enabled for templates on all Wikimedia wikis. The existing syntax for wikitext templates is complicated and limited: it does not offer loops, for example. With Lua, editors can now use a real programming language, in which they can also contribute to programming projects outside Wikimedia. For Wikimedia wikis, Lua means a big performance gain in widely used templates, such as citations. For example, 300 citations on an English Wikipedia article now render in 3 seconds instead of 18 seconds.

The new image upload button in an article on the mobile version of the English Wikipedia

Mobile uploads launch for apps and the mobile web

On the mobile version of Wikipedia, smartphone users can now easily upload a lead image to Wikipedia articles that lack one. Also in March, the Mobile team released a dedicated app for Wikimedia Commons, allowing media uploads from Android and iOS devices.

First Individual Engagement Grants awarded to innovative community projects

The recipients of the first Individual Engagement Grants were announced on March 29. These grants fund projects by individuals or small teams for a duration of six months. Among the largest of the eight funded grants are “The Wikipedia Library” ($7500), which aims to give editors access to reliable sources donated by publishers; “The Wikipedia Adventure” ($10,000), an on-wiki game for new editors; and a project to collaboratively define a vision for the future of Wikisource (10,000 Euros).

Wikipedia Zero wins award, reaches new users

Wikipedia Zero, which gives people around the world mobile access to Wikipedia free of data charges, won the 2013 SXSW Interactive “Activism” award, beating four other finalists. Also in March, Wikipedia Zero became available to more than 55 million additional subscribers in Russia, as part of a partnership with Beeline (VimpelCom). This was the biggest launch for the Wikipedia Zero team to date. The same month, a new Wikipedia Zero partnership with Axiata Group was announced, which will expand the program in Indonesia, Malaysia, Cambodia, Sri Lanka and Bangladesh this year.

Four of the Ombudsmen during their visit at the WMF office

Ombudsmen meet, might expand mandate

In March, the Foundation’s LCA team hosted five out of seven members of the Ombudsmen Commission in San Francisco, where these community members from around the world met with each other in person for the first time. They consulted with various WMF departments and provided input regarding privacy topics and the work of administrators. Formed in 2006, the Ombudsmen Commission is currently tasked with investigating complaints of alleged Privacy policy violations on behalf of the Board of Trustees. It has been proposed that the Commission should also be allowed to handle complaints about the global CheckUser policy and Oversight policy. An RfC (request for comment) about this is being prepared.

Data and Trends

Global unique visitors for February:
