Wikimedia blog

News from inside the Wikimedia Foundation

Technology

News and information from the Wikimedia Foundation’s Technology department (RSS feed).

If you’re seeing ads on Wikipedia, your computer is probably infected with malware

We never run ads on Wikipedia. Wikipedia is funded by more than a million donors, who give an average donation of less than 30 dollars. We run fundraising appeals, usually at the end of the year. If you’re seeing advertisements for a for-profit industry (see screenshot below for an example) or anything but our fundraiser, then your web browser has likely been infected with malware.

Screenshot of the Wikipedia article on John Slattery, with an advertisement for Inkfruit injected by malware on the user's computer

Malware installed on your computer may inject advertising into a page on popular websites, such as this Wikipedia article. This is an example that we've seen in the wild. Note the tiny text "ads not by this site" immediately below the ad, which may or may not appear next to these types of injected advertisements.

One example that we have seen installs itself as a browser extension. The extension is called “I want this” and installs itself in Google Chrome. To remove it:

  • Open the options menu via the “pipe-wrench” icon on the top right, and choose Settings.
  • Open the Extensions panel, which lists the installed extensions.
  • Remove the extension by clicking the Remove button next to it.

There is likely other similar malware that injects ads into Chrome, Firefox, Internet Explorer and other popular browsers. If you see examples that you can document, please point them out in the comments.

Ads injected in this manner may be confined to some sites, even just to Wikipedia, or they may show up on all sites you visit. Browsing through a secure (HTTPS) connection (which you can automate using the HTTPS everywhere extension) may cause the ads to disappear, but will not fix the underlying problem.

Disabling browser add-ins is a good starting point to determine the source of these types of ads. This does not necessarily fix the source of the problem either, as malware may make deep changes to your operating system. If you’re comfortable attempting a malware scan and removal yourself, there are various spyware/malware removal tools. Popular and well-reviewed solutions include Ad-Aware and Malwarebytes. But be aware that these types of tools may also bundle software, or leave your computer in an unusable state.

If in doubt, have your computer evaluated for malware by a competent and qualified computer repair center.

There is one other reason you might be seeing advertisements: Your Internet provider may be injecting them into web pages. This is most likely the case with Internet cafes or “free” wireless connections. This New York Times blog post by Brian Chen gives an example.

But rest assured: you won’t be seeing legitimate advertisements on Wikipedia.

We’re here to distribute the sum of human knowledge to everyone on the planet — ad-free, forever.

Philippe Beaudette, Director of Community Advocacy
Erik Moeller, Vice President of Engineering and Product Development

New book dives into the architecture of MediaWiki, git, puppet and other open-source applications

The cover of the book, based on the photo of a building from a low-angle shot

The second volume of the Architecture of Open-Source Applications book, which includes a chapter on MediaWiki, is now available online and on lulu.com.

The Architecture of Open-Source Applications is a collection of technical essays detailing the architecture of twenty-four major open-source applications. This is the second volume of a series that aims to help developers understand how great and large programs are constructed, and the decisions (or accidents) that led to the way they now work. The series draws inspiration from books used by architects that feature case studies of the great buildings of history.

This volume contains a chapter detailing the inner workings of MediaWiki, the wiki software that powers all Wikimedia sites, including Wikipedia.

The writing of the chapter was coordinated by Sumana Harihareswara and me. While I put together the majority of the content, it wouldn’t have been possible without the initial knowledge-sharing effort made by many Wikimedia engineers and volunteer MediaWiki developers, who also reviewed and improved the several revisions the text underwent.

The chapter on MediaWiki is available on the book’s website, along with the other chapters from both volumes. Its content was integrated into the documentation on mediawiki.org (at MediaWiki history and Manual:MediaWiki architecture) when it was completed in November 2011.

Greg Wilson and Amy Brown, the book’s editors, contacted the Wikimedia Foundation in August 2011 to offer to feature MediaWiki in the second volume. We chose a very collaborative approach to writing the chapter to ensure that the content was accurate and thorough, and also to split the workload among subject matter experts.

This volume dives into the inner workings of other tools familiar to the Wikimedia community, like Git, GNU Mailman, nginx and Puppet.

All of the book’s content is released under the Creative Commons Attribution license, similar to the license used on Wikimedia sites. It is freely available for reading online at http://www.aosabook.org, and you can also order a print from lulu.com. E-book and PDF versions will be available for purchase shortly. All royalties from purchases are donated to Amnesty International.

This is the second book published this year that contains a chapter written by Wikimedia staff, after the publication of Open Advice, a collection of essays, stories and lessons learned by members of the Free Software community.

I hope the chapter on MediaWiki, and also the rest of the book, will prove useful and interesting to the Wikimedia community and other developers. If you enjoyed it, learned from it, or would like to see more publications of this type, let us know!

Guillaume Paumier
Technical communications manager

DigiCert partnership enhances SSL security on Wikimedia sites

The Wikimedia Foundation today announced a partnership with DigiCert, Inc., based in Linden, Utah, to secure its web and mobile properties using the company’s Enterprise SSL Managed PKI. The agreement supports online authentication and encryption on those properties, while enabling Foundation staff to streamline digital certificate management.

“The Wikimedia Foundation is grateful for this partnership with DigiCert, which will enhance our ability to secure the millions of online exchanges that occur with our websites each day,” said CT Woo, Director of Technical Operations for the Wikimedia Foundation. “It’s important for Wikimedia to identify like-minded partners that value transparency and the privacy of our users.”

DigiCert is an online security provider for many of the most recognized companies and web sites in the world, including four of the top 10 comScore-ranked sites. With 489 million unique visitors to the 285 language Wikipedias and sister sites each month, the Wikimedia Foundation seeks partners who share its mission to ensure transparency and privacy for its users.

DigiCert has seen consistent growth over time and is currently the world’s third-largest provider of enterprise authentication services and digital certificates, with numerous government, educational and business clients around the world.

“DigiCert is pleased to partner with the Wikimedia Foundation in recognizing the importance of the free and secure flow of information across the Internet and to support the Foundation’s mission,” said DigiCert CEO Nicholas Hales in a press release. “We’re excited to have another opportunity to demonstrate the quality, scalability and flexibility of DigiCert’s products for a continually expanding roster of globally leading organizations of all sizes and industries.”

CT Woo, Director of Technical Operations

Wikimedia engineering April 2012 report

Major news in April includes:

Mobile milestone: Two billion page views

Page views to the Wikipedia mobile site (red: non-English versions) compared to the 2 billion target from the annual plan

One of the annual plan targets of the Wikimedia Foundation for 2011-2012 was to reach 2 billion monthly page views to the Wikipedia mobile site by June 2012. We’re happy to say that we hit the mark sooner, on the second-to-last day of April to be exact. April clocked in at 2.089 billion, a year-over-year increase of 187%. The mobile site now attracts 12.6% of all page views for Wikipedia, more than twice its 5.1% share in April 2011.

How did it happen? As internet usage shifts from a desktop-centric environment to a more mobile-centric one, there’s a migration to smaller screens. Various industries and factors have made that happen, and several things have been done at the Wikimedia Foundation to move with the change. We can’t do justice to all the individual work by attempting to list it here, but amongst the many changes and contributions, a few highlights include the launch of the new mobile site last October, better device detection, and the official Android app announced in January.

Also notable about the 2 billion mark is the way use has evolved globally. A year ago, 67% of all visits to the Wikipedia mobile site were to the English Wikipedia; now that number is 54%. In the Global South in particular, traffic to the mobile sites for certain languages has grown tremendously. Some examples include Portuguese (from 3.9M to 27.4M), Arabic (from 1.7M to 10.2M), and Turkish (from 1.0M to 9.0M). As our partnership programs roll out to allow hundreds of millions to access Wikipedia on their mobile devices without incurring data charges, we expect mobile use to be even more globally distributed over the coming year.
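
The growth figures quoted above can be turned into rough ratios with a few lines of arithmetic. The sketch below assumes that the per-language numbers are monthly page views in millions (the post does not state the unit) and that the 187% increase is measured against April 2011. The variable names are illustrative only.

```python
# Figures quoted in the post. The per-language values are ASSUMED to be
# millions of monthly mobile page views, a year ago versus now.
april_2012_views_billion = 2.089
yoy_increase = 1.87  # 187% year-over-year increase

# Implied April 2011 total, from: current = previous * (1 + increase)
april_2011_views_billion = april_2012_views_billion / (1 + yoy_increase)

language_views = {
    "Portuguese": (3.9, 27.4),
    "Arabic": (1.7, 10.2),
    "Turkish": (1.0, 9.0),
}

# Growth factor per language: views now divided by views a year ago.
growth_factors = {
    language: after / before
    for language, (before, after) in language_views.items()
}

print(f"Implied April 2011 mobile page views: {april_2011_views_billion:.2f} billion")
for language, factor in growth_factors.items():
    print(f"{language}: {factor:.1f}x")
```

Under these assumptions, the April 2011 baseline works out to roughly 0.73 billion page views, and the three languages grew by factors of about 7, 6 and 9 respectively.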

The work on mobile, from both the tech and global development sides, is not slowing down in the least, however. There’s a lot more to come, but it’s worth taking a moment to recognize the mark we’ve reached, and to thank every community and staff member who played a part.

On behalf of the Mobile Team (Tomasz Finc, Patrick Reilly, Arthur Richards, Jon Robson, André Engels, Kul Wadhwa, Mani Pande, Amit Kapoor, Yuvaraj Pandian, Max Semenik, Phil Chang, Dan Foy):

Amit Kapoor, Senior Manager, Mobile Partnerships

Analyzing Mobile Browser Energy Consumption

Recently, technology reporter Jacob Aron wrote a blog post on newscientist.com that talks about how bloated website code drains your smartphone’s battery.

He mentions how Stanford computer scientist Narendran Thiagarajan and colleagues used an Android phone hooked up to a multimeter to measure the energy used in downloading and rendering popular websites. Using their experimental setup, they measured the energy needed to render popular websites as well as the energy needed to render individual web elements such as images, JavaScript, and Cascading Style Sheets (CSS). They claim that complex JavaScript and CSS can be as expensive to render as images. Moreover, dynamic JavaScript requests (in the form of XMLHttpRequest) can greatly increase the cost of rendering the page, since they prevent the page contents from being cached. Finally, they show that on the Android browser, rendering JPEG images is considerably cheaper than rendering other formats, such as GIF and PNG, for comparably sized images.

One cited example is that simply loading the mobile version of Wikipedia over a 3G connection consumed just over 1 per cent of the phone’s battery, while browsing apple.com, which does not have a mobile version, used 1.4 per cent.

Yet, in the summary of the paper, the authors find that the results of the study are meaningful only for the initial loading of a single page resource. It would be interesting to extend these results and study the energy signature of an entire browsing session at a site such as Wikipedia, where a user typically moves from page to page. During such a session, downloaded web elements such as JavaScript, CSS and images would mostly be cached locally. Therefore, we can’t estimate the energy cost of a total session by simply summing the energy usage of the pages visited during that session. Measuring an entire typical session may help optimize the power signature of the entire site. The benefit of custom CSS that applies to every page of a site would easily outweigh the cost of the apparently excessive CSS download when rendering just the first page.

So, one of the ways that we are looking to reduce our mobile browser energy consumption is by implementing the MediaWiki ResourceLoader in order to improve the load times for JavaScript and CSS. ResourceLoader is the delivery system in MediaWiki for the optimized loading and managing of modules, which are built from JavaScript, CSS and interface messages. Its purpose is to improve MediaWiki’s front-end performance and user experience by making use of strong caching while still allowing near-instant deployment of new code that all clients start using within 5 minutes. It was first released in MediaWiki 1.17.

On Wikimedia wikis, every page view includes hundreds of kilobytes of JavaScript. In many cases, some or all of this code goes unused, either because the browser does not support it or because users do not make use of the features on the page. In these cases, the bandwidth and loading time spent on downloading, parsing and executing JavaScript code are wasted. This is especially true when users visit MediaWiki sites using older browsers, like Internet Explorer 6, where almost all features are unsupported, and parsing and executing JavaScript is extremely slow.

ResourceLoader solves this problem by loading resources on demand and only for browsers that can run them. Although there is too much to summarize in a simple list, the major improvements for client-side performance are gained by:
  • Minifying and concatenating, which reduces the code’s size and its parsing and download time. JavaScript files, CSS files and interface messages are loaded in a single, specially formatted “ResourceLoader Implement” server response.
  • Batch loading, which reduces the number of requests made. The server response for module loading supports loading multiple modules, so that a single response contains multiple ResourceLoader Implements, each of which in turn contains the minified and concatenated result of multiple JavaScript/CSS files.
  • Data URI embedding, which further reduces the number of requests, response time and bandwidth. Optionally, images referenced in stylesheets can be embedded as data URIs. Combined with the gzipping of the server response, those embedded images function as a “super sprite”.
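
To make the three techniques above more concrete, here is a minimal Python sketch of the general idea. It is not ResourceLoader’s actual code or response format: the function names, the module name `example.ui`, the image name `icon.png` and the very naive CSS minifier are all assumptions made for illustration.

```python
import base64
import gzip
import re


def naive_minify(css: str) -> str:
    """Very rough CSS minifier: drop /* */ comments and collapse whitespace."""
    without_comments = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
    return re.sub(r"\s+", " ", without_comments).strip()


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def embed_images(css: str, images: dict) -> str:
    """Replace url(name) references with data URIs for images we know about."""
    def replace(match):
        name = match.group(1)
        if name in images:
            return f"url({to_data_uri(images[name])})"
        return match.group(0)  # leave unknown references untouched

    return re.sub(r"url\(([^)]+)\)", replace, css)


def build_batch_response(modules: dict, images: dict) -> bytes:
    """Minify and concatenate each module's files, then gzip one combined response."""
    parts = []
    for name, sources in modules.items():
        body = " ".join(naive_minify(embed_images(src, images)) for src in sources)
        parts.append(f"/* module: {name} */ {body}")
    return gzip.compress("\n".join(parts).encode("utf-8"))


# Hypothetical input: one module made of two stylesheets and one tiny "image".
modules = {
    "example.ui": [
        "a { color: red; }  /* link colour */",
        "b {\n  background: url(icon.png);\n}",
    ],
}
images = {"icon.png": b"hi"}  # placeholder bytes, not a real PNG

response = build_batch_response(modules, images)
print(gzip.decompress(response).decode("utf-8"))
```

A single compressed response then carries every requested module along with its embedded images, so the browser makes one request instead of one per file.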

Patrick Reilly, Senior Software Developer, Mobile

Wikimedia Foundation selects nine students for summer software projects

We received 63 proposals for this year’s Google Summer of Code, and several mentors put many hours into evaluating project ideas, discussing them with applicants and making the tough decisions. We’re happy to announce our final choices, the Google Summer of Code students for 2012:

MediaWiki logo

All nine of these students are working on MediaWiki, the software that powers Wikimedia sites.

Congratulations to this year’s students, and thanks to all the applicants, as well as MediaWiki’s many mentors, developers who evaluated applications, and Google’s Open Source Programs Office. The accepted students now have a month to ramp up on MediaWiki’s processes and get to know their mentors (the Community Bonding Period) and will start coding their summer projects on or before May 21st. As the organizational administrator for MediaWiki’s GSoC participation, I’ll be keeping an eye on all nine students and helping them out.

Good luck!

Sumana Harihareswara, Volunteer Development Coordinator

Google Summer of Code 2012

Niklas Laxström, language engineer and Wikimedian

University of Helsinki

The average age of the MediaWiki developers is quite young. They often started contributing to the MediaWiki code while still in school or university. When their contributions show promise, they are sometimes asked to contribute to particular projects. This has resulted in the hiring of students, who continue to do professionally what they first did as a hobby.

While the Wikimedia Foundation is happy with the talent it gains in this way, it feels strongly that finishing formal education is very important. Some students only work for the WMF in their holidays while others manage regular contributions in their free time as well. Such relations are often strengthened through programs like the Google Summer of Code or through summer internships.

Niklas Laxström recently finished university, and this happy occasion is reason enough to interview him. As you may know, he works for the WMF Localisation Team and his claim to fame is that he started what became translatewiki.net. Niklas has been instrumental in much of the internationalisation and localisation development for the MediaWiki software.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Congratulations, Master Niklas. You finished university! What did you study, and what is your exact title (in Finnish)?
I studied language technology with minors in Finnish language, Computer Science, East-Asian studies and a collection of Russian language courses. I’m now a Master of Arts, filosofian maisteri.

You started with what became translatewiki.net before you started university. How did your studies influence the development of translatewiki.net?
Before university I had a hobby project for inflecting Finnish nouns. It wasn’t successful, nor did it have a good design, but it started a series of events that led me to study language technology.

My studies were pretty heavily biased towards hard language processing: for instance syntactic parsers, finite state technologies and morphologies. However, open source language technologies are not yet at a level where that kind of processing can just be plugged into any software.

Learning about variation in languages has been very useful to me. It helps to avoid solutions that only work for a limited number of similar languages. I learned most of that in linguistics courses but also by studying several dissimilar languages. I also liked the isolated courses about copyright, terminologies and string processing, which turned out to be useful in different situations.

On the other hand, working with MediaWiki and translatewiki.net has given me enormous amounts of practical experience all over computer engineering, which helped me to perform better in engineering related courses.

(more…)

US Education Program participants add three times as much quality content as regular new users

Wikipedia Education Program participants from the United States added more than three times as much quality content as regular new users, a quantitative analysis shows.

In the Wikipedia Education Program, professors assign their students to edit Wikipedia articles as a grade for class, assisted by volunteer Wikipedia Ambassadors. In fall 2011, 55 courses participated in the program in the United States, with students editing articles on the English Wikipedia. On average, these students added 1855 bytes of content that stayed on Wikipedia, compared to only 491 for a randomly chosen sample of new users who joined English Wikipedia in September 2011. These numbers establish that students who participate in the Wikipedia Education Program contribute significantly more quality content that stays on Wikipedia than other new users.

Examining the distribution of content that survived on Wikipedia for both of these groups, we found that almost half of the Wikipedia Education Program participants added 1,000 or more bytes that stayed on Wikipedia in the first six months. In contrast, more than half of the random sample of new editors added no content that stayed on Wikipedia in the first six months. The targeted recruitment of students, combined with the support provided by the Ambassador Program and instructors, results in a much larger percentage of new editors who contribute quality content to Wikipedia.

To understand the collective impact of the Wikipedia Education Program in fall 2011, we compared the amount of content students added to Wikipedia to the content added by the random sample of new editors. The numbers show that the 920 student editors who participated in the program in fall 2011 added the same amount of content as 2,250 typical new editors (editors are defined as users who made at least one edit to an article). In terms of new content, each student has more than twice the impact of a typical new editor.
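
The two comparisons in this post use different baselines, so it may help to compute the ratios side by side. The short sketch below uses only the figures quoted above, and the variable names are illustrative. The first ratio compares students with all sampled new users, while the second compares them with new editors, that is, users who made at least one article edit. That difference in baseline is presumably why the two ratios differ.

```python
# Figures quoted in the post (fall 2011, English Wikipedia).
student_bytes = 1855       # average surviving bytes per student
new_user_bytes = 491       # average surviving bytes per sampled new user
student_editors = 920      # student editors in the program
equivalent_editors = 2250  # typical new editors adding the same total content

# Per-user comparison: surviving content per student vs. per sampled new user.
per_user_ratio = student_bytes / new_user_bytes

# Collective comparison: how many typical new editors one student "replaces".
per_editor_ratio = equivalent_editors / student_editors

print(f"Students vs. sampled new users: {per_user_ratio:.2f}x")
print(f"Students vs. typical new editors: {per_editor_ratio:.2f}x")
```

The first ratio comes out at about 3.78 and the second at about 2.45, consistent with the “more than three times” and “twice the impact” statements respectively.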

An important consideration for any outreach project is editor retention. Data showed that students who are introduced to editing Wikipedia through the U.S. Education Program are just as likely to continue editing as any other newcomer.

The Wikipedia Education Program has now grown to Egypt, Brazil and other regions beyond North America. With an increased global presence, measuring and understanding the contributions of new student editors (and how they differ from other new users that join Wikipedia) has gained importance. Establishing a common metric for measuring the impact of the Wikipedia Education Program on various Wikipedias is another key motivation for a quantitative study.

There’s a lot more work to be done on measuring the program’s impact. So, stay tuned for more information about these metrics.

Methodology for this research can be found at: http://meta.wikimedia.org/wiki/Research:Wikipedia_Education_Program_evaluation#Methods

Ayush Khanna, Data Analyst, Global Development

(with input from Mani Pande, Head of Global Development Research)

Primary data about languages

For MediaWiki, the CLDR, or Common Locale Data Repository, is a primary source of information. The information about languages that Unicode maintains in this standard is what is most relevant to us. It registers each language’s name in English and its autonym (its name in the language itself), as well as information such as what dates and numbers look like, the script or scripts used for the language, and the names of other languages in that language.

We prefer to use standardised information, not only because it is stable and reliable, but because we do not have to collect the data ourselves and also because the data is used by many other organisations and in many other applications. We love the CLDR and we want it to be even better. To make it better we need your help.

Many of the languages that have a Wikipedia and many of the languages that want to have a Wikipedia are not represented in the CLDR. Many Wikipedians know their language really well. They can provide the information about their language and they can verify that the existing information is correct. To change things, you will need to create a user account.

When a language is not yet supported, you will have to request that the new locale or language be added. You are expected to provide at least the core data when you make your request, and to complete at least the minimal data required. One of the questions asks where the language is official; it may be that a language does not have any official status. This does not prevent people from reading or writing that language, and it does not mean that information about such a language is not important to us.

When a language is already supported, we want you to verify that the names for other languages exist and are correctly written. There can be issues in any language, including English; for example, using the Araucana name for the Mapudungun language is considered an insult.

If you are able and happy to help us in this way, you may be interested in joining our “language support team.” Because of your interest, you belong to the group of people we want to turn to first when we have questions about supporting your language. More structured information and room for your reports can be found here. If there are any issues, do not hesitate to report them.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant