Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Wikimedia Foundation offers assistance to Wikipedia editors named in U.S. defamation suit

Since posting, we have learned that Mr. Barry’s attorney has asked to withdraw the complaint without prejudice and that the court has granted the request. Mr. Barry’s attorney has further indicated that Mr. Barry intends to file an amended complaint at some unspecified time in the future.

Wikipedia’s content is not the work of one, ten, or even a thousand people. The information on Wikipedia is the combined product of contributions made by hundreds of thousands of people from all over the world. By volunteering their time and knowledge, these people have helped build Wikipedia into a project that provides information to millions every day.

With many different voices come many different perspectives. Reconciling them requires open editorial debate and collaboration within the volunteer community of editors and writers. Disagreements about content are settled through this approach on a daily basis. On extremely rare occasions, editorial disputes escalate to litigation.

This past month, four users of English Wikipedia were targeted in a defamation lawsuit brought by Canadian-born musician, businessman, and philanthropist Yank Barry. In the complaint, Mr. Barry claims that the editors, along with 50 unnamed users, have acted in conspiracy to harm his reputation by posting false and damaging statements onto Wikipedia concerning many facets of his life, including his business, philanthropy, music career, and legal history.

However, the specific statements Mr. Barry apparently finds objectionable are on the article’s talk page, rather than in the article itself. The editors included in the lawsuit were named because of their involvement in discussions focused on maintaining the quality of the article, specifically addressing whether certain contentious material was well-sourced enough to be included, and whether inclusion of the material would conform with Wikipedia’s policies on biographies of living persons.

A talk page is not an article. It is not immediately visible to readers of the encyclopedia. Its purpose is not to provide information, but to provide a forum for discussion and editorial review. If users are unable to discuss improvements to an article without fear of legal action, they will be discouraged from taking part in discussion at all. While some individuals may find questions about their past disagreeable and even uncomfortable, discussions of these topics are necessary for establishing accurate and up-to-date information. Without discussion, articles will not improve.

In our opinion, this lawsuit is an effort to chill free speech on the Wikimedia projects. Because Wikipedia editors do not select facts based on bias or promotion, the lawsuit is rooted in a deep misinterpretation of the free-form, truth-seeking conversation and analysis that form part of the editorial review process, which establishes the validity and accuracy of historical and biographical information. As such, we have offered the four named users assistance through our Defense of Contributors policy. Three of the users have accepted our offer and obtained representation through the Cooley law firm. We thank Cooley for its vigorous representation of our users. The fourth user is being represented by the California Anti-SLAPP Project and is working closely with the Wikimedia Foundation and Cooley.

Lawsuits against Wikipedia editors are extremely rare; we do not know of any prior case in which a user has been sued for commenting on a talk page. The Wikipedia community has established a number of dispute resolution procedures and venues for discussing content issues, all available for anyone to use. Most content disputes are resolved through these processes. We are unaware of Mr. Barry taking advantage of these processes to work directly with the editors involved in this lawsuit or with the greater Wikipedia community to address these issues.

Wikipedia’s mission is to provide the world with the sum of all human knowledge, for free, and we will always strongly defend its volunteer editors and their right to free speech.

Michelle Paulson, Legal Counsel

Wikimedia engineering report, June 2014

Major news in June include:

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
Read the rest of this entry »

Creating Safe Spaces

This morning I read an article entitled Ride like a girl. In it, the author describes how being a cyclist in a city is like being a woman: “Welcome to being vulnerable to the people around you. Welcome to being the exception, not the rule. Welcome to not being in charge.” The analogy may not be a perfect fit, but reading these words made me think of a tweet I favorited several weeks ago when #YesAllWomen was trending. A user who goes by the handle @Saradujour wrote: “If you don’t understand why safe spaces are important, the world is probably one big safe space to you.” As I continue interviewing women who edit Wikipedia and as I read through the latest threads on the Gendergap mailing list, I keep asking myself, “How can a community that values transparency create safe spaces? How can we talk about Wikipedia’s gender gap without alienating dissenting voices and potential allies?”

Ride like a girl?

Wikipedia’s gender gap has been widely publicized and documented both on and off wiki (and on this blog since 1 February 2011). One of the reasons I was drawn to working on the gender gap as a research project was that, despite a great deal of conversation, very few solutions seem to have emerged. It is what Rittel and Webber would call a “wicked problem.” Even in the midst of the ongoing work of volunteers who spearhead and contribute to endeavors like WikiProject Women scientists, WikiWomen’s History Month, WikiProject Women’s sport and Meetup/ArtandFeminism (to name only a few), the gender gap is a wicked problem many community members, even those dedicated to the topic, seem tired of discussing.

The Women and Wikipedia IEG project is designed to collect and then provide the Wikimedia community with aggregate qualitative and quantitative data that can be used to assess existing efforts to address the gender gap. This data may also be used to guide the design of future interventions or technology enhancements that seek to address the gap. The data may include but not be limited to:

Digging for Data: How to Research Beyond Wikimetrics

The next virtual meet-up will introduce research tools. Join us!

For Learning & Evaluation, Wikimetrics is a powerful tool for pulling data for wiki project user cohorts, such as edit counts, pages created and bytes added or removed. However, you may still have a variety of other questions, for instance:

How many members of WikiProject Medicine have edited a medicine-related article in the past three months?
How many new editors have played The Wikipedia Adventure?
What are the most-viewed and most-edited articles about Women Scientists?

Questions like these, and many others regarding the content of Wikimedia projects and the activities of editors and readers, can be answered using tools developed by Wikimedians all over the world. These tools are built on publicly available data, which they access through databases and Application Programming Interfaces (APIs), and they are maintained by volunteers and staff within our movement.
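To give a flavor of what asking such questions through an API looks like, here is a minimal sketch (my own illustration, not one of the tools covered in the series) that asks the public MediaWiki API for editors’ total edit counts; the username shown is just an example:

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # public MediaWiki API endpoint

def get_edit_counts(usernames):
    """Fetch total edit counts for up to 50 usernames in a single API call."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": "|".join(usernames),  # the API accepts a pipe-separated list
        "usprop": "editcount",
        "format": "json",
    }
    reply = requests.get(API_URL, params=params, timeout=30).json()
    return {user["name"]: user.get("editcount", 0)
            for user in reply["query"]["users"]}

print(get_edit_counts(["Jimbo Wales"]))
```

The same query pattern, with different list and prop parameters, can pull revision histories, category members and other public data.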

On July 16, Jonathan Morgan, research strategist for the Learning and Evaluation team and wiki-research veteran, will begin a three-part series to explore some of the different routes to accessing Wikimedia data. Building off several recent workshops including the Wiki Research Hackathon and a series of Community Data Science Workshops developed at the University of Washington, in Beyond Wikimetrics, Jonathan will guide participants on how to expand their wiki-research capabilities by accessing data directly through these tools.

Read the rest of this entry »

Making Wikimedia Sites faster

Running the fifth largest website in the world brings its own set of challenges. One particularly important issue is the time it takes to render a page in your browser. Nobody likes slow websites, and we know from research that even small delays lead visitors to leave the site. An ongoing concern of both the Operations and Platform teams is improving the reader experience by making Wikipedia and its sister projects as fast as possible. We ask ourselves questions like: Can we make Wikipedia 20% faster on half the planet?

As you can imagine, the end-user experience differs greatly across our uniquely diverse and global readership. Hence, we need to conduct real user monitoring to truly understand how fast our projects are in real-life situations.

But how do we measure how fast a webpage loads? Last year, we started building instrumentation to collect anonymous timing data from real users, through a MediaWiki extension called NavigationTiming.[1]

There are many factors that determine how fast a page loads, but here we will focus on the effects of network latency on page speed. Latency is the time it takes for a packet to travel from the originating server to the client who made the request.

ULSFO

Earlier this year, our new data center (ULSFO) became fully operational, serving content to Oceania, South-East Asia, and the west coast of North America[2]. The main benefit of this work is shaving up to 70−80 ms of round-trip time for some regions of Oceania, East Asia, and the US and Canada, an area with 360 million Internet users and a total population of approximately one billion people.

We recently explained how we chose which areas to serve from the new data center. Knowing that the sites became faster for those users was not enough for us; we wanted to know how much faster.

Results

Before we talk about specific results, it is important to understand that faster network round-trip times do not necessarily translate directly into a faster experience for users. When network times are faster, resources are retrieved faster, but many other factors influence page latency. An example makes this clearer: if composing a page takes four network round trips, and trips 2, 3 and 4 happen while the browser is still parsing the huge main document fetched in trip 1, only the first trip’s improvement is visible; the subsequent trips run in parallel, completely hidden under the fetching and parsing of the first one. In this scenario, the performance bottleneck is the parsing of the first resource, not the network time.

With that in mind, we wanted to answer two questions with the data from the NavigationTiming extension: How much did our network times improve? And can users feel the effect of faster network times, that is, are pages perceived to be faster, and if so, by how much?

The data we harvest from the NavigationTiming extension is broken down by country. We therefore concentrated our analysis on countries in Asia for which we had sufficient data points; we also included the United States and Canada, although we were not able to extract data for just the western states. Data for the United States and Canada was analyzed at the country level, so the improvements in latency appear “muffled”.

How much did our network times improve?

The short summary is: network times improved quite a bit. For half of all requests, the time to retrieve the main document decreased by up to 70 ms.

ULSFO Improvement of Network times on Wikimedia Sites

In the graph above, the data center rollout is marked with a dashed line. The rollout was gradual, so the gains do not appear immediately, but they are very significant after a few days. The graph includes data for Japan, Korea and the whole South-East Asia region.[3]

We graphed the responseStart − connectStart time, which represents the time spent on the network until the first byte arrives, minus the time spent in DNS lookups. For a more visual explanation, take a look at the Navigation timing diagram. If a TCP connection is dropped, the time includes the setup of the new connection. All the data we use to measure network improvements is provided by the Navigation Timing API, and is thus not available on IE8 and below.
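In other words, the metric is a simple subtraction per sample. Here is a toy sketch of how the weekly medians in the graph are derived (made-up numbers, not our production pipeline):

```python
from statistics import median

def network_time(sample):
    """Time on the network until the first byte, excluding DNS lookups."""
    return sample["responseStart"] - sample["connectStart"]

# Hypothetical Navigation Timing samples, in milliseconds since navigationStart
samples = [
    {"connectStart": 12, "responseStart": 96},
    {"connectStart": 10, "responseStart": 61},
    {"connectStart": 15, "responseStart": 188},
]

print(median(network_time(s) for s in samples))  # 50th percentile: 84
```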

User perceived latency

Did the improvement of network times have an impact that our users could see? Well, yes it did. More so for some users than others.

The gains in Japan and Indonesia were remarkable: page load times dropped by up to 300 ms at the 50th percentile (weekly). We saw smaller (but measurable) improvements of 40 ms in the US, too. However, we were not able to measure the impact in Canada.

The dataset we used to measure these improvements is larger than the one we had for network times. As mentioned before, the Navigation Timing API is not present in old browsers, so we cannot measure, say, network improvement in IE7. In this case, however, we used a metric of our own creation, called mediaWikiLoadComplete, that tells us when a page is done loading. This measure is taken in all browsers at the moment the page is ready to interact with the user, and faster times do mean that the user experience was faster. How users perceive the improvement, though, has a lot to do with how fast pages were to start with. If a page now takes 700 ms to render instead of one second, any user will be able to see the difference; a difference of 300 ms in a 4-second page render, however, will go unnoticed by most.

Reduction in latency

Want to know more?

Want to know all the details? A (very) detailed report of the performance impact of the ULSFO rollout is available.

Next steps

Improving speed is an ongoing concern, particularly as we roll out new features, and we want to make sure that page rendering remains fast. We are keeping our eyes open for new ways of reducing latency, for example by evaluating TCP Fast Open (TFO). TCP Fast Open skips an entire round trip by letting the server start sending data to the client before the final acknowledgment of the three-way TCP handshake has arrived.
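To illustrate what evaluating TFO involves on the server side, enabling it on a Linux listening socket is a single socket option. This is a minimal sketch, not Wikimedia’s actual server configuration:

```python
import socket

# Python exposes TCP_FASTOPEN on Linux; fall back to its Linux value (23)
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))
# Allow up to 16 pending Fast Open connections; data carried in the SYN can
# then be handed to the application before the handshake fully completes.
server.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, 16)
server.listen(128)
```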

We are also getting closer to deploying HipHop. HipHop is a virtual machine that compiles PHP bytecode to native instructions at runtime, the same strategy used by Java and C# to achieve their speed advantages. We’re quite confident that this will result in big performance improvements on our sites as well.

We wish you speedy times!

Faidon Liambotis
Ori Livneh
Nuria Ruiz
Diederik van Liere

Notes

  1. The NavigationTiming extension is built on top of the HTML5 API of the same name, which exposes fine-grained measurements from the moment a user submits a request to load a page until the page has been fully loaded.
  2. Countries and provinces served by ULSFO include: Bangladesh, Bhutan, Hong Kong, Indonesia, Japan, Cambodia, Democratic People’s Republic of Korea, Republic of Korea, Myanmar, Mongolia, Macao, Malaysia, Philippines, Singapore, Thailand, Taiwan, Vietnam, US Pacific/West Coast states (Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, New Mexico, Nevada, Oregon, Utah, Washington, Wyoming) and Canada’s western territories (Alberta, British Columbia, Northwest Territories, Yukon Territory).
  3. Countries include: Bangladesh, Bhutan, Hong Kong, Indonesia, Japan, Cambodia, Democratic People’s Republic of Korea, Republic of Korea, Myanmar, Mongolia, Macao, Malaysia, Philippines, Singapore, Thailand, Taiwan, Vietnam.

A Survey of Esperanto Wikipedians

Esperanto Wikipedia founder Chuck Smith (right) being interviewed along with Miroslav Malovec (left), Esperanto Wikipedian and founding member of Czech Wikipedia during Esperanto Wikimania 2011 in Svitavy, Czech Republic.

Esperanto Wikipedia started its journey in 2001. In the thirteen years since, it has registered massive growth, reaching a record 196,923 articles[1], an editing landmark that places it at the forefront of not only the other constructed-language Wikipedias but also many natural-language ones.

As a constructed language, Esperanto has been adopted out of love for its uniform grammar as well as for the idea of a culturally neutral universal language. On Esperanto Wikipedia, we accordingly find people from different parts of the world contributing enthusiastically.

It is with this global framework in mind that I made a humble effort to survey Esperanto Wikipedians, hoping to get an overview of this Wikipedia’s editing culture and of the whereabouts of its contributors. I designed a 10-point questionnaire and sent it to a number of Esperanto Wikipedians. To my good fortune, the first recipient, Christian Bertin, provided an Esperanto version of the questionnaire, making it possible to give respondents the option of replying in either Esperanto or English. I received 12 responses from Esperanto Wikipedians, including three from admins.

Although not all respondents disclosed their geographical locations, those who did hailed from Austria, Colombia, France, Portugal, the Czech Republic, Spain and the United States. Three respondents were inspired by the efforts of Esperanto Wikipedia’s founding member, Chuck Smith, to promote the project, with one respondent, Miroslav Malovec, sharing a close association with him.

Esperanto Wikipedians contribute on a wide array of topics: local information, transportation, writers, sports, literature, film, food, Esperanto events, Russian ethnography, American culture, geography, ornithology and other areas of biology and science. Two of the respondents, Pino and Miroslav Malovec, made a point of mentioning outreach events for Esperanto Wikipedia; Miroslav Malovec listed Esperanto Wikimania 2011 in Svitavy, Czech Republic, the Conference on the Application of Esperanto in Science and Technology 2012 in Modra, Slovakia, and Wikitrans 2013 in Partizánske, Slovakia as examples. Another Wikipedian, Marcos Crammer, shared an interesting anecdote: at the Conference on the Application of Esperanto in Science and Technology 2010 in Modra, Slovakia, he encountered an Esperantist who had grown frustrated with Esperanto Wikipedia because of a terminological error that had gone unaddressed. Marcos proposed a change to the terminology and later carried it out, giving his Esperantist friend a more positive outlook on the project.

Read the rest of this entry »

Pywikibot will have its next bug triage on July 24−27

For most Wikimedia projects, Pywikibot (formerly pywikipedia) has proved to be a trusted and powerful tool. Literally millions of edits have been made by “bots” through this (semi-)automated software suite, written in Python.
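For readers who have never used the suite, a minimal bot script looks something like this. It is a sketch against Pywikibot’s current “core” API and assumes a configured user-config.py; the page title is just an example:

```python
import pywikibot

site = pywikibot.Site("en", "wikipedia")          # English Wikipedia
page = pywikibot.Page(site, "Wikipedia:Sandbox")  # an example target page

print(page.text[:200])                        # inspect the current wikitext
page.text += "\n<!-- edited by my bot -->"
page.save(summary="Bot: demonstration edit")  # shows up like any other edit
```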

Bug triage is like a check-up for bots: we go through the list of things that need to be done and clean it up. During a bug triage, we check our open bugs for reproducibility (can we make them happen on our own computers in order to investigate them?), severity and priority, and we categorize them where necessary. A “bug” in this context can mean a problem in a script or a feature request that would improve Pywikibot.

From July 24 to July 27, we’ll be holding a big online event to learn what more needs to be done for Pywikibot: which bugs need an urgent fix, what features are missing or incomplete, and so on. Obviously, it is also a good opportunity to look at the code and check for “bit rot”.

Fixing bugs can sometimes be hard and time-consuming, but bug triaging doesn’t require deep technical knowledge: anyone with a little experience running bots can be of great help in the bug triage. Triage can be a tedious task due to the number of bugs involved, so we need your help to get through them all.

If you know your Python and are interested in putting your skills to good use to support Wikimedia sites, join us for the bug-a-thon starting July 24. Until then, you can start familiarizing yourself with Pywikibot and bug triaging!

Amir Sarabadani (User:Ladsgroup), editor on the Persian Wikipedia and Pywikibot developer

Wikimedia Statement on Copyright Changes in the Trans-Pacific Partnership

Map: potential members of the Trans-Pacific Partnership, showing countries currently in negotiations, countries that have announced interest in joining, and potential future members.

Today, the Wikimedia Foundation supports the Fair Deal coalition in voicing opposition to certain provisions of the Trans-Pacific Partnership (TPP), a trade agreement being secretly negotiated by 12 countries. We have signed onto two letters, each focused on a proposal that would be particularly harmful to the Wikimedia movement. The first proposal would extend copyright terms well beyond previously agreed periods, and the other would expand liability for Internet service providers beyond the standards set out in the U.S. Digital Millennium Copyright Act (DMCA) or in other countries’ copyright laws. As a host of other organizations and technology innovators have pointed out, these proposals have dangerous implications for free knowledge, online privacy, and freedom of expression. The Wikimedia Foundation is particularly compelled to act because these proposals threaten our mission of distributing free and public domain content to all people.

The first provision in question seeks to extend the copyright term in signatory countries far beyond what is required by previous international agreements such as the Berne Convention or the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS). The TPP would extend copyright terms from the lifetime of the creator plus 50 years to lifetime plus 70 years. On average, this means that a work could enter the public domain only after almost 140 years; a work created by a 30-year-old author who lives to 95, for example, would remain under copyright for 135 years. Although proponents of copyright term extension commonly argue that such restrictive monopoly rights provide an incentive for creators to generate material, economists and legal scholars have found that the benefits of such term extensions accrue overwhelmingly to copyright-holding companies rather than to the artists themselves. Extended copyright terms also result in works becoming unavailable altogether – or “orphaned” – because the copyright owner cannot be contacted or is uninterested in commercializing the work. This erosion of the public domain would weaken Wikipedia and all Wikimedia projects that build on a rich public domain.

Read the rest of this entry »

How RIPE Atlas Helped Wikipedia Users

This post by Emile Aben is cross-posted from RIPE Labs, a blog maintained by the Réseaux IP Européens Network Coordination Centre (RIPE NCC). In addition to being the Regional Internet Registry for Europe, the Middle East and parts of Central Asia, the RIPE NCC also operates RIPE Atlas, a global measurement network that collects data on Internet connectivity and reachability to assess the state of the Internet in real time. Wikimedia engineer Faidon Liambotis recently collaborated with the RIPE NCC on a project to measure the delivery of Wikimedia sites to users in Asia and elsewhere using our current infrastructure. Together, they identified ways to decrease latency and improve performance for users around the world. 

During RIPE 67, Faidon Liambotis (Principal Operations Engineer at the Wikimedia Foundation) and I got into a hallway conversation. Long story short: We figured we could do something with RIPE Atlas to decrease latency for users visiting Wikipedia and other Wikimedia sites.

At that time, Wikimedia had two active locations (Ashburn and Amsterdam) and was preparing a third (San Francisco) to better serve users in Oceania, South Asia, and the west-coast regions of the US and Canada. We were wondering what effect this third location would have on network latency for users worldwide, and Wikimedia wanted to quantify the effect of turning it up.

Wikimedia runs its own Content Delivery Network (CDN), mostly for privacy and cost reasons. Like most CDNs, it geographically balances traffic across its various points of presence (PoPs) using a technique called GeoDNS: based on the IP address of the user’s DNS resolver (which makes DNS requests on the user’s behalf), the user is directed to a specific data center. This requires the authoritative DNS servers for Wikimedia sites to know where best to direct each user. Wikimedia uses gdnsd as its authoritative DNS server, dynamically answering those queries from a region-to-datacenter map.
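The idea behind such a map can be shown with a toy lookup. This is a deliberately simplified sketch, not gdnsd’s implementation; the region labels are invented, though the datacenter codes are Wikimedia’s real ones:

```python
# Toy GeoDNS: pick a datacenter based on the requesting resolver's region.
REGION_TO_DATACENTER = {
    "europe":             "esams",  # Amsterdam
    "north-america-east": "eqiad",  # Ashburn
    "north-america-west": "ulsfo",  # San Francisco
    "oceania":            "ulsfo",
}

def answer(resolver_region: str) -> str:
    """Return the datacenter whose address the authoritative DNS reply carries."""
    return REGION_TO_DATACENTER.get(resolver_region, "eqiad")  # default: Ashburn

print(answer("oceania"))  # -> ulsfo
```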

Some call this “stupid DNS tricks”; others find it useful for decreasing latency towards websites. Wikimedia is in the latter group, and we used RIPE Atlas to see how well this method performs.

One specific question we wanted answered was where to “split Asia” between the San Francisco and Amsterdam Wikimedia locations. Latency is obviously a function of physical distance, but also of the choice of upstream networks. As an example, these choices determine whether packets to destinations on the “other side of the world” tend to be routed clockwise or counter-clockwise.

We scheduled latency measurements from all RIPE Atlas probes towards the three Wikimedia locations we wanted to look at, and visualised which datacenter showed the lowest latency for each probe. You can see the results in Figure 1 below.


Figure 1: Screenshot of latency map. Probes are colored based on the datacenter that shows the lowest measured latency for this particular probe.

This latency map shows the locations of RIPE Atlas probes, coloured by what Wikimedia data center has the lowest latency measured from that probe:

  • Orange: the Amsterdam PoP has the lowest latency
  • Green: the Ashburn PoP has the lowest latency
  • Blue: the San Francisco PoP has the lowest latency.

Probes where the lowest latency is over 150ms have a red outline. An interactive version of this map is available here. Note that this is a prototype to show the potential of this approach, so it is a little rough around the edges.
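As an aside, scheduling this kind of measurement yourself is straightforward through the RIPE Atlas API. Here is a sketch using the ripe.atlas.cousteau client library, an illustration rather than our original prototype code; the API key is a placeholder and the probe count is arbitrary:

```python
from ripe.atlas.cousteau import AtlasCreateRequest, AtlasSource, Ping

# A one-off IPv4 ping measurement towards one of the sites under study
ping = Ping(af=4, target="en.wikipedia.org",
            description="Latency towards a Wikimedia PoP")

# Ask for 50 probes spread over the whole world ("WW" = worldwide area)
source = AtlasSource(type="area", value="WW", requested=50)

request = AtlasCreateRequest(
    key="YOUR_ATLAS_API_KEY",  # create one at atlas.ripe.net
    measurements=[ping],
    sources=[source],
    is_oneoff=True,            # run once rather than continuously
)
is_success, response = request.create()
print(is_success, response)
```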

Probes located in India clearly have lower latency towards Amsterdam. Probes in China, South Korea, the Philippines, Malaysia and Singapore showed lower latency towards San Francisco. For other locations in South-East Asia the situation was less clear, but that is also useful information to have: it shows that directing users to either the Amsterdam or the San Francisco data center is roughly equally good (or bad). It is also interesting to note that all of Russia, including the two easternmost probes in Vladivostok, has the lowest latency towards Amsterdam. For the Vladivostok probes, Amsterdam and San Francisco are almost the same distance away, give or take 100 km, yet nearby probes in China, South Korea and Japan have the lowest latency towards San Francisco.

There is always the question of drawing conclusions from a low number of samples, and of how representative RIPE Atlas probe hosts are of the larger population. Still, having some data is better than no data, and if a region has a low number of probes, that can always be fixed by deploying more probes there. If you live in an underrepresented region, you can apply for a probe and help improve the coverage.

With this measurement data to back it, Wikimedia has gradually switched the Oceania and South Asian countries and the US and Canadian states and provinces for which RIPE Atlas measurements showed minimal latency over to its San Francisco caching PoP. The geo-config that Wikimedia is running is publicly available here.

As for the code that created the measurements and the latency map: it was all prototype-quality code at best, so I originally planned to find a second site where we could repeat the exercise, to see whether we could generalise the scripts and visualisation before sharing them.

At RIPE 68 there was interest in even this raw prototype code for working with data centers, latency and RIPE Atlas, so we ended up sharing the code privately, and we have already heard of progress being made with it. In the meantime, we’ve put the code that created the latency map up on GitHub. Again: it’s a prototype, but if you can’t wait for a better version, please feel free to use and improve it.

Conclusion

If you have an interesting idea but no time, or if other things are stopping you from implementing it, please let us know! You can always chat with us at a RIPE meeting, a regional meeting, or through any other channel. We don’t have infinite time, but we can definitely try things out, especially ideas that will improve the Internet and/or the life of network operators.

Emile Aben

Wikipedia Signpost report: WikiProject Film

The logo of WikiProject Film

The English Wikipedia’s community-written newsletter, the Wikipedia Signpost, recently spoke to five members of WikiProject Film. One of the largest projects on the wiki, it boasts around 500 members and has existed for more than ten years. Its goal is to improve and manage Wikipedia’s articles directly related to film. Though the project has a broad scope, currently encompassing more than 100,000 articles, several smaller groups have split away from it to focus on narrower topics such as actors and screenwriters.

Those interviewed were Corvoe, Erik, Favre1fan93, NinjaRobotPirate and Lugnuts, all veteran editors with varying levels of experience. All five are, naturally, fans of film and share the common goal of improving Wikipedia’s coverage of film-related topics.

“Though I’d worked on music articles for a considerable amount of time beforehand, I’ve never felt more at home than at WikiProject Film,” says Corvoe, who has been a member of the project for a year. “Its collaborations far outweigh its solo ventures, and there are a large amount of us just wanting to improve as many articles as much as we can. I want others like me to have any amount of information that we can find at their fingertips, so that those as curious as I am have one centralized hub for any films they might be interested in.”
Read the rest of this entry »