Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

WikiProject Report: Indigenous Peoples of North America

A Zuni girl with a pottery jar on her head, photographed in 1909. Most Zuni live in Zuni Pueblo in western New Mexico.

Wikipedia’s community-written newsletter, The Signpost, recently talked to a number of participants in WikiProject Indigenous Peoples of North America. Encompassing more than 7,000 articles, the project currently boasts sixteen featured articles—articles that have gone through a thorough vetting process and are considered some of the best on the encyclopedia—as well as 63 WikiProject good articles, which have been through a similar, though less rigorous, process. The WikiProject aims to improve and maintain overall coverage of the indigenous peoples of North America on Wikipedia.

Members CJLippert, Djembayz, RadioKAOS, Maunus and Montanabw were asked for their thoughts on various aspects of the project. All five have a strong interest in the topic, though not all have direct ties to the indigenous peoples of North America. CJLippert, who works for the Mille Lacs Band of Ojibwe, a federally recognized American Indian tribe in Minnesota, comes pretty close. “Minnesota is a cross-road of where the Indian Removal Policy ended and Reservation Policy began and where the old and small Reserve system and the new and large Reservation system intersect,” he explains.

He adds, “As I work for a Native American tribal government, though not Native but also not ‘White’, I have the privilege of participating as the third party between the two. This also means I get to see both the strengths and weaknesses of both in regards to the relations between the Native Americans and the majority population. As that third party, trying to help to close some gaps in understanding is what led me to participate in Wikipedia and then to join the WikiProject.”

Maunus, a linguist and anthropologist, focuses on Mexican indigenous groups, which he feels is an underrepresented topic area on Wikipedia. “I am one of the only people doing dedicated work on these groups, but I have been focusing on languages and I agree that Mexican indigenous people require improved coverage compared to their Northern neighbors,” he says. “There are some articles on the Spanish Wikipedia of very high quality, mainly because of the work of one editor, but likewise other articles that are of very poor quality, with either romanticizing or discriminatory undertones. They also tend to use very low quality sources.”

First Look at the Content Translation tool

The projects in the Wikimedia universe can be accessed and used in a large number of languages from around the world. The Wikimedia websites, their MediaWiki software (both core and extensions) and their growing content benefit from standards-driven internationalization and localization engineering that makes the sites easy to use in every language across diverse platforms, both desktop and mobile.

However, a wide disparity exists in the numbers of articles across language wikis. The article count across Wikipedias in different languages is an often cited example. As the Wikimedia Foundation focuses on the larger mission of enabling editor engagement around the globe, the Wikimedia Language Engineering team has been working on a content translation tool that can greatly facilitate the process of article creation by new editors.

About the Tool

The Content Translation editor displaying a translation of the article for Aeroplane from Spanish to Catalan.

Particularly aimed at users fluent in two or more languages, the Content Translation tool has been in development since the beginning of 2014. It will provide a combination of editing and translation tools that can be used by multilingual users to bootstrap articles in a new language by translating an existing article from another language. The Content Translation tool has been designed to address basic templates, references and links found in Wikipedia articles.

Development of this tool has involved significant research and evaluation by the engineering team to handle elements like sentence segmentation, machine translation, rich-text editing, user interface design and scalable backend architecture. The first milestone for the tool’s rollout this month includes a comprehensive editor, limited capabilities in areas of machine translation, link and reference adaptation and dictionary support.

Why Spanish and Catalan as the first language pair?

Presently deployed at http://es.wikipedia.beta.wmflabs.org/wiki/Especial:ContentTranslation, the tool is open for wider testing and user feedback. Users will have to create an account on this wiki and log in to use the tool. For the current release, machine translation can only be used to translate articles between Spanish and Catalan. This language pair was chosen for its linguistic similarity as well as the availability of well-supported language aids like dictionaries and machine translation. Driven by a passionate community of contributors, the Catalan Wikipedia is an ideal medium-sized project for testing and feedback. We also hope to enhance the aided translation capabilities of the tool by generating parallel corpora of text from within the tool.

To view Content Translation in action, please follow the link to this instance and make the following selections:

  • article name – the article you would like to translate
  • source language – the language in which the article you wish to translate exists (restricted to Spanish at this moment)
  • target language – the language in which you would like to translate the article (restricted to Catalan at this moment)

This will lead you to the editing interface where you can provide a title for the page, translate the different sections of the article and then publish the page in your user namespace in the same wiki. This newly created page will have to be copied over to the Wikipedia in the target language that you had earlier selected.

Users in languages other than Spanish and Catalan can also view the functionality of the tool by making a few tweaks.

We care about your feedback

Please provide your feedback on this page on the Catalan Wikipedia or at this topic on the project’s talk page. We will attempt to respond as soon as possible, prioritizing by the criticality of the issues raised.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Coding da Vinci: Results of the first German Culture Hackathon

Mnemosyne, goddess of memory

From the Delaware Art Museum, Samuel and Mary R. Bancroft Memorial, public domain, via Wikimedia Commons

The weather was almost as hot as it was in Hong Kong one year ago. But whereas on that occasion a time machine had to catapult the audience ten years into the future, at the event held on Sunday, July 6 at the Jewish Museum Berlin, the future had already arrived.

It was not only virtual results that were presented at the award ceremony for the culture hackathon Coding da Vinci in Berlin. Image by Marius Förster, CC BY-SA 3.0

At the final event of the programming competition Coding da Vinci, seventeen projects were presented to both a critical jury and the public audience in a packed room. Five winners emerged, three of whom used datasets from Wikimedia projects. This result signals that the predictions put forward by Dirk Franke in Hong Kong have already become a reality: that in the future more and more apps will use the content of Wikimedia projects and that the undiscerning online user will barely notice where the data actually comes from. There is a clear trend towards providing information in a multimedia-based and entertaining way. That’s the meta level, but the source of the knowledge is still clear: Wikipedia.

The aims of Coding da Vinci

The new project format used by Wikimedia Deutschland (WMDE) for the first time this year ended successfully. Coding da Vinci is a culture hackathon organized by WMDE in strategic partnership with the German Digital Library, the Open Knowledge Foundation Germany and the Service Center Digitization Berlin. Unlike a standard hackathon, the programmers, designers and developers were given ten weeks to turn their ideas into finished apps. Most of the 16 participating cultural institutions had made their digital cultural assets publicly available and reusable under a free license especially for the programming competition. With the public award ceremony on July 6 at the Jewish Museum, we wanted to show these cultural institutions, and the wider public, what “hackers” can do with their cultural data. We hope that this will persuade more cultural institutions to freely license their digitized collections. Already this year, 20 cultural datasets have been made available for use in Wikimedia projects.

Exciting until the very end

It was an exciting event for us four organizers, as we waited with bated breath to see what the community of programmers and developers would produce at the end. Of course, not all the projects were winners. One of the projects that did not emerge as a winner, but that I would nevertheless like to give a special mention, was Mnemosyne – an ambitious website that took the goddess of memory as its patron. We are surely all familiar with those wonderful moments of clarity as we link-hop our way through various Wikipedia pages, so who would say no to being guided through the expanse of associative thought by a polymath as they stroll through a museum?

The polymath as a way of life died out at the end of the 19th century, according to Wikipedia, a fact that the Mnemosyne project seeks to address by using a combination of random algorithms to make finding and leafing through complex archive collections a simpler and more pleasurable activity. In spite of some minor blips during the on-stage presentation, the potential of the cast-concrete Mnemosyne was plain to see. Hopefully work will continue on this project and the developers will find a museum association that wants to use Mnemosyne to make their complex collections available for visitors to browse.

The five winners

After two hours of presentations and a one-hour lunch break, the winners were selected in the five categories and were awarded their prizes by the jury.

Out of Competition: The zzZwitscherwecker (chirping alarm clock) really impressed both the audience and the jury. It’s a great solution for anyone who finds it difficult to be an early bird in the morning. That’s because you can only stop the alarm if you’re able to correctly match a bird to its birdsong. You’re sure to be wide awake after such a lively brain game.


Wikimedia Foundation offers assistance to Wikipedia editors named in U.S. defamation suit

Update: Since this post was published, we have learned that Mr. Barry’s attorney has requested to withdraw the complaint without prejudice and that the request has been granted by the court. Mr. Barry’s attorney has further indicated that Mr. Barry intends to file an amended complaint at some unspecified time in the future.

Wikipedia’s content is not the work of one, ten, or even a thousand people. The information on Wikipedia is the combined product of contributions made by hundreds of thousands of people from all over the world. By volunteering their time and knowledge, these people have helped build Wikipedia into a project that provides information to millions every day.

With many different voices come many different perspectives. Reconciling them requires open editorial debate and collaboration with and among the volunteer community of editors and writers. Disagreements about content are settled through this approach on a daily basis. On extremely rare occasions, editorial disputes escalate to litigation.

This past month, four users of English Wikipedia were targeted in a defamation lawsuit brought by Canadian-born musician, businessman, and philanthropist Yank Barry. In the complaint, Mr. Barry claims that the editors, along with 50 unnamed users, have acted in conspiracy to harm his reputation by posting false and damaging statements onto Wikipedia concerning many facets of his life, including his business, philanthropy, music career, and legal history.

However, the specific statements Mr. Barry apparently finds objectionable are on the article’s talk page, rather than in the article itself. The editors included in the lawsuit were named because of their involvement in discussions focused on maintaining the quality of the article, specifically addressing whether certain contentious material was well-sourced enough to be included, and whether inclusion of the material would conform with Wikipedia’s policies on biographies of living persons.

A talk page is not an article. It is not immediately available to the readers of the encyclopedia. Its purpose is not to provide information, but to serve as a forum for discussion and editorial review. If users are unable to discuss improvements to an article without fear of legal action, they will be discouraged from participating in discussion at all. While some individuals may find questions about their past disagreeable and even uncomfortable, discussions about these topics are necessary for establishing accurate and up-to-date information. Without discussion, articles will not improve.

In our opinion, this lawsuit is an effort to chill free speech on the Wikimedia projects. Since Wikipedia editors do not carve out facts based on bias or promotion, this lawsuit is rooted in a deep misinterpretation of the free-form, truth-seeking conversation and analysis that are part of the editorial review process establishing the validity and accuracy of historical and biographical information. As such, we have offered the four named users assistance through our Defense of Contributors policy. Three of the users have accepted our offer and obtained representation through the Cooley law firm. We thank Cooley for its assistance in the vigorous representation of our users. The fourth user is being represented by the California Anti-SLAPP Project and is working closely with the Wikimedia Foundation and Cooley.

Lawsuits against Wikipedia editors are extremely rare; we do not know of any prior cases in which a user has been sued for commenting on a talk page. The Wikipedia community has established a number of dispute resolution procedures and venues to discuss content issues that are available for anyone to use. Most content disputes are resolved through these processes. We are unaware of Mr. Barry taking advantage of these processes to work directly with the editors involved in this lawsuit or with the greater Wikipedia community to address these issues.

Wikipedia’s mission is to provide the world with the sum of all human information for free and we will always strongly defend its volunteer editors and their right to free speech.

Michelle Paulson, Legal Counsel

Wikimedia engineering report, June 2014

Major news in June include:

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Creating Safe Spaces

This morning I read an article entitled “Ride like a girl.” In it, the author describes how being a cyclist in a city is like being a woman: “Welcome to being vulnerable to the people around you. Welcome to being the exception, not the rule. Welcome to not being in charge.” The analogy may not be a perfect fit, but reading these words made me think of a tweet I favorited several weeks ago when #YesAllWomen was trending. A user who goes by the handle @Saradujour wrote: “If you don’t understand why safe spaces are important, the world is probably one big safe space to you.” As I continue interviewing women who edit Wikipedia and as I read through the latest threads on the Gendergap mailing list, I keep asking myself, “How can a community that values transparency create safe spaces? How can we talk about Wikipedia’s gender gap without alienating dissenting voices and potential allies?”

Ride like a girl?

Wikipedia’s gender gap has been widely publicized and documented both on and off Wiki (and on this blog since 1 February 2011). One of the reasons I was drawn to working on the gender gap as a research project was that, despite a great deal of conversation, there seem to be very few solutions. It is what Rittel and Webber would call a “wicked problem.” Even in the midst of the ongoing work of volunteers who spearhead and contribute to endeavors like WikiProject Women scientists, WikiWomen’s History Month, WikiProject Women’s sport and Meetup/ArtandFeminism (to name only a few), the gender gap is a wicked problem that many community members, even those dedicated to the topic, seem tired of discussing.

The Women and Wikipedia IEG project is designed to collect and then provide the Wikimedia community with aggregate qualitative and quantitative data that can be used to assess existing efforts to address the gender gap. This data may also be used to guide the design of future interventions or technology enhancements that seek to address the gap. The data may include but not be limited to:

Digging for Data: How to Research Beyond Wikimetrics

The next virtual meet-up will introduce research tools. Join us!

For Learning & Evaluation, Wikimetrics is a powerful tool for pulling data for wiki project user cohorts, such as edit counts, pages created and bytes added or removed. However, you may still have a variety of other questions, for instance:

  • How many members of WikiProject Medicine have edited a medicine-related article in the past three months?
  • How many new editors have played The Wikipedia Adventure?
  • What are the most-viewed and most-edited articles about Women Scientists?

Questions like these and many others regarding the content of Wikimedia projects and the activities of editors and readers can be answered using tools developed by Wikimedians all over the world. These gadgets, based on publicly available data, rely on databases and Application Programming Interfaces (APIs). They are maintained by volunteers and staff within our movement.
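
As a small illustration (a minimal sketch, not one of the tools mentioned above), the script below uses the public MediaWiki web API to fetch a single user’s total edit count, one of the basic data points Wikimetrics reports on. The username is a placeholder and the Python `requests` library is assumed to be installed:

```python
# Minimal sketch: query the public MediaWiki API for a user's edit count.
# "ExampleUser" is a placeholder; the `requests` library is assumed.
import requests

API_URL = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "users",
    "ususers": "ExampleUser",   # replace with a real username
    "usprop": "editcount",
    "format": "json",
}

data = requests.get(API_URL, params=params).json()
for user in data["query"]["users"]:
    print(user.get("name"), user.get("editcount"))
```

The same query entry point exposes many other lists (revisions, category members, user contributions), which is the kind of publicly available data the gadgets described above are built on.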

On July 16, Jonathan Morgan, research strategist for the Learning and Evaluation team and wiki-research veteran, will begin a three-part series exploring some of the different routes to accessing Wikimedia data. Building on several recent workshops, including the Wiki Research Hackathon and a series of Community Data Science Workshops developed at the University of Washington, the Beyond Wikimetrics series will guide participants in expanding their wiki-research capabilities by accessing data directly through these tools.


Making Wikimedia Sites faster

Running the fifth largest website in the world brings its own set of challenges. One particularly important issue is the time it takes to render a page in your browser. Nobody likes slow websites, and we know from research that even small delays lead visitors to leave the site. An ongoing concern from both the Operations and Platform teams is to improve the reader experience by making Wikipedia and its sister projects as fast as possible. We ask ourselves questions like: Can we make Wikipedia 20% faster on half the planet?

As you can imagine, the end-user experience differs greatly because of our uniquely diverse and global readership. Hence, we need to conduct real user monitoring to truly understand how fast our projects are in real-life situations.

But how do we measure how fast a webpage loads? Last year, we started building instrumentation to collect anonymous timing data from real users, through a MediaWiki extension called NavigationTiming.[1]

There are many factors that determine how fast a page loads, but here we will focus on the effects of network latency on page speed. Latency is the time it takes for a packet to travel from the originating server to the client who made the request.

ULSFO

Earlier this year, our new data center (ULSFO) became fully operational, serving content to Oceania, South-East Asia, and the west coast of North America[2]. The main benefit of this work is shaving up to 70−80 ms of round-trip time for some regions of Oceania, East Asia, the US and Canada, an area with 360 million Internet users and a total population of approximately one billion people.

We recently explained how we chose which areas to serve from the new data center. Knowing that the sites became faster for those users was not enough for us: we wanted to know how much faster.

Results

Before we talk about specific results, it is important to understand that faster network round-trip times do not necessarily translate directly into a faster user experience. When network times are faster, resources are retrieved faster, but many other factors influence page latency. This is perhaps best explained with an example: if a page needs 4 network round trips, and round trips 2, 3 and 4 happen while the browser is parsing a huge main document (fetched in round trip 1), only improvements to the first request will be visible; the subsequent ones are done in parallel and are completely hidden under the fetching of the first one. In this scenario, the performance bottleneck is the parsing of the first resource, not the network time.

With that in mind, we wanted to learn two things when we analyzed the data from the NavigationTiming extension: how much did our network times improve, and can users feel the effect of faster network times? That is, are pages perceived to be faster, and if so, by how much?

The data we harvest from the NavigationTiming extension is broken down by country. We therefore concentrated our data analysis on countries in Asia for which we had sufficient data points; we also included the United States and Canada, but we were not able to extract data just for the western states. Data for the United States and Canada was analyzed at the country level, so the improvements in latency appear “muffled”.

How much did our network times improve?

The short summary is: network times improved quite a bit. For half of all requests, the time to retrieve the main document decreased by up to 70 ms.

ULSFO Improvement of Network times on Wikimedia Sites

In the accompanying graph, the data center rollout is marked with a dashed line. The rollout was gradual, so the gains do not appear immediately, but they become very significant after a few days. The graph includes data for Japan, Korea and the whole South-East Asia region.[3]

We graphed the responseStart − connectStart time, which represents the time spent on the network until the first byte arrives, minus the time spent in DNS lookups. For a more visual explanation, take a look at the Navigation Timing diagram. If there is a TCP connection drop, the time will include the setup of the new connection. All the data we use to measure network improvements is provided by the Navigation Timing API, and is thus not available on IE8 and below.
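
As a rough illustration of the metric (a minimal sketch, not the actual analysis code), the snippet below computes the per-request network time and its 50th percentile from a handful of NavigationTiming-style events; the field names follow the Navigation Timing attributes and the sample values are invented:

```python
# Minimal sketch: compute the network-time metric described above
# (responseStart - connectStart) and its 50th percentile.
# Field names follow the Navigation Timing API; values are invented.
from statistics import median

def network_time_ms(event):
    """Time on the network until the first byte arrives, excluding DNS
    lookups (DNS resolution finishes before connectStart)."""
    return event["responseStart"] - event["connectStart"]

events = [
    {"connectStart": 120, "responseStart": 190},   # 70 ms
    {"connectStart": 80,  "responseStart": 230},   # 150 ms
    {"connectStart": 95,  "responseStart": 150},   # 55 ms
]

print(median(network_time_ms(e) for e in events))  # median: 70 ms
```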

User perceived latency

Did the improvement of network times have an impact that our users could see? Well, yes it did. More so for some users than others.

The gains in Japan and Indonesia were remarkable: page load times dropped by up to 300 ms at the weekly 50th percentile. We saw smaller (but measurable) improvements of 40 ms in the US too. However, we were not able to measure the impact in Canada.

The dataset we used to measure these improvements is larger than the one we had for network times. As we mentioned before, the Navigation Timing API is not present in old browsers, so we cannot measure, say, the network improvement in IE7. In this case, however, we used a measure of our own creation called mediaWikiLoadComplete, which tells us when a page is done loading. This measure is taken in all browsers when the page is ready to interact with the user; faster times mean that the user experience was also faster. How users perceive the improvement, though, has a lot to do with how fast pages were to start with. If a page now takes 700 ms to render instead of one second, any user will be able to see the difference, but a difference of 300 ms in a 4-second page render will go unnoticed by most.

Reduction in latency

Want to know more?

Want to know all the details? A (very) detailed report of the performance impact of the ULSFO rollout is available.

Next steps

Improving speed is an ongoing concern, particularly as we roll out new features and we want to make sure that page rendering remains fast. We are keeping our eyes open to new ways of reducing latency, for example by evaluating TCP Fast Open. TCP Fast Open skips an entire round-trip and starts sending data from the server to client before the final acknowledgment of the three-way TCP handshake has been finished.
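
For readers curious what opting in looks like at the socket level, here is a heavily simplified sketch of a Linux server socket enabling TCP Fast Open. It is an illustration only: the fallback constant (23) is the Linux socket-option value, the queue length is arbitrary, and the kernel must also permit server-side TFO via the net.ipv4.tcp_fastopen sysctl.

```python
# Minimal sketch: a Linux server socket opting in to TCP Fast Open.
# Falls back to the raw Linux constant (23) if the Python socket module
# does not expose TCP_FASTOPEN. Kernel support and an appropriate
# net.ipv4.tcp_fastopen setting are also required.
import socket

TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Length of the queue of pending TFO requests (illustrative value).
server.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, 16)
server.bind(("0.0.0.0", 8080))
server.listen(128)
```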

We are also getting closer to deploying HipHop. HipHop is a virtual machine that compiles PHP bytecode to native instructions at runtime, the same strategy used by Java and C# to achieve their speed advantages. We’re quite confident that this will result in big performance improvements on our sites as well.

We wish you speedy times!

Faidon Liambotis
Ori Livneh
Nuria Ruiz
Diederik van Liere

Notes

  1. The NavigationTiming extension is built on top of the HTML5 component of the same name, which exposes fine-grained measurements from the moment a user submits a request to load a page until the page has been fully loaded.
  2. Countries and provinces served by ULSFO include: Bangladesh, Bhutan, Hong Kong, Indonesia, Japan, Cambodia, Democratic People’s Republic of Korea, Republic of Korea, Myanmar, Mongolia, Macao, Malaysia, Philippines, Singapore, Thailand, Taiwan, Vietnam, US Pacific/West Coast states (Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, New Mexico, Nevada, Oregon, Utah, Washington, Wyoming) and Canada’s western territories (Alberta, British Columbia, Northwest Territories, Yukon Territory).
  3. Countries include: Bangladesh, Bhutan, Hong Kong, Indonesia, Japan, Cambodia, Democratic People’s Republic of Korea, Republic of Korea, Myanmar, Mongolia, Macao, Malaysia, Philippines, Singapore, Thailand, Taiwan, Vietnam.

A Survey of Esperanto Wikipedians

Esperanto Wikipedia founder Chuck Smith (right) being interviewed along with Miroslav Malovec (left), Esperanto Wikipedian and founding member of Czech Wikipedia during Esperanto Wikimania 2011 in Svitavy, Czech Republic.

Esperanto Wikipedia started its journey in 2001. Over the past thirteen years it has registered massive growth, reaching a record 196,923 articles[1], an editing landmark that places it ahead of not only other constructed languages but also many natural languages.

As a constructed language, Esperanto has been adopted out of love for its inherently uniform grammar as well as for the idea of a culturally neutral universal language. In the context of Esperanto Wikipedia, we find people from different parts of the world enthusiastically contributing.

It is with this global framework in mind that I made a humble attempt to survey Esperanto Wikipedians, in an effort to get an overview of the editing culture of this Wikipedia and the whereabouts of its contributors. I designed a 10-point questionnaire and sent it to a number of Esperanto Wikipedians. To my good fortune, the first recipient, Christian Bertin, provided an Esperanto version of the questionnaire, making it possible to give respondents the option to reply in either Esperanto or English. I received 12 responses from Esperanto Wikipedians, including three from admins.

Although not all respondents disclosed their geographical location, those who did hailed from Austria, Colombia, France, Portugal, the Czech Republic, Spain and the United States. Three respondents were inspired by the efforts of the founding member of Esperanto Wikipedia, Chuck Smith, to promote Esperanto Wikipedia, with one respondent, Miroslav Malovec, sharing a close association with him.

Esperanto Wikipedians contribute on a wide array of topics: local information, transportation, writers, sports, literature, film, food, Esperanto events, Russian ethnography, American culture, geography, ornithology and other areas of biology and science. Two of the respondents, Pino and Miroslav Malovec, made a point of mentioning outreach events for Esperanto Wikipedia. Miroslav Malovec listed Esperanto Wikimania 2011 in Svitavy, Czech Republic, the Conference on the Application of Esperanto in Science and Technology 2012 in Modra, Slovakia and Wikitrans 2013 in Partizánske, Slovakia as examples of outreach events. Another Wikipedian, Marcos Crammer, shared an interesting anecdote: at the Conference on the Application of Esperanto in Science and Technology 2010 in Modra, Slovakia, he encountered an Esperantist who had been frustrated by Esperanto Wikipedia because of a terminological error that had remained unaddressed. Marcos proposed a change to the terminology and later carried it out, giving his Esperantist friend a more positive outlook.


Pywikibot will have its next bug triage on July 24−27

For most Wikimedia projects, Pywikibot (formerly pywikipedia) has proved to be a trusted and powerful tool. Literally millions of edits have been made by “bots” through this (semi-)automated software suite, written in Python.
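
To give a flavour of what such a bot script looks like, here is a minimal Pywikibot sketch; the page title and the edit itself are placeholders, and it assumes Pywikibot is installed with a generated user-config.py:

```python
# Minimal Pywikibot sketch: load a page, modify its text and save it.
# Assumes Pywikibot is installed and user-config.py has been generated;
# the page title and the appended comment are placeholders.
import pywikibot

site = pywikibot.Site("en", "wikipedia")      # target wiki
page = pywikibot.Page(site, "Sandbox")        # placeholder page title
page.text += "\n<!-- test edit made with Pywikibot -->"
page.save(summary="Pywikibot test edit")      # performs the edit
```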

Bug triage is like a check-up for bots: we go through the list of things that need to be done and clean it up. During a bug triage, we review our open bugs and check them for reproducibility (can we make them happen on our own computers so we can investigate them?), severity and priority, and we categorize them when necessary. A bug in this context can mean a problem in a script or a feature request that would improve Pywikibot.

From July 24 to July 27, we’ll be holding a big online event to learn what more needs to be done for Pywikibot: which bugs need an urgent fix, which features are missing or incomplete, and so on. It is also a good opportunity to look at the code and check for “bit rot”.

Fixing bugs can sometimes be hard and time-consuming, but bug triaging doesn’t require deep technical knowledge: anyone with a little experience running bots can be of great help. Triage can be a tedious task due to the number of bugs involved, so we need your help to go through them all.

If you know your Python and are interested in putting your skills to good use to support Wikimedia sites, join us for the bug-a-thon starting July 24. Until then, you can start familiarizing yourself with Pywikibot and bug triaging!

Amir Sarabadani (User:Ladsgroup), editor on the Persian Wikipedia and Pywikibot developer