Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts by Tilman Bayer

Wikimedia Research Newsletter, September 2013

Wikimedia Research Newsletter


Vol: 3 • Issue: 9 • September 2013

Automatic detection of “infiltrating” Wikipedia admins; Wiki, or ‘pedia?

With contributions by: Brian Keegan, Piotr Konieczny, Aaron Halfaker, Jonathan Morgan and Tilman Bayer

Wiki, or ‘pedia? The genre and values of Wikipedia compared with other encyclopedias

Wikipedia and Encyclopaedism: A Genre Analysis of Epistemological Values[1] is a new master’s thesis that analyzes the values that influenced how knowledge is presented on Wikipedia, in comparison with other encyclopedias created throughout history. The author uses genre analysis to compare the epistemological values reflected in the kind of knowledge that different encyclopedias present and in the way they present it. The author first conducts a literature review to compare the epistemology of two genres: wikis and encyclopedias. The wiki epistemology is composed of six values: self-identification, collaboration, co-construction, cooperation, trust in the community, and constructionism. By contrast, major current and historical encyclopedias (such as Diderot’s Encyclopedia, Pliny’s Natural History, and the Encyclopædia Britannica) prioritize trust in experts, authority, and consistency.

Despite being based on different, and even somewhat contradictory, value systems, the purpose of Wikipedia and the way it presents knowledge are shown to be similar to other works in the encyclopedia genre. The author analyzes the frequency of common words in section headings of 25 heavily edited English Wikipedia articles that had a corresponding article in Britannica. He compares the evolution of section headings within these Wikipedia articles and multiple editions of Britannica, and shows that the gradual process by which a Wikipedia article becomes more structured through the addition and alteration of headings is similar to the process for Britannica articles, which also tend to become longer and more formally structured over subsequent editions. This thesis presents some interesting parallels between the way articles are developed within Wikipedia and other encyclopedias, despite vastly different timescales and some differing underlying values. It also offers an engaging, in-depth discussion of the concept of genre, the purpose of the encyclopedia genre, and the history of several major historical encyclopedias.

Identifying trending topics of yesteryear

In a paper titled “Temporal Wikipedia search by edits and linkage”[2], the authors develop a method to identify Wikipedia articles associated with topics around a given date, based on changes in the length of the article as well as patterns in the other articles to which it links. The paper expands on prior work in temporal information retrieval and anomaly detection, and uses modified versions of the HITS and PageRank algorithms to return a list of the most relevant documents for a topic on a date. The work has implications not only for using Wikipedia data to identify currently trending topics, but also for identifying past trending topics retrospectively. A downloadable Java client allows test searches (for the months of September and October 2011) and the display of the resulting page networks.

Automatic detection of “infiltrating” Wikipedia admins

A paper titled “Manipulation Among the Arbiters of Collective Intelligence: How Wikipedia Administrators Mold Public Opinion”[3], to be presented at next month’s ACM Conference on Information and Knowledge Management (CIKM), makes a rather serious claim: “We find a surprisingly large number of editors who change their behavior and begin focusing more on a particular controversial topic once they are promoted to administrator status.” (more…)

Wikimedia Research Newsletter, August 2013

Wikimedia Research Newsletter


Vol: 3 • Issue: 8 • August 2013

WikiSym 2013 retrospective

With contributions by: Piotr Konieczny, Taha Yasseri, Brian Keegan, Dario Taraborelli, Tilman Bayer

WikiSym 2013

WikiSym+OpenSym 2013 group photo (showing around half of the participants at Jumbo Kingdom)

98 registered participants attended the annual WikiSym+OpenSym conference from August 5-7 at Hong Kong’s Cyberport facility. The event preceded the annual global Wikimania conference of the Wikimedia movement in the same city.

WikiSym was started in 2005 as the “International Symposium on Wikis”, and its scope has since been broadened to include the study of other forms of “open collaboration” (such as free software development, or open data), reflected in the adoption of the separate “OpenSym” label. The proceedings, published online at the start of the conference, contain 22 full papers (out of 43 submissions), in addition to short papers, posters, abstracts for research-in-progress presentations, etc. The coverage below reflects the scope of this research report, and complements the pre-conference reviews of some papers in the previous issue.

Episode 96 of the “Wikipedia Weekly” podcast contains some coverage of WikiSym 2013 (from around 10:30-20:00), and some images and media from the event can be found on Wikimedia Commons.

Next year’s WikiSym+OpenSym conference will be held in Berlin on August 27-29, 2014, and the call for papers is already out. Conference chair Dirk Riehle announced that the proceedings will continue to be published with ACM, now under its new open access policy.

Full papers

  • Despite policy, only just over half of Wikipedia sources are secondary: “Getting to the Source: Where does Wikipedia Get Its Information From”[1] presents overall statistics on the sources cited in English Wikipedia articles to answer the question in its title. An initial seed of source tags was constructed by analysing 30 randomly selected articles, and then all articles in Wikipedia as of May 2012 were scanned to find and classify the references. Some 67 million citations for 3.5 million articles were found. Two human coders classified a random sample of 500 citations. More than 30% of the citations were classified as primary sources, around 53% as secondary, and around 13% as tertiary. After discussing the type, creator, and publisher of the references, as well as a large-scale domain analysis and persistence over time, the paper concludes: “Wikipedia’s content is ultimately driven by the sources from which that content comes. … Although secondary sources are considered by policy to be the most desirable type, we demonstrate that nearly half of all citations are either primary or tertiary sources, with primary sources making up approximately one-third of all citations.”
  • Conflict on Wikipedia as “generative friction”: (more…)

Wikimedia Highlights, July 2013

For versions in other languages, please check the wiki version of this report, or add your own translation there!

Highlights from the Wikimedia Foundation Report and the Wikimedia engineering report for July 2013, with a selection of other important events from the Wikimedia movement

Wikimedia Foundation highlights

The new mobile editing interface

Mobile editing

This month, the mobile web team released new navigation features for contributors to all Wikimedia mobile sites, including the existing upload and watchlist star features, as well as an edit button. This means that editing (in the form of section-level markup editing) is now enabled on all mobile Wikimedia sites for logged-in users. The users of our “beta” channel will soon see redesigned mobile notifications, as well as guides for first-time editors and uploaders. (More about the beta channel and how you can opt in.)

Wikipedia Zero comes to India

The first Wikipedia Zero launch in India gives 60 million mobile phone subscribers free-of-charge access to Wikipedia on their mobile phones. The promotional campaign around this launch, led by our partner Aircel, received broad coverage in print and was accompanied by Twitter and blogging events. Aircel customers now have free access to Wikipedia in English, as well as to the 19 Indic language Wikipedias.

VisualEditor beta launch

In July we enabled the new editing interface on several Wikipedias as the default editor, first for logged-in editors and then for anonymous users as well. This resulted in a great deal of feedback, and the team responded with several hundred improvements to fix urgent issues. In addition, the team deployed user interface improvements, most notably to the references insertion dialog. Currently, users are making approximately 800 edits per hour using the VisualEditor on Wikimedia sites.

There are continuing discussions with different language communities about the positioning of the VisualEditor beta in the user interface and appropriate notices indicating its beta status. The Wikimedia Foundation is using the beta period to collect and analyze bug reports, feature enhancements, and data, to observe actual user behavior, and to improve the editing experience continuously. Our eventual objective is for VisualEditor to be the default editor for all Wikipedia users, capable of letting them edit the majority of content without needing to use the wikitext editor.

Please help translate the user interface of VisualEditor.

Data and Trends

(more…)

Wikimedia Foundation Report, July 2013

You are more than welcome to edit the wiki version of this report for the purposes of usefulness, presentation, etc., and to add translations of the “Highlights” excerpts.

Data and Trends

Global unique visitors for June:

500 million (-4.25% compared with May; +6.37% compared with the previous year)
(comScore data for all Wikimedia Foundation projects; comScore will release July data later in August)

Page requests for July:

21.2 billion (+0.4% compared with June; +19.8% compared with the previous year)
(Server log data, all Wikimedia Foundation content projects including mobile access, but excluding Wikidata and the Wikipedia main portal page.)

Active Registered Editors for June 2013 (>= 5 mainspace edits/month, excluding bots):

data currently under review
(Database data, all Wikimedia Foundation projects.)

Report Card (integrating various statistical data and trends about WMF projects):

http://reportcard.wmflabs.org/

(Definitions)

Financials

Wikimedia Foundation YTD Revenue and Expenses vs Plan as of June 30, 2013

Wikimedia Foundation YTD Expenses by Functions as of June 30, 2013

(Financial information is only available through June 2013 at the time of this report.)

All financial information presented is for the Month-To-Date and Year-To-Date June 30, 2013.

Revenue $51,040,795
Expenses:
Engineering Group $15,224,438
Fundraiser Group $3,463,128
Grantmaking & Programs Group $8,830,248
Governance Group $742,435
Legal/Community Advocacy/Communications Group $3,090,563
Finance/HR/Admin Group $5,865,595
Total Expenses $37,216,407
Total surplus $13,824,388
  • Revenue for the month of June is $0.48MM versus plan of $0.28MM, approximately $205K or 74% over plan.
  • Year-to-date revenue is $51.04MM versus plan of $46.07MM, approximately $4.97MM or 11% over plan.
  • Expenses for the month of June are $3.39MM versus plan of $3.99MM, approximately $598K or 15% under plan, primarily due to lower personnel expenses, internet hosting, and grant expenses (FDC grants), partially offset by higher capital expenses, outside contract services, legal fees, and travel expenses.
  • Year-to-date expenses are $37.22MM versus plan of $42.07MM, approximately $4.85MM or 12% under plan, primarily due to lower personnel expenses, internet hosting, grant expenses (FDC grants), and travel expenses, partially offset by higher capital expenses, legal expenses, bank fees, outside contract services, and personal property tax expenses.
  • Cash position is $39.75MM as of June 30, 2013.
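For readers who want to re-check the figures above, the following is a minimal Python sketch that re-adds the group expenses, derives the surplus, and recomputes the year-to-date variances against plan. The amounts are copied from this report; the variable names and the `variance_vs_plan` helper are illustrative assumptions, not part of any Wikimedia tooling.

```python
# Year-to-date figures as of June 30, 2013, copied from the report above (USD).
revenue = 51_040_795
expenses = {
    "Engineering Group": 15_224_438,
    "Fundraiser Group": 3_463_128,
    "Grantmaking & Programs Group": 8_830_248,
    "Governance Group": 742_435,
    "Legal/Community Advocacy/Communications Group": 3_090_563,
    "Finance/HR/Admin Group": 5_865_595,
}

total_expenses = sum(expenses.values())
surplus = revenue - total_expenses


def variance_vs_plan(actual, plan):
    """Return (absolute difference, percentage difference) of actual vs plan.

    Hypothetical helper for illustration; a negative value means under plan.
    """
    diff = actual - plan
    return diff, diff / plan * 100


# Rounded figures in $MM, as quoted in the bullet points above.
revenue_diff, revenue_pct = variance_vs_plan(51.04, 46.07)
expense_diff, expense_pct = variance_vs_plan(37.22, 42.07)

print(f"Total expenses: ${total_expenses:,}")
print(f"Surplus:        ${surplus:,}")
print(f"YTD revenue vs plan:  {revenue_diff:+.2f}MM ({revenue_pct:+.0f}%)")
print(f"YTD expenses vs plan: {expense_diff:+.2f}MM ({expense_pct:+.0f}%)")
```

The recomputed totals match the "Total Expenses" and "Total surplus" lines, and the percentages round to the 11% over plan and 12% under plan quoted in the report.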

Highlights

The new mobile editing interface

Mobile editing

(more…)

Drafting a strategy plan with the community

This post was authored by User:Barcelona from Amical Wikimedia.

Amical member classifying community proposals during Amical’s 2014-2018 strategy plan taskforce meeting

4 of the 5 members of the taskforce meeting

At Amical Wikimedia we have started the process of thinking about, and writing down, our 2014–2018 strategic plan, and also our 2013–2014 annual activity plan (our activities are linked with the educational calendar instead of the calendar year). We are trying to apply both the principles and values of our association (shared with the global Wikimedia movement) and the lessons learned at the Program Evaluation workshop held in Budapest in June, which our GLAM ambassador Kippelboy attended. Thus, the key words defining this whole process could be collaboration, efficiency and self-reflection.

First step: including all the voices

We made a collaborative effort to include all the voices within the association and the community of editors in the main plan. First, we opened a public proposals page to collect ideas from any interested person about the long-term evolution of Amical Wikimedia. Second, we hosted an online chat meeting via IRC to ask for our members’ thoughts about the plan’s goals, their questions about the priorities, and their suggestions about which social sectors our activities should address first. Later, we had a few in-depth interviews with some users who are especially committed to Amical.

Second step: task force

Then we created a special task force of five members with different profiles to start actually drafting the plan. After some offline and online meetings, and after developing preliminary working documents, we had an all-day session, hosted by the University of Girona, where the task force members met and used the specific methodology learned in Budapest, working as a community and implementing evaluation strategies in our programme activities. We shared and evaluated our thoughts and proposals before starting to write down the final document.

Third step: internal evaluation

Our strategy plan is now ready and is being shared on our internal wiki, so all the Amical members can add, remove, change, discuss or challenge its assumptions. The final result, which will be published in September, will surely reflect the collaborative spirit of our community and all the reflections around what we should become.

Outcomes

This graph explains the relation between our key words and paths

Although the process has not finished yet, we can now present some partial results: Amical’s Strategy Schema and the main intended work tracks. You can see the intersection of the goals and actions of our plan in this graphic (still under construction, as is the whole wiki world). At the top are the five key words that summarize the association’s focus for the coming years:

  • Cohesion: We want to maintain the community bonds and strong personal ties that make it possible to work together.
  • Discourse: We would like to spread the word about what we do and think, in order to participate in the global debate.
  • Content: Obviously the central point is to help to add content to Wikimedia projects – that should be the main goal for all the chapters and associations.
  • Territory: We want to be active in all the Catalan-speaking countries.
  • Readers: We want to explain the world in Catalan and get people to read it.

Below are the three paths to attain these goals: Activities (programmed projects for the coming years), Internal (the sometimes invisible but crucial work to keep the community alive and growing) and External (how we relate to other users, Wikimedia groups and the non-wiki reality).

As for the Activity Plan, several initiatives were suggested by the community: continue the work with museums, extend our presence at universities and schools, promote Wikimedia sister projects, prepare a multilingual contest on Wikipedia focused on Catalan culture and history, increase the number of libraries that we are already collaborating with, explore new possibilities to attract editors… We are really excited to see what comes next, because surely at Amical we have plenty of work to do!

User:Barcelona, Amical Wikimedia

Wikimedia Research Newsletter, July 2013

Wikimedia Research Newsletter


Vol: 3 • Issue: 7 • July 2013

Napoleon, Michael Jackson and Srebrenica across cultures, 90% of Wikipedia better than Britannica, WikiSym preview

With contributions by: Taha Yasseri, Han-Teng Liao, Piotr Konieczny, Jonathan Morgan and Tilman Bayer

Multilingual ranking analysis: Napoleon and Michael Jackson as Wikipedia’s “global heroes”

An arXiv preprint titled “Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles”[1], authored by a group of physicists from France, examines the Wikipedia articles on individuals and their position in the hyperlink network of articles in each Wikipedia language edition. Nine language editions are studied. The authors try to locate the most “important” individuals (“heroes”) in each language edition by calculating ranking scores based on PageRank and CheiRank, as well as 2DRank, which combines the two. After compiling lists of the highest-ranked individuals in each language edition (with 30 individuals in each list), the authors investigate the overlaps between the lists and introduce the notions of local and global “heroes”. The lists of “global heroes” are topped by Napoleon for PageRank, and Michael Jackson for 2DRank. It is shown that both local and global heroes exist: while global heroes gain their central position in the network through links from multiple other central nodes, local heroes are mostly notable because of the large number of links pointing directly to them. Finally, based on the nationality (language of origin) of the highly ranked individuals, a network of languages is constructed, and the position of each language in this network is analysed by calculating rank scores. The authors also analyzed the activities of those important individuals, and found politicians and scientists to be quite often among the most important ones.
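To make the two rankings concrete, here is a minimal Python sketch of PageRank computed by power iteration, together with CheiRank, which is PageRank computed on the same network with every link reversed. It is an illustration under stated assumptions and not the authors’ code: the toy link graph, the article names, the damping factor of 0.85 and the function names are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {article: [articles it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this node's rank on in equal parts along its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def reverse_links(links):
    """Return the same graph with the direction of every link inverted."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return reversed_links


def cheirank(links, **kwargs):
    """CheiRank: PageRank of the network with all links reversed."""
    return pagerank(reverse_links(links), **kwargs)


# Hypothetical toy network: "Hub" is linked to by many pages,
# while "List" links out to many pages.
toy_links = {
    "A": ["Hub"],
    "B": ["Hub"],
    "C": ["Hub"],
    "Hub": ["List"],
    "List": ["A", "B", "C", "Hub"],
}

pr = pagerank(toy_links)
cr = cheirank(toy_links)
print("Top by PageRank:", max(pr, key=pr.get))
print("Top by CheiRank:", max(cr, key=cr.get))
```

In this toy network the heavily linked-to "Hub" page comes out on top for PageRank, while the "List" page, which links out to many others, comes out on top for CheiRank. This mirrors the intuition that PageRank rewards incoming links and CheiRank rewards outgoing ones.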

Art: Image-sharing relationship between 154 language versions of Wikipedia (from the DMI Summer School 2013)

Wikipedia as Cultural Reference: Srebrenica Massacre, Art and Menstruation

(more…)

Wikimedia Highlights, June 2013

For versions in other languages, please check the wiki version of this report, or add your own translation there!

Highlights from the Wikimedia Foundation Report and the Wikimedia engineering report for June 2013, with a selection of other important events from the Wikimedia movement

Wikimedia Foundation highlights

First workshop on how to evaluate the success of organized Wikimedia activities

On June 22–23, the first workshop on the design and evaluation of programs (organized activities) in the Wikimedia movement took place. The event was held in Budapest, Hungary by the Wikimedia Foundation, in partnership with Wikimedia Magyarország, the local chapter. The workshop brought together 21 program leaders from 15 countries to learn the basic concepts of program evaluation. The success of the workshop itself was evaluated, too: Surveys before and after the workshop showed that a majority of the participants left with a better understanding of these terms and concepts.

One of the tasks made easier by VisualEditor: References can now be edited and added in a format that is more convenient than the <ref> tags in the middle of the page’s source wikitext.

Preparations for the launch of VisualEditor and Universal Language Selector

In June, work was completed on major new features for VisualEditor (the visual interface to edit wiki pages without markup), in preparation for its launch for all logged-in editors on the English Wikipedia on July 1. It is becoming available to most other Wikipedians during the rest of July.

Also in June, the Universal Language Selector began to be deployed to all Wikimedia projects. It allows users to configure language settings like interface language, fonts, and input methods (keyboard mappings) in a flexible way. By July 1, it was available on more than 150 wikis.

Universal Language Selector: A logged-in user is choosing the language which they prefer for the interface menus on the English Wikipedia

Community input invited for privacy policy update

In preparation for an update of the Wikimedia Foundation’s privacy policy (the first since 2008), the Legal and Community Advocacy (LCA) team has invited participation in a community discussion period, lasting until July 18. The goal was to get initial input about what privacy concerns community members have, what they find important, and what they would like to see in the next version of the privacy policy. The community was also asked to provide input on practices regarding the Wikimedia trademarks.

Global unique visitors for May:

(more…)

Wikimedia Foundation Report, June 2013

You are more than welcome to edit the wiki version of this report for the purposes of usefulness, presentation, etc., and to add translations of the “Highlights” excerpts.

Global unique visitors for May:

522 million (+0.97% compared with April; +5.97% compared with the previous year)
(comScore data for all Wikimedia Foundation projects; comScore will release June data later in July)

Page requests for June:

21.1 billion (+0.7% compared with May; +17.1% compared with the previous year)
(Server log data, all Wikimedia Foundation projects including mobile access)

Active Registered Editors for May 2013 (>= 5 mainspace edits/month, excluding bots):

80,611 (-0.19% compared with April / -1.92% compared with the previous year)
(Database data, all Wikimedia Foundation projects.)

Report Card (integrating various statistical data and trends about WMF projects):

http://reportcard.wmflabs.org/

(Definitions)

Financials

Wikimedia Foundation YTD Revenue and Expenses vs Plan as of May 31, 2013

Wikimedia Foundation YTD Expenses by Functions as of May 31, 2013

(Financial information is only available through May 2013 at the time of this report.)

All financial information presented is for the Month-To-Date and Year-To-Date May 31, 2013.

Revenue $50,559,430
Expenses:
Engineering Group $13,523,471
Fundraiser Group $3,265,731
Grantmaking & Programs Group $8,284,686
Governance Group $692,321
Legal/Community Advocacy/Communications Group $2,777,192
Finance/HR/Admin Group $5,272,075
Total Expenses $33,815,476
Total surplus $16,743,954
  • Revenue for the month of May is $0.12MM versus plan of $0.28MM, approximately $159K or 57% under plan.
  • Year-to-date revenue is $50.56MM versus plan of $45.79MM, approximately $4.77MM or 10% over plan.
  • Expenses for the month of May are $2.97MM versus plan of $4.01MM, approximately $1.04MM or 26% under plan, primarily due to lower personnel expenses, internet hosting, and grant expenses (FDC grants), partially offset by higher capital expenses, outside contract services, and travel expenses.
  • Year-to-date expenses are $33.82MM versus plan of $38.08MM, approximately $4.26MM or 11% under plan, primarily due to lower personnel expenses, internet hosting, grant expenses (FDC grants), and travel expenses, partially offset by higher legal expenses, bank fees, outside contract services, and personal property tax expenses.
  • Cash position is $42.7MM as of May 31, 2013.

Highlights

First workshop on how to evaluate the success of organized Wikimedia activities

(more…)

Wikimedia Research Newsletter, June 2013

Wikimedia Research Newsletter


Vol: 3 • Issue: 6 • June 2013

Most controversial Wikipedia topics, automatic detection of sockpuppets

With contributions by: Giovanni Luca Ciampaglia, Taha Yasseri and Tilman Bayer.

Contents

“The most controversial topics in Wikipedia: a multilingual and geographical analysis”

Map of Conflict in Spanish Wikipedia. Each dot represents a geolocated article. The size and colour of the dots correspond to the controversy measure according to Sumi et al. (2011)[1]. The map is taken from Yasseri et al. (2013)[2].

A comparative work by T. Yasseri, A. Spoerri, M. Graham and J. Kertész on controversial topics in different language versions of Wikipedia has recently been posted on the Social Science Research Network (SSRN) online scholarly archive [1]. The paper will appear as a chapter of an upcoming book titled “Global Wikipedia: International and cross-cultural issues in online collaboration”, edited by P. Fichman and N. Hara and to be published by Scarecrow Press in 2014. It looks at the 100 most controversial topics in 10 language versions of Wikipedia (results including 3 additional languages are reported in the blog of one of the authors), and tries to make sense of the similarities and differences in these lists. Several visualization methods are proposed, based on a Flash-based tool developed by the authors, called CrystalView. Controversiality is measured using a scalar metric which takes into account the total volume of pairwise mutual reverts among all contributors to a page. This metric was proposed by Sumi et al. (2011)[2], in a paper reviewed two years ago in this newsletter (“Edit wars and conflict metrics”). Topics related to politics, geographical locations, and religion are reported to be the most controversial across the board, and each language seems to feature specific, local controversies, which the authors further track down by grouping together languages with similar spheres of influence. Furthermore, the presence of latitude/longitude information (geocoordinates) in several of the Wikipedia articles in the sample let the authors plot the top controversial topics on a world map, showing how each language features both local and global issues as the most heated topics of debate.
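As a rough illustration of what “pairwise mutual reverts” means, the following Python sketch counts reverts only between pairs of editors who have each reverted the other at least once. It is a deliberately simplified stand-in, not the metric published by Sumi et al., whose exact weighting is not given in this review; the function name, the input format and the editor names are assumptions made for the example.

```python
from collections import Counter


def mutual_revert_score(reverts):
    """Sum the reverts exchanged between editor pairs that reverted each other.

    `reverts` is an iterable of (reverter, reverted) tuples for one page.
    Simplified illustration only; not the published controversy metric.
    """
    counts = Counter(reverts)
    seen_pairs = set()
    score = 0
    for (a, b), reverts_a_to_b in counts.items():
        pair = frozenset((a, b))
        # Skip self-reverts and pairs that were already handled.
        if a == b or pair in seen_pairs:
            continue
        seen_pairs.add(pair)
        reverts_b_to_a = counts.get((b, a), 0)
        # Only count the pair if the reverting went in both directions.
        if reverts_b_to_a > 0:
            score += reverts_a_to_b + reverts_b_to_a
    return score


# Hypothetical revert log for a single page.
example_reverts = [
    ("Ann", "Bob"),
    ("Bob", "Ann"),
    ("Ann", "Bob"),
    ("Cat", "Ann"),  # one-directional, so not a mutual revert
]
print(mutual_revert_score(example_reverts))  # 3
```

Here Ann and Bob reverted each other, so their three reverts count towards the score, while Cat’s single revert of Ann is ignored because Ann never reverted Cat.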

In summary, the study shows how valuable information about cross-cultural differences can be extracted from traces of Internet activity, though one obvious question is how the demographics of Wikipedia editors affect the representativeness of the results, an issue which the authors seem to be aware of, and which is probably going to play a role of increasing importance, as the field of cultural studies looks more and more at data generated by peer production communities.

The research has been widely featured in the media, e.g., Huffington Post, Live Science, Wired.com, Zeit Online.

Non-virtual sockpuppets created by participants of RecentChangesCamp, as a humorous take on the sockpuppet phenomenon in online communities

Sockpuppet evidence from automated writing style analysis

“A Case Study of Sockpuppet Detection in Wikipedia”[3], presented at a “Workshop on Language in Social Media” this month, describes an automated method to analyze the writing style of users for the purpose of detecting or confirming sockpuppets. The abuse of multiple accounts (also known as “multi-aliasing” or sybil attacks in other contexts) is described as “a prevalent problem in Wikipedia, there were close to 2,700 unique suspected cases reported in 2012.”

(more…)

Wikimedia Highlights, May 2013

For versions in other languages, please check the wiki version of this report, or add your own translation there!

Highlights from the Wikimedia Foundation Report and the Wikimedia engineering report for May 2013, with a selection of other important events from the Wikimedia movement

Wikimedia Foundation highlights

From the Fundraising report: The “facts” banner (listing some basic facts about Wikipedia) was tested in many different versions and eventually performed better than all previous fundraising banners

Fundraising report released

The Wikimedia Foundation’s fundraising team published a report on the 2012-2013 fundraiser. The report reviews the evolution of banner design and includes data about the 2012 year-end English campaign and the 2013 multilingual campaign, which raised a total of $35 million USD from over 2 million donors.

Community invited to discuss trademark practices

The Legal and Community Advocacy (LCA) team published a statement on trademark practices, which requests community feedback on the Wikimedia trademark policy, procedure, and other questions. The objective is to balance the interest in licensing the brand for mission-aligned activities, with the necessity of preventing misuse and “naked licensing” (licensing without quality control). This is the opportunity to provide ideas as the team considers updating the trademark policy and practices.

Wikipedia Zero launches in Pakistan

Wikipedia Zero, the program to give people around the world mobile access to Wikipedia free of data charges, is now available in Pakistan, in partnership with Mobilink (Vimpelcom). The company’s user base of over 32 million people makes this the second largest Wikipedia Zero launch to date.

The “Nearby” feature in Vatican City. The camera icon (bottom) indicates an article that lacks images, inviting users to contribute one.

“Nearby” feature shows Wikipedia articles in the reader’s vicinity

On location-aware devices (such as smartphones with GPS), a new “Nearby” page lists articles close to the reader’s current location. The feature is designed for mobile devices, but also works on the desktop version of Wikipedia.

Presentation slides with the Tool Labs logo

New hosting environment for community-developed tools

Tool Labs, an environment for community developers to provide external software tools supporting work on Wikimedia projects, is now operating. With the support of the German Wikimedia chapter, many existing tools have already migrated from the Wikimedia Toolserver to Tool Labs.

Search for new Executive Director begins

The job opening for the Wikimedia Foundation’s new Executive Director has been posted. This starts the search for a successor for Sue Gardner, who will step down later this year. Board of Trustees chair Kat Walsh asked Wikimedians for help in finding the best possible candidate, by spreading the news in their networks.

Global unique visitors for April:

(more…)