Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Europeana Fashion Handbook to Bring Wiki and GLAMs Together

In an effort to improve fashion knowledge on the web, Europeana Fashion has organized a series of edit-a-thons with Wikimedia volunteers and fashion institutions around Europe. The experience and knowledge gained from these events are now compiled in one handbook, The Europeana Fashion Edit-a-thon Handbook for GLAMs.


What is fashion? Fashion is vanity, fashion is business, fashion is art. Fashion can mean many things to many people, but what is certain is that it has enormous cultural significance. Every item of clothing has its roots in history and carries a symbolic meaning in the present.


An edit-a-thon around fashion in collaboration with Wikimedia Netherlands and Fashion Muse. May 13, 2013. 

Take, for example, the most basic of garments, the T-shirt. It was originally designed as an undergarment for the American army in the early 20th century. In the 1950s it became part of the uniform of rebellious youth culture and was seen on the likes of Marlon Brando and James Dean. Nowadays, the T-shirt is worn everywhere with everything, even under a suit. From undergarment to symbol of rebellion to formal wear, the T-shirt shows how fashion objects can be considered artifacts of past and present.

That is why there are public and private institutions collecting fashion. Europeana Fashion aims to bring all these collections together in one online portal and improve knowledge around these collections.

The best way to improve knowledge online is through Wikipedia. It’s open, free and one of the most visited websites. In an effort to get communities and institutions involved, Europeana Fashion hosted multiple Wiki edit-a-thons.


Fashion badge from the Europeana Fashion edit-a-thon at the Museum of Decorative Arts (Paris), March 22, 2014.

After setting up seven edit-a-thons in five countries in one year’s time, the project bundled its experiences in a handbook for organizing fashion edit-a-thons. It is directed towards galleries, libraries, archives and museums, or in short: GLAMs. The handbook is available online and open to improvement from the community.

Engaging Fashionistas

Fashion carries with it very relevant cultural, historical and symbolic meaning. However, despite its social significance, fashion’s presence on Wikipedia is not as comprehensive as it should be. This encouraged Europeana Fashion to partner with Wikimedia volunteers in an effort to increase fashion knowledge and open multimedia in the Wiki world.

Twenty-two partners from twelve European countries work together on the Europeana Fashion portal. Together, these institutions collect and make available thousands of historical dresses, accessories, photographs, posters, drawings, sketches, videos and fashion catalogues. At the same time, the portal makes these items findable through Europe’s online cultural hub, Europeana. Europeana Fashion invited its partners to make their collections available on Wikimedia Commons and welcomed users to write about those collections. The aim: to enrich and share the knowledge about these objects and improve the existing knowledge about fashion’s history, origins and trajectory on Wikipedia.

Odisha Dibasa 2014: 14 books re-released under CC license

This post is available in 2 languages:
English  • Odia

Guests releasing a kit DVD containing the Odia typeface “Odia OT Jagannatha,” the offline input tool “TypeOdia,” Odia language dictionaries, open-source software, offline Odia Wikipedia and an Ubuntu package.

Odisha became a separate state in British India on April 1, 1936, and Odia, a 2,500-year-old language, recently gained the status of an Indian classical language. The Odia Wikimedia community celebrated these two occasions on March 29 in Bhubaneswar with a gathering of 70 people. Linguists, scholars and journalists discussed the state of the Odia language in the digital era, initiatives for its development, and steps that can be taken to increase access to books and other educational resources. Fourteen copyrighted books were re-licensed under a Creative Commons license, and a digitization project on Odia Wikisource was formally initiated by an indigenous educational institute, the Kalinga Institute of Social Sciences (KISS). Professor Udayanath Sahu from Utkal University, The Odisha Review’s editor Dr. Lenin Mohanty, Odisha Bhaskar’s editor Pradosh Pattnaik, Odia language researcher Subrat Prusty, Dr. Madan Mohan Sahu, Allhadmohini Mohanty, chairman of the Manik-Biswanath Smrutinyasa trust, and the trust’s secretary Brajamohan Patnaik, along with senior members Sarojkanta Choudhury and Shisira Ranjan Dash, spoke at the event.

Group photo of Odia Wikimedians participating in the advanced Wikimedia workshop at KIIT University.

Eleven books by Odia writer Dr. Jagannath Mohanty were re-released under the Creative Commons Attribution-ShareAlike (CC BY-SA 3.0) license by the Manik-Biswanath Smrutinyasa, a trust founded by Dr. Mohanty for the development of the Odia language. Allhadmohini Mohanty formally gave the Odia Wikimedia community written permission to release and digitize these books.

The community will train students and a group of six faculty members at KISS, who will coordinate the digitization of these books. “Collaborative efforts and open access to knowledge repositories will enrich our language and culture,” said linguist Padmashree Dr. Debiprasanna Pattanayak as he inaugurated the event. Dr. Pattanayak and Odia language researcher Subrat Prusty from the Institute of Odia Studies and Research also re-licensed, under the same license, three books based on their research on the Odia language and its cultural influence on other societies: two Odia books, “Bhasa O Jatiyata” and “Jati, Jagruti O Pragati,” and an English book, “Classical Odia.” KISS is going to digitize some of these books and make them available on Odia Wikisource.


A young developer’s story of discovery, perseverance and gratitude

This post is a discovery report written by Jared Flores and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


When I initially heard of the Google Code-in (GCI) contest, I wasn’t exactly jumping out of my seat. I was a little apprehensive, since the GCI sample tasks used languages such as Java, C++, and Ruby. While I’ve had my share of experience with these languages, I felt my abilities were too limited to compete. Yet I’ve always had a fiery passion for computer science, and this challenge presented another mountain to conquer. Thus, after filtering through the hundreds of tasks, I took the first step as a Google Code-in student.

The first task I took on was to design a share button for the Kiwix Android app, an offline Wikipedia reader. Though Kiwix itself wasn’t a sponsoring organization for GCI, it still provided a branch of tasks under the Wikimedia umbrella. With five days on the clock, I researched vigorously and studied the documentation for Android’s share API.

After a few hours of coding, the task seemed to be complete. Reading through the build documentation, I downloaded all of the listed prerequisites, then launched the Kiwix autogen bash file. But even with all of the required libraries installed, Kiwix still refused to compile. Analyzing the error logs, I encountered permission errors, illegal characters, missing files, and mismatched dependencies. My frustration growing, I even booted Linux from an old installation DVD and tried compiling there. I continued this crazy cycle of debugging until 2 am, and would have continued longer had my parents not demanded that I sleep. The next morning, I whipped up a quick breakfast and rushed directly to my PC. With my mind refreshed, I tried a variety of new approaches, finally reaching the point where Kiwix compiled.

With a newly-found confidence, I decided to continue pursuing more GCI tasks. Since I had thoroughly enjoyed the challenge presented by Kiwix, I initially wanted to hunt down more of their tasks. However, finding that there weren’t many left, I gained interest in Kiwix’s supporting organization: Wikimedia. I navigated to Wikimedia’s GCI information page and began familiarizing myself with the organization’s mission.

“We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals. Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages.” – About Us: Wikimedia (GCI 2013)

Reading through the last sentence once more, I realized the amazing opportunities that were ahead of me. Whenever I needed to touch up on any given topic, Wikipedia was always one of the top results. Moreover, Wikipedia had become a source of entertainment for me and my friends. We always enjoyed hitting up a random article, then using the given links to find our way to Pokémon, Jesus, or maybe even Abraham Lincoln: Vampire Hunter.

Eager to begin, I chose video editing as my second task for Wikimedia. I began the long endeavor of watching, reviewing, and editing the two forty-five-minute clips. Despite the lengthy videos, I was quite amused to see the technical difficulties that the Wikimedia team encountered during their Google Hangout. It was also comforting to put human faces to the Wikimedia mentors of Google Code-in.

As with my first task, the work itself sped by quickly. But also as with Kiwix, I encountered some difficulties with the “trivial” part of the task. I had never worked with the wiki interface before, so the wiki structure was somewhat foreign. I only had a vague idea of how to create a page; I also didn’t know where to upload files, nor how to create subcategories. Nonetheless, after following the instructions in Wikipedia’s documentation, I finally managed to upload the videos. Marking the task as complete, I scouted for my third GCI task.

Unbeknownst to me, my third task for Wikimedia would also prove to be the most challenging so far. Since this task required me to modify the code, I requested developer access. With the help of Wikimedia’s instructions, I registered myself as a developer, generated a private key to use with their servers, and proceeded to download the source code.

Though my experience with Git was quite basic, MediaWiki provided easy-to-follow documentation, which aided greatly in my efforts to download their repository. As I waited for the download to complete, I quickly set up an Apache server as a testing environment. After configuring the MediaWiki files for my server, I began the installation. Fortunately, MediaWiki’s interface was quite intuitive; the installer performed flawlessly with minimal user input.

“Off to a good start,” I chuckled quietly to myself, a grin spreading across my face. With that statement I tempted fate, and my troubles began. Upon opening the code, I realized I couldn’t easily comprehend a single line. I had worked with PHP before, but this code was more advanced than anything I had written.

Running my fingers through my hair, I sighed in exasperation. I spent the next few hours analyzing the code, trying my best to decipher the functions. Suddenly, patterns began appearing and I started to recognize many of the functions. I tinkered with different modules until the code slowly unraveled.

Once I had formulated a solution, my fingers moved swiftly across the keyboard, implementing the code with ease. Confident that I had tested my code well, I followed the instructions in the GCI task description and uploaded my very first patch to Gerrit.

I was surprised at how simple the upload was. But what especially surprised me was the immediate feedback from the mentors. Within a few minutes of the upload, MediaWiki developers were already reviewing the patch, making suggestions for improvement.

Thankful for their helpful input, I worked to implement the changes they suggested. After adding the finishing touches, I was ready to upload again. However, I was unsure whether I should upload a new Gerrit change or push an update to the same patch as before. Unclear about which step to take, I made the rookie error of uploading a new Gerrit commit.

My mistake quickly received a corrective response from Aude via the Gerrit comment system. While I initially felt embarrassed, I was also relieved that I didn’t have to work alone. In fact, I was thankful that the MediaWiki collaborators taught me how to do it right.

Checking out the link Aude had given me, I learned to squash the two commits together. However, when I tried to follow Aude’s instructions, I somehow managed to mix someone else’s code with my own. Even worse, I had already pushed the changes to Gerrit, exposing my blunder publicly.
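
For readers unfamiliar with Gerrit, the pattern Jared was missing is to amend the original commit rather than create a new one, since Gerrit groups uploads into patch sets by the Change-Id footer of the commit message. A minimal sketch of that flow, expressed in Python purely for illustration (the remote and branch names are assumptions):

```python
import subprocess

def resubmit_patch(remote="origin", branch="master"):
    """Update an existing Gerrit change instead of opening a new one."""
    # Stage the fixes suggested in review on top of the original commit.
    subprocess.check_call(["git", "add", "-u"])
    # Amend rather than commit anew; this keeps the Change-Id footer
    # that Gerrit uses to attach the upload to the existing change.
    subprocess.check_call(["git", "commit", "--amend", "--no-edit"])
    # Pushing to refs/for/<branch> creates a new patch set on that change.
    subprocess.check_call(["git", "push", remote, "HEAD:refs/for/" + branch])
```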

Had it been any normal day, I would’ve just been calm and tried my best to fix it. But it just so happened to be the Thanksgiving holiday (in the United States). I had to leave in a few minutes for a family dinner and I couldn’t bear the thought of leaving my patch in a broken state.

I felt about ready to scream. I abandoned my Gerrit patch, and navigated to the task page, ready to give up. But just as I was about to revoke my claim on the task, I remembered something Quim Gil had told another GCI student:

“They are not mistakes! Only versions that can be improved. Students learn in GCI, and all of us learn every day.”

Remembering this advice, I cleared my mind, ready to do whatever it would take, and learn while I was at it. And like an answer to my prayers, Hoo Man, another developer, posted a comment in Gerrit. He guided me through how I could return to my original patch and send my new improvements through. And more importantly, he motivated me to persevere.

I came into GCI as a passionate, yet undisciplined student. I’m thrilled that in joining this competition, the Wikimedia open source community has already helped me plant the seeds of discipline, perseverance, and collaboration. It’s no coincidence that my hardest task thus far was staged on Thanksgiving. Every year I express gratitude towards friends and family. But this year, Google Code-In and the Wikimedia community have made my gratitude list as well.

Jared Flores
2013 Google Code-in student



Migrating Wikimedia Labs to a new Data Center

As part of ongoing efforts to reduce our reliance on our Tampa, Florida data center, we have just moved Wikimedia Labs to EQIAD, the new data center in Ashburn, Virginia. This migration was a multi-month project and involved hard work on the part of dozens of technical volunteers. In addition to reducing our reliance on the Tampa data center, this move should provide quite a few benefits to the users and admins of Wikimedia Labs and Tool Labs.

Migration objectives

We had several objectives for the move:

  1. Upgrade our virtualization infrastructure to use OpenStack Havana;
  2. Minimize project downtime during the move;
  3. Stop relying on nova-network and start using Neutron;
  4. Convert the Labs data storage system from GlusterFS to NFS;
  5. Identify abandoned and disused Labs resources.

Upgrade and Minimize Downtime

Wikimedia Labs uses OpenStack to manage the virtualization back-end. The Tampa Labs install was running a slightly old version of OpenStack, ‘Folsom’. Folsom is more than a year old now, but OpenStack does not provide an in-place upgrade path without considerable downtime, so we had been living with Folsom to avoid disrupting existing Labs services.

Similarly, a raw migration of Labs from one set of servers to another would have required extensive downtime, as simply copying all of the data would be the work of days.

The solution to both objectives 1 and 2 was provided by OpenStack’s multi-region support. We built an up-to-date OpenStack install (version ‘Havana’) in the Ashburn data center and then modified our Labs web interface to access both centers at once. To ease the move, Ryan Lane wrote an OpenStack tool that allowed users to authenticate in both data centers simultaneously, and updated the Labs web interface so that both data centers were visible at the same time.
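
As a rough illustration of what multi-region support looks like from the client side, the sketch below authenticates the same credentials against two regions using the era-appropriate novaclient library. The region names, auth URL and credentials are assumptions, not Wikimedia’s actual configuration:

```python
# Sketch only: one set of credentials, two OpenStack regions.
from novaclient.v1_1 import client

AUTH_URL = "https://openstack.example.org:5000/v2.0"  # hypothetical endpoint

def connect(region):
    # A shared LDAP-backed identity service means the same account works
    # in both regions; only region_name differs between the two clouds.
    return client.Client("user", "secret", "project",
                         AUTH_URL, region_name=region)

old_cloud = connect("tampa")   # region names assumed for illustration
new_cloud = connect("eqiad")

# A web interface built this way can list instances from both
# data centers at once.
for nova in (old_cloud, new_cloud):
    for server in nova.servers.list():
        print(server.name, server.status)
```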

At this point (roughly a month ago), we had two different clouds running: one full and one empty. Because of a shared LDAP back-end, the new cloud already knew about all of our projects and users.

Two clouds, before migration

Then we called on volunteers and project admins for help. In some cases, volunteers built fresh new Labs instances in Ashburn. In other cases, instances were shut down in Tampa and duplicated using a simple copy script run by the Wikimedia Operations team. In either case, project functions were supported in both data centers at once so that services could be switched over quickly and at the convenience of project admins.

Two clouds, during migration

As of today, over 50 projects have been copied to or rebuilt in Ashburn. For those projects with uptime requirements, the outages were generally limited to a few minutes.

Switch to OpenStack Neutron

We currently rely on the ‘nova-network’ service to manage network access between Labs instances. Nova-network is working fine, but OpenStack has introduced a new network service, Neutron, which is intended to replace nova-network. We hoped to adopt Neutron in the Ashburn cloud (largely in order to avoid being stuck using unsupported software), but quickly ran into difficulties. Our current use case (flat DHCP with floating IP addresses) is not currently supported in Neutron, and OpenStack designers seem to be wavering in their decision to deprecate nova-network.

After several days of experimentation, expedience won out and we opted to reproduce the same network setup in Ashburn that we were using in Tampa. We may or may not attempt an in-place switch to Neutron in the future, depending on whether or not nova-network continues to receive upstream support.

Switch to NFS storage

Most Labs projects have a shared project-wide volume for storing files and transferring data between instances. In the original Labs setup, these shared volumes used GlusterFS. GlusterFS is easy to administer and designed for use cases similar to ours, but we’ve been plagued with reliability issues: in recent months, the lion’s share of Labs failures and downtime were the result of Gluster problems.

When setting up Tool Labs last year and facing our many issues with GlusterFS, Marc-Andre Pelletier opted to set up a new NFS system to manage shared volumes for the Tool Labs project. This work has paid off with much-improved stability, so we’ve adopted a similar system for all projects in Ashburn.

Again, we largely relied on volunteers and project admins to transfer files between the two systems. Most users were able to copy their data over as needed, scping or rsyncing between Tampa and Ashburn instances. As a hedge against accidental data loss, the old Gluster volumes were also copied into backup directories in Ashburn using a simple script. The total volume of data copied was around 30 terabytes; given the many-week migration period, network bandwidth between locations turned out not to be a problem.
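
That backup pass can be as simple as wrapping rsync, which copes well with restarts over a long migration window. A minimal sketch of such a copy script (host names, paths and project names are illustrative, not the actual Operations tooling):

```python
import subprocess

def backup_volume(project, src_host="labstore-tampa.example.org"):
    """Copy one project's old Gluster volume into a backup directory."""
    src = "%s:/gluster/%s/" % (src_host, project)
    dest = "/srv/backups/%s/" % project
    # -a preserves ownership and timestamps; -z compresses on the wire,
    # which matters when tens of terabytes cross a WAN link.
    subprocess.check_call(["rsync", "-az", src, dest])

for project in ("exampleproject1", "exampleproject2"):
    backup_volume(project)
```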

Identify and reclaim wasted space

Many Labs projects and instances are set up for temporary experiments, and have a short useful life. The majority of them are cleaned up and deleted after use, but Labs still has a tendency to leak resources as the odd instance is left running without purpose.

We’ve never had a very good system for tracking which projects are or aren’t in current use, so the migration was a good opportunity to clean house. For every project that was actively migrated by staff or volunteers, another project or two simply sat in Tampa, unmentioned and untouched. Some of these projects may yet be useful (or might have users but no administrators), so we need to be very careful about prematurely deleting them.

Projects that were not actively migrated (or noticed, or mentioned) during the migration period have been ‘mothballed’. That means their storage and VMs were copied to Ashburn but left in a shutdown state. These instances will be preserved for several months, pending requests for their revival. Once it’s clear that they’re fully abandoned (in perhaps six months), they will be deleted and the space reused for future projects.

Conclusions

In large part, this migration involved a return to older, more tested technology. I’m still hopeful that in the future Labs will be able to make use of more fundamentally cloud-designed technologies like distributed file shares, Neutron, and (in a perfect world) live instance migration. In the meantime, though, the simple approach of setting up parallel clouds and copying things across has gone quite well.

This migration relied quite heavily on volunteer assistance, and I’ve been quite charmed by how gracious the vast majority of volunteers were about this inconvenience. In many cases, project admins regarded the migration as a positive opportunity to build newer, cleaner projects in Ashburn, and many have expressed high hopes for stability in the new data center. With a bit of luck we’ll prove this optimism justified.

Andrew Bogott, DevOps Engineer

Modernising MediaWiki’s Localisation Update

Interface messages on MediaWiki and its many extensions are translated into more than 350 languages on translatewiki.net. Thousands of translations are created or updated each day. Usually, users of a wiki would have to wait until a new version of MediaWiki or of an extension is released to see these updated translations. However, webmasters can use the LocalisationUpdate extension to fetch and apply these translations daily without having to update the source code.

LocalisationUpdate provides a command line script to fetch updated translations. It can be run manually, but usually it is configured to run automatically using cron jobs. The sequence of events that the script follows is:

  1. Gather a list of all localisation files that are in use on the wiki.
  2. Fetch the latest localisation files from either:
    • an online source code repository, using https, or
    • clones of the repositories in the local file system.
  3. Check whether English strings have changed to skip incompatible updates.
  4. Compare all translations in all languages to find updated and new translations.
  5. Store the translations in separate localisation files.

MediaWiki’s localisation cache automatically finds the new translations via a hook registered by the LocalisationUpdate extension.
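
To make the comparison steps concrete, here is a simplified sketch of the idea in Python. The extension itself is written in PHP and this is not its actual code; the repository URL is hypothetical. A translation is accepted only if the corresponding English source string is unchanged:

```python
import json
from urllib.request import urlopen

REPO = "https://repo.example.org/mediawiki/i18n"  # hypothetical raw-file URL

def fetch(lang):
    # Step 2: fetch the latest JSON localisation file over HTTPS.
    with urlopen("%s/%s.json" % (REPO, lang)) as response:
        return json.load(response)

def updated_translations(local_en, local_xx, lang):
    remote_en = fetch("en")
    remote_xx = fetch(lang)
    updates = {}
    for key, text in remote_xx.items():
        if key.startswith("@"):  # skip @metadata entries
            continue
        # Step 3: if the English source string changed, the update may be
        # incompatible with this wiki's older code, so skip it.
        if remote_en.get(key) != local_en.get(key):
            continue
        # Step 4: keep only translations that are new or changed.
        if text != local_xx.get(key):
            updates[key] = text
    return updates  # Step 5 would write these to separate files
```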

Until very recently, the localisation files were in PHP format; they have now been converted to JSON, which required changes in LocalisationUpdate to handle JSON files. Piecemeal extension of the code over the years had made the code base tough to maintain, so it has been rewritten with extensibility in mind, to support future development while retaining adequate support for older MediaWiki versions that use this extension.

The rewrite did not add any new features except support for JSON format. The code for the existing functionality was refactored using modern development patterns such as separation of concerns and dependency injection. Unit tests were added as well.

The configuration format for the update scripts changed, but most webmasters won’t need to change anything, and will be able to use the default settings. Changes will be needed only on sites that for some reason don’t use the default repositories.

New features are being planned for future versions that would optimise LocalisationUpdate to run faster and without any manual configuration. Currently, the client downloads the latest translations for all extensions in all languages and then compares which translations can be updated. By moving some of the complex processing to a separate web service, the client could save bandwidth by downloading only updated messages for the specific updated languages used by its readers.
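
One possible shape for such a service, sketched here purely as an illustration (the endpoint and its parameters are hypothetical, not a planned API):

```python
import json
from urllib.request import urlopen

SERVICE = "https://lu.example.org/updates"  # hypothetical web service

def fetch_updates(since, languages):
    # Instead of downloading every localisation file and diffing locally,
    # ask the service for just the messages that changed since the last
    # run, restricted to the languages this wiki actually uses.
    url = "%s?since=%s&languages=%s" % (SERVICE, since, ",".join(languages))
    with urlopen(url) as response:
        return json.load(response)  # e.g. {"fi": {"message-key": "new text"}}

updates = fetch_updates("2014-03-01", ["fi", "de", "nl"])
```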

There are still more things to improve in LocalisationUpdate. If you are a developer or a webmaster of a MediaWiki site, please join us in shaping the future of this tool.

Niklas Laxström and Runa Bhattacharjee, Language Engineering, Wikimedia Foundation

Wikimedia RU expands Wikipedia Voice intro project to include music

Recently, Wikimedia RU (the Russian Wikimedia chapter) successfully launched the Russian version of the “Wikipedia voice intro project” and expanded it to incorporate the “WikiMusic” project. It now covers not only celebrity voices but also free music, which prior to this had no significant presence in any Wikimedia project.

How did it happen?

The recent launch of the “Wikipedia voice intro project” received extensive coverage in the Russian and European press. While the topic was hot, directors of Wikimedia RU reached an agreement with Alexei Venediktov, the chief editor of the “Echo of Moscow” radio station, to start a similar joint project in Russia: “WikiVoices“.

It should be mentioned that Echo of Moscow has the largest audience among Moscow radio stations and broadcasts in more than 40 cities in Russia, the United States and the Baltic states. It has the highest citation index among all Russian media, exceeding even TV channels, so we are really happy to start working with such a partner. According to the agreement, Echo will do the following:

  • ask their guests for short, neutral stories about themselves, without propaganda, advertising or personal attacks, so that they will be suitable for future use in Wikipedia;
  • search through their archive records (dating back to 1990) and provide us with interesting samples;
  • not only provide us with records of guests who came to their studio, but also ask their external correspondents to make such records;
  • publish photos of their guests under free licenses.

Sample recordings: the waltz “Waves of the Danube”; the Gypsy song from the opera “Carmen”; “The Lost Chord”; and a voice recording of the speech “On the cultural role of the gramophone.”

Echo of Moscow not only agreed to donate such materials but also did a lot to simplify the process: all records are posted on their official website with information about the person and a direct statement of the CC BY-SA license for the records. The log of uploads is prepared in machine-readable XML format, and new records are automatically uploaded to Commons by bots in the free OGG format. About 40 records have been uploaded so far: we now have voice recordings of the Chancellor of Germany Angela Merkel (with a translator), USSR president Mikhail Gorbachev, journalist Vladimir Pozner and many other famous people.
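
A sketch of what such a bot might look like with pywikibot (the XML element names, the log URL and the wikitext are assumptions; the actual Echo of Moscow log format is not documented here):

```python
import urllib.request
import xml.etree.ElementTree as ET
import pywikibot

LOG_URL = "https://echo.example.org/wikivoices/log.xml"  # hypothetical

site = pywikibot.Site("commons", "commons")
tree = ET.parse(urllib.request.urlopen(LOG_URL))

for record in tree.findall("record"):  # element names assumed
    page = pywikibot.FilePage(site, "File:%s.ogg" % record.findtext("title"))
    description = (
        "{{Information|description=%s|source=%s|author=Echo of Moscow}}\n"
        "{{cc-by-sa-3.0}}"
        % (record.findtext("person"), record.findtext("source"))
    )
    # Upload the OGG file directly from the station's website; uploading
    # by URL needs the appropriate right on the wiki, otherwise download
    # the file first and pass source_filename instead.
    site.upload(page, source_url=record.findtext("url"),
                comment="WikiVoices: automatic import", text=description)
```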

Fortunately, that’s not the only good news we want to share. While we were announcing the start of the WikiVoices project, we were heard by the Russian State Archive of Sound Recordings. This archive was founded in 1932 and currently holds more than 240,000 records. Many catalogs are not available online and many records are not digitized, but the Archive is ready to convert desired records into digital format and donate them to us.


Discovering and learning by asking questions

This post is a discovery report written by Vlad John and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


In the past years, I’ve used Wikipedia as often as I’ve used Facebook. I’ve used it for homework or simply for finding something new. When I was first introduced to the world of the Internet, I always asked myself: how can someone make a site that so many people browse? This year, I found the answer through the Google Code-in contest. As I was browsing for a task that suited me, I found an organization called Wikimedia.

While browsing the tasks they offered, I found something that caught my eye: a task about editing the wiki. I was so happy to have finally found a task that suited my tastes that I clicked “Claim Task” before reading what I had to do. But when I read more about the specifics of the task… well, it is enough to say that I had no idea how to start. I was supposed to “clean up” the “Raw projects” section of the Possible projects page. I clicked the link to the wiki page I was supposed to edit, and as I started working, I encountered several problems that I will describe in a moment. But thanks to my mentor, Quim Gil, I succeeded in completing the task.

I had always wanted to edit a wiki page, but at first I was afraid: what if I did something wrong? After posting a text file on the Code-in task’s page, I received a comment saying that in the end I’d need to edit the wiki page itself, so I might as well start early. This made sense, so I dove into the unknown territory of editing.

I started by looking at the history of the page to find the things I had to add. That took a while, but in less time than I first thought necessary, I learned how to find information in earlier edits, how to edit the source code of the page and how to make minor edits to the headings and structure. But this was the easy part.

I just had to copy some names and move them to their appropriate place. However, when it came to reporting bugs, I was indeed lost. I knew from the task that I had to use Bugzilla to report bugs and add comments, but I didn’t have the foggiest idea how to do it. That is when I started doing what I should have done in the first place: ask questions.

I realized that the whole point of this exercise was to teach students how to do different things, and the most important thing when learning is to ask questions everywhere: on forums, in the FAQ or the manual, or simply by searching further for the answer. So I began by reading the full Bugzilla guide, but that did not really answer my questions. At least, not until I found the “How to report a bug” guide. This gave me some important information, like what to look for and how a report should look.

But I still had one problem: the guide said one thing and the mentor said another. So I decided to ask once more on the page of the task. In no time, I received an answer and a model. Apparently, the guide was right about one part of the task, and the mentor was right about another part. By combining the answers from these two sources, I managed to solve my problem. Once I knew what I was looking for, and once I asked the right questions, I got the answers I needed.

From there, it was not too hard to start adding and commenting on bugs in Bugzilla. The next problem appeared when I had to add the bug reports to the wiki page… I thought I was done the moment I added the bugs to Bugzilla, but again my lack of attention and knowledge got the best of me. So I told myself: if asking the right question gets me the information I need, why not ask again? After all, I am here to learn.

So I went back to the task page and posted another two paragraphs of questions. Indeed, I received answers that helped me learn something about editing the source of the page. So I dove once again into the unknown and started the work. After a hard time finding the bug reports again, I was finally done and completed the task.

After finishing, I realised that a person can learn anything on his or her own, but learning is more effective when a mentor or teacher helps you. Also, a teacher who just tells you what to read without explaining is less helpful than one who knows how, what and when to explain, and who speaks to you in a nice way, helping you with explanations and examples, as Quim Gil helped me in completing the task.

So, to sum up, if you ever want to learn something about Wikimedia (or anything else), the best way is to ask other people, whether a mentor like Quim Gil was for me, or a complete stranger on a forum like StackOverflow, which is an important place for coding and scripting help. Many people say that learning has no shortcuts, but if questions are not shortcuts, they sure are a real help in education. Why? Because with questions comes information, and with information comes knowledge.

Vlad John
2013 Google Code-in student



Wikimedia Research Newsletter, March 2014

Wikimedia Research Newsletter


Vol. 4 • Issue 3 • March 2014

Wikipedians’ “encyclopedic identity” dominates even in Kosovo debates; analysis of “In the news” discussions; user hierarchy mapped

With contributions by: Federico Leva, Scott Hale, Kim Osman, Jonathan Morgan, Piotr Konieczny, Niklas Laxström, Tilman Bayer and James Heilman

Cross-language study of conflict on Wikipedia

Have you ever wondered how the articles on Crimea differ between the Russian, Ukrainian, and English versions of Wikipedia? A newly published article entitled “Lost in Translation: Contexts, Computing, Disputing on Wikipedia”[1] doesn’t address Crimea, but nonetheless offers insight into the editing of contentious articles across language editions through an in-depth qualitative examination of the Wikipedia articles about Kosovo in the Serbian, Croatian, and English editions.

The authors, Pasko Bilic and Luka Bulian from the University of Zagreb, found the main drivers of conflict and consensus were different group identities in relation to the topic (Kosovo) and to Wikipedia in general. Happily, the authors found the dominant identity among users in all three editions was the “encyclopedic identity,” which closely mirrored the rules and policies of Wikipedia (e.g., NPOV) even if the users didn’t cite such policies explicitly. (This echoes the result of a similar study regarding political identities of US editors, see previous coverage: “Being Wikipedian is more important than the political affiliation“.) Other identities were based largely on language and territorial identity. These identities, however, did not sort cleanly into the different language editions: “language and territory [did] not produce coherent and homogeneous wiki communities in any of the language editions.”

Many users saw the English Wikipedia as providing greater visibility; it thus “seem[ed] to offer a forum for both Pro-Serbian and Pro-Albanian viewpoints making it difficult to negotiate a middle path between all of the existing identities and viewpoints.” The Arbitration Committee, present in the English edition but not in the Serbian or Croatian editions, may have helped prevent even greater conflict: enforcement of its decisions seemed generally to lead to greater caution in the editing process.

In line with previous work showing that some users move between language editions, the authors found a significant amount of coordination work across editions. One focus was whether other editions would follow the English edition in splitting the article into two separate articles (Republic of Kosovo and Autonomous Province of Kosovo and Metohija).

The social construction of knowledge on English Wikipedia

review by Kim Osman

Another paper by Bilic, published in New Media & Society,[2] looks at the logic of networked societies and the myth, perpetuated by media institutions, that the social world has a center (as opposed to distributed nodes). The paper investigates the social processes that contribute to the creation of “mediated centers” by analyzing the talk pages of English Wikipedia’s In The News (ITN) section.

Undertaking an ethnographic content analysis of ITN talk pages from 2004–2012, Bilic found three issues that were disputed among Wikipedians in their efforts to construct a necessarily temporal section of the encyclopedia. First, editors differentiate between mass media and Wikipedia as a digital encyclopedia, though what constitutes the border between the two is often contested. Second, there was debate between inclusionists and deletionists regarding the criteria for stories making the ITN section. Third, conflict and discussion occurred regarding English Wikipedia’s relevance to a global audience.

The paper provides a good insight into how editors construct the ITN section and how it is positioned on the “thin line between mass media agenda and digital encyclopedia.” It would be interesting to see further research on the tensions between the Wikipedia policies mentioned in the paper (e.g. WP:NOTNEWS, NPOV) and mainstream media trends in light of other studies about Wikipedia’s approach to breaking news coverage.

User hierarchy map: Building Wikipedia’s Org Chart


A junior developer discovers MediaWiki

This post is a discovery report written by Coder55 and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


I’m a 17-year-old boy from Germany interested in computer science. I write my own little programs in PHP, Python and Java and have even produced some Android apps. I completed a Python course in three days, and now I’m using Python to solve math problems. I heard about Google Code-in on a German news site for young people interested in computer science.

Account creation and language selection

The instructions for Google Code-in students were easy to understand, even for people whose English isn’t strong. After reading them, I created an account on mediawiki.org. The registration form looked modern; I wanted to take the user name ‘Coder55’, but it was already taken, so an account creation error was displayed. The text I had typed into the password and email fields was cleared after the error; maybe it could be saved in a session variable and written back into the text fields via JavaScript.

After registering and logging in, I saw many different options in the top line. It was easy to change the language and to read my welcome message. Maybe the button with the text ‘log out’ could be replaced by a logout button with a little picture, to make the top line smaller and even easier to understand.

After that, I changed the language to German and then Spanish, because I wanted to see how much of the site had been translated. I was quite disappointed that only the top menu was completely translated. The left sidebar was not, even though many important links can be found there, like the one to the Main Page. I was also surprised that the language of the page content didn’t change after I changed my language options: if I’m on the Main Page and I change the language to German, I still see the Main Page in English, although the left menu has partially changed to German. This puzzled me until I found out I had to click on ‘Hauptseite’, ‘Página principal’ etc. to see the Main Page in another language.

How to become a MediaWiki hacker

I am really interested in developing, so the next thing I did was visit the How to become a MediaWiki hacker page, where I found interesting tutorials explaining how to develop on the MediaWiki platform. The page was clearly arranged and I really liked it. It clearly separated the required abilities (e.g. PHP and MySQL) and made it easy to see where I needed to learn something and where I already knew enough. The ‘Get started’ part was particularly helpful: it let me start extending MediaWiki quite quickly.

One thing was missing for someone like me: example code for a really easy extension. Although all aspects of developing are explained in detail in the Developing manual, seeing such easy extensions requires following several links; it would be really helpful for beginners to include and explain one or two of these examples in the manual.

I had already written some little programs in PHP (a chat server, a forum etc.), so the next thing I did was study MediaWiki’s coding conventions; they were explained clearly and were easy to understand. The ‘C borrowings’ part was really interesting.

Around MediaWiki: API, Bugzilla, Git and Wikitech

Unfortunately, I didn’t find the video on the API page very helpful: the picture flickered and the voices were hard to understand. But the rest of the API documentation was really informative and easy enough to understand.

After that, I looked at the “Development” section of the left sidebar. I visited the Bugzilla overview, and then the actual site. I really liked the idea of Bugzilla, where every developer can see where help is needed. However, if you don’t know specifically what to look for, the search function in Bugzilla isn’t very helpful[Note 1].

I then clicked on the link called ‘browse repository’. I was positively surprised by what the site looked like. I especially liked the possibility of seeing which parts of MediaWiki had just been updated. I also took a look at Wikitech; the Main Page looked really similar to Wikipedia and MediaWiki, so it seemed easy to navigate.

The Pre-Commit Checklist

The next day, I read about how to install and configure MediaWiki. The documentation was clear and easy to read, but I didn’t understand all of it, probably because I’m more interested in developing than in hosting.

Following this, I looked in more detail at developing via the Developer Hub. I had already studied the coding conventions, so I started reading the Pre-Commit Checklist.

This checklist contained many questions, but for someone like me who hasn’t yet uploaded code, they are partly hard to understand. The part about testing was clearer for me because it was explained in a little more detail. Maybe the questions in the checklist should be written in more detail, or some of the difficult terms should be turned into links.

I liked having an overview of all the conventions at the bottom of the page. I could easily navigate to another convention list, like the coding conventions for JavaScript. These conventions were explained in detail and with clear examples. I especially liked the part about whitespace, where many rules are written clearly and concisely.

In conclusion

MediaWiki is a very interesting platform, and although some things are not perfect (e.g. the translations or the registration form), it is easy to join the community. The most active contributors are accessible on IRC, which makes communication easier. After discovering the technical world of MediaWiki, I’m really interested in getting involved in the community, although that will need to wait until I finish school.

Coder55
2013 Google Code-in student


  1. Editor’s note: Bugzilla has since been upgraded, and its main page now features common search queries.


Through the maze of newcomer developer documentation

This post is a discovery report written by David Wood and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


This discovery essay touches on my general thoughts as I initially started to browse and look into developing for MediaWiki.

I’ve split it into three sections: Setting up, where I cover my experiences working with pywikibot for a previous Google Code-in task; First impressions, where I describe my thoughts as I browse the documentation geared towards newcomers; and Developer Hub, where I describe my thoughts as I venture into the actual development articles.

Setting up

Before looking to develop for MediaWiki, I had previously completed a task relating to pywikibot. I found that the mentors were very helpful and available for advice.

However, I did have issues setting up the development environment for work on pywikibot. It seemed very complex, and at first I did not fully understand what I was required to do. As someone who hasn’t worked on large-scale projects such as MediaWiki before, I was confused about why I needed to have so many accounts and tools set up beforehand, such as a Gerrit account.

Although I now understand, I feel that a guide explaining why these tools are necessary would help newcomers who aren’t necessarily new to collaborating with git, but are new to lesser-known tools such as Gerrit. And although in some cases not much can be simplified, because the setup is genuinely complicated, perhaps a more in-depth guide would help as well (keep in mind, this refers to the guide for installing pywikibot; the guide for MediaWiki in general may be better).

First impressions: a guided tour for newcomers

Coming to this with only basic experience on a small project within MediaWiki, I was pleasantly surprised by the quality and simplicity of the information on the How to become a MediaWiki hacker page. There was a lot of information; for example, instead of simply telling the reader that they required knowledge of PHP, MySQL and JavaScript, the guide linked to where they could gain such knowledge.

From there, I went on to read A brief history and summary of MediaWiki. This was especially interesting: not only was it an engrossing read, it also helped me understand the principles under which MediaWiki is developed, such as the fine line between performance and functionality. Such information helps a new user like me understand the goals behind MediaWiki and the mindset in which I should be working.

Another pleasant surprise was that even the more technical articles, such as Security for Developers, were written in plain English, without a lot of technical language; and even where they got technical, things were explained well. Important guides such as this one on security benefit even more than most from being simple for newcomers, as it makes it more likely that a newcomer will understand and implement what they’ve learnt.

I also noted that the links most relevant to newcomers, such as Coding Conventions, were all easy to find.

Developer Hub: What next?

My next stop was the Developer Hub, where I found I wasn’t sure how exactly to proceed. Unlike the last article, there didn’t seem to be a clear path to follow, most likely because this article isn’t geared directly towards newcomers to MediaWiki.

This is where I experienced the most difficulty; from here there was no more crutch helping me along. I somewhat expected, as I had seen on other projects, a simple guide on what newcomers should do next and, unless I simply missed it, there wasn’t one. I was left unsure what to do next: should I just start browsing code? Look at feature requests? Or for bugs? I think this is where some guidance would be helpful for newcomers; getting to this point was well documented, but after you’ve set everything up, you’re left wondering what comes next. Some sort of list of easy bugs, or of projects for newcomers to contribute to, would give some guidance as to what type of contributions they should be looking to make.

I also noted that some information linked from the Developer Hub, such as an archived roadmap, was out of date or marked as available only for historical purposes. While I understand the reasoning behind this, it’s still confusing, as these links remain prominent on the page.

In conclusion

While I didn’t install MediaWiki personally, my experiences with the complexity of setting things up, as detailed in the first part, were from a pywikibot perspective, as I come from a Python background rather than a PHP background. However, I would consider contributing to MediaWiki in the future if I ever take the time to learn PHP: not only does it seem an enjoyable experience, but I appreciate the ideology behind MediaWiki, supporting a community that creates and curates freely reusable knowledge on an open platform.

David Wood
2013 Google Code-in student

