An international team of volunteers has translated the MediaWiki software used by Wikipedia into more than 100 languages. This is a critical precondition to enable participation in Wikimedia projects from all parts of the world. Today, the work of translating the wiki software is done through a wiki: BetaWiki, which is not operated by the Wikimedia Foundation.
Free Culture Spotlight, a new blog feature that will focus on free culture and open source efforts external to the Wikimedia projects, takes a look at this extraordinary effort and the people behind it.
Wikipedias exist in more than 250 languages. From its earliest days, the project was conceived to be multilingual. In March 2001, Wikipedia founder Jimmy Wales announced the first non-English Wikipedias. In his announcement, Jimmy wrote:
One problem is going to be technical support of these languages, since if there
are “fancy letter” problems, I will not know much how to deal with them. Japanese
is pretty much all “fancy letters”, but I assume that Linux/Apache/Perl will just
magically support it? Or will they be forced to use non-fancy ASCII urls?
Indeed, supporting content in other languages well is a very hard problem. Fortunately, increased standardization and awareness of internationalization problems have made it a little easier to at least deliver content to the end user. Try loading the Hebrew, Russian, Japanese or Hindi Wikipedia: On most modern systems, you should see the correct character sets. Note that it’s not just the content of the encyclopedia that is in a different language: The entire user interface is localized, and in right-to-left languages like Hebrew, even the navigation is optimized.
The software sending you these Wikipedia pages is called MediaWiki. Just like Wikipedia, MediaWiki is open source: You can download it, change it, and use it freely. Thanks in significant part to a volunteer community called BetaWiki, MediaWiki today has core support for more than 100 languages, making it one of the most translated software projects ever.
BetaWiki is not run by the Wikimedia Foundation. It was started by a young Wikipedian named Niklas Laxström (“Nikerabbit”), and makes it possible for people who are not software developers to participate in the work of translating and adapting user interface messages. BetaWiki is not just used to translate MediaWiki itself, but also about 200 MediaWiki extensions that add functionality to the software. It’s not uncommon that a software developer writes a new extension for MediaWiki, only to be surprised to find it translated into a dozen new languages within days by the ever-active BetaWiki community.
To support participation in languages spoken in Africa, Asia and Latin America, BetaWiki recently launched a bounty project to give people small financial rewards for translating key user interface messages into those languages. The project has also not limited itself to MediaWiki: FreeCol, an open source strategy game, is being localized through BetaWiki.
We exchanged e-mails with Niklas Laxström about the story of the project, its community, and its future goals.
What motivated you to create Betawiki?
It started as a hobby project of mine. I added a few features to MediaWiki’s translate extensions for my own use. Soon we persuaded a few translators to use it, as it was also easier for me to export from it and commit into MediaWiki’s version control system.
Who is working with you?
About half a year ago Siebrand Mazeland started to help me with maintenance and administration-like tasks. He has now also taken over committing translations to [the version control system], leaving more time for me to study and improve the translation platform itself. Other members of our staff are SPQRobin and Jon Harald Søby, who do a lot of janitorial and community work. I’d also like to mention Malafaya and Gerard Meijssen, who have done a lot of outreach as Betawiki ambassadors by posting a note about the MediaWiki localisation project on all sister projects of Wikimedia. Although not really active any more, Gangleri played a key role in the early days of Betawiki.
What are the most important milestones in the history of the project?
The development has been pretty steady and continuous, with no big leaps, even though the code has been rewritten a few times. There are some points I’d like to mention. The discussion to move Betawiki under the Wikimedia Foundation and its servers resulted–if nothing else–in a rewrite of the Betawiki functionality as a standalone extension. Also, the move to a new fast server was very important, as the wiki was becoming too slow to use. The new server is fast and gives us plenty of room to grow even bigger. Along with the new server we moved to a new domain, which has greatly enhanced our visibility and credibility. Thanks go to Siebrand for getting us both the server and the domain.
What were the biggest challenges?
The biggest challenge was probably to gain enough mass to get the ball rolling. For the past two years or so we have had constant growth in the number of languages and translators.
Have you had any issues with vandalism and spam on Betawiki?
I think we had some spam last year. We installed [a] captcha extension and configured it to catch links added by anonymous users, along with a regular expression that detects some broken UTF-8. The latter stopped a spam bot that just added random crap and broke UTF-8 text. Since then we haven’t had any problems with spam. Vandals we have never had, and the translations are protected by [the] informal ‘I want to be a translator’ request.
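As a rough illustration of the kind of check described here (not Betawiki's actual configuration, which the interview does not show), the sketch below flags a submission for a captcha when it contains broken UTF-8 or when an anonymous user adds a link. It assumes that invalid bytes are decoded to the Unicode replacement character U+FFFD; the function names and both patterns are hypothetical.

```python
import re

# Assumption: invalid bytes were decoded with errors="replace", so broken
# UTF-8 shows up in the text as the replacement character U+FFFD.
BROKEN_UTF8 = re.compile("\ufffd")

# Assumption: a "link" is anything that starts like an HTTP(S) URL.
LINK = re.compile(r"https?://", re.IGNORECASE)


def decode_submission(raw: bytes) -> str:
    """Decode submitted bytes, marking invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def needs_captcha(text: str, is_anonymous: bool) -> bool:
    """Return True if the edit should be challenged with a captcha."""
    # Broken UTF-8 is suspicious regardless of who submits it.
    if BROKEN_UTF8.search(text):
        return True
    # Links are only challenged when they come from anonymous users.
    return is_anonymous and bool(LINK.search(text))
```

For example, `needs_captcha(decode_submission(b"caf\xc3"), False)` is true because the truncated byte sequence decodes to a replacement character, while a well-formed translation from a logged-in user passes through unchallenged.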
Recently Betawiki started advertising bounties for language localization. In your experience so far, does this approach work to motivate translators?
The bounties have been targeted specifically at languages spoken mainly in Africa, Asia, and Latin America. We have had 13 users working on 14 languages. So far they have translated about 15,000 messages for which they received a bounty. The most complete results are those for Hindi, Malayalam, Marathi, Telugu, and Tajik. We see that people make a commitment and work at least until that commitment has been fulfilled, and usually more. Our goal has been reached if the user keeps on maintaining the localisation even when there is no longer a monetary incentive. We already see that this is not always the case, and we did not expect it to be. We are already happy that someone has worked to a certain milestone with dedication. As in Wikipedia, every edit has its value. The bounties in the language project will end in a month or two, because we need to report back to the sponsor of the project, the Dutch NGO Hivos. We are grateful to Stichting Open Progress for getting the funds in, and to Hivos for making the project possible.
I’m sure some languages do a lot better than expected because there are enthusiasts who support them – but are there languages that do remarkably poorly, in spite of a large number of users and translators?
Until recently, support for Hindi was poor, despite it having more than 300 million speakers. In the past month, however, a translator named Kaustubh became active, and he single-handedly brought both Hindi and Marathi into the list of the 20 languages that have the most complete localisation of MediaWiki. This is a prime example of how one user can make a huge difference for a language community using MediaWiki, and consequently multiple projects within Wikimedia.
Other languages for which one would expect a pretty complete localisation are Spanish and the various languages spoken in China. For Spanish, only 90% of the core messages are available, and support for extensions used within Wikimedia is poor. For Simplified Chinese, Traditional Chinese and Yue (Cantonese) support is quite complete for core messages and extensions used in Wikimedia, but the rest of the extensions have poor support. Localisations are usually more complete and better maintained if they do not lean on one user. A team of two to three users regularly working on a localisation gives stability and continuity, although there are exceptions to that rule.
While Wikimedia in theory “supports” 250 languages, do you think the organization does a good enough job communicating to people in those languages? If not, what could be improved?
This is an important question, and I am struggling with the answers. It definitely has a resource component: how do you ensure your message is understood if you are not able to communicate to communities in their native language? We try to be part of the solution by providing user interfaces in as many languages as possible for MediaWiki, which hopefully lowers the barrier to entry for users that find a Wikimedia project in any particular language. What happens after that is mainly up to the communities that form in a wiki for a certain language/topic combination.
We also have planned an extension of the Translate extension, to ease the translation of wiki pages to other languages, much like the translation of the Betawiki main page at this moment, but without the need of making configuration changes to the wiki environment. This could be a killer feature for multi-language wikis like Wikimedia’s Meta, and Commons, but also for documentation wikis of other open source projects like Mozilla.
Beyond MediaWiki and its extensions, Betawiki is also used to localize the FreeCol open source game. How did this happen, and do you think the wiki will be used for other projects in the future?
FreeCol, too, was added out of my personal interest. FreeCol uses a translation format that was easy to support. I added it to Betawiki for my own use. In July 2007 FreeCol’s i18n coordinator Michael Burschik asked all FreeCol translators to test Betawiki, and since then [a] good number of FreeCol translations are maintained in Betawiki. Support for other projects is one of the tasks in my Summer Job project. I will be working on Gettext file format support, which will open doors to many open source projects.
What is your impression of the state of localization in open source software and closed source software projects?
Considering we don’t really have the money to compete in localisation with closed source software in the less resourced languages, we are doing pretty well. The biggest problem in open source localisation is the lack of proper tools, and the lack of use of those that exist. Localisation is mainly done alone and without translation memories or term banks. Open source localisation has the potential to do better together. To be more efficient we have to be as well equipped as commercial translators, or better, and the localisation process itself has to be more efficient. In Betawiki we are currently investing in the latter. We make the process for translators as fast and easy as possible and let them work together on documenting the messages and reviewing the translations, to produce high quality translations fast and with less effort.
Not all closed source software has high quality translations either. Smaller products especially can have mediocre or even poor translations.
What’s next for you personally? Do you have any project plans, either within or outside Betawiki?
I aim to finish my studies–it will take a few years still. Of course I will keep improving Betawiki in my free time and especially in the summer when I have my summer job. I will probably also move away from my very small dormitory at some point :)
In the spirit of free culture collaboration, if you find that a language you speak is not well supported by MediaWiki, why not become a BetaWiki user? Your work can enable collaboration by volunteers on the thousands of websites that use MediaWiki, including the Wikipedia in your language.
A post about localization raises the question: Why aren’t these posts translated into other languages? If you are comfortable editing a wiki, please help us translate our blog posts. :-)
Erik Moeller, Deputy Director