WikiBhasha


Folks over at Microsoft Research have been thinking about ways to improve content translation between instances of Wikipedia.  For example, today the largest collection of articles is at English Wikipedia (more than 3,000,000).  Compare that number with the collection at Hindi Wikipedia (which as of July 31 of this year had 55,716 articles).  One proven way to increase the number of articles in Hindi is machine translation, but such translations still need human review and often subtle editing to make them elegantly readable.
Enter WikiBhasha, formerly known as WikiBABEL, which launches today as both a MediaWiki extension project and a bookmarklet.  WikiBhasha takes content from a targeted Wikipedia page and displays a machine translation to a second language side-by-side.  Users can edit, add to or delete the translated content, preview their work and then submit it to the second language Wikipedia.
What’s especially interesting to me about this project is the fact that its author, researcher A. Kumaran, has tirelessly persuaded Microsoft to allow him to open source the client.  The code has been checked into the MediaWiki code tree under the Apache License 2.0, which means that the powerful side-by-side editing tools developed by Mr. Kumaran can potentially be used in other MediaWiki projects.  I’m very pleased to see Microsoft take this step, and I hope you will join me in welcoming WikiBhasha.
Danese Cooper, Chief Technical Officer

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.


16 Comments

[…] Danese Cooper, chief technical officer at the Wikimedia Foundation, announced that the Microsoft Research center had come […]

[…] language to another. They’ve developed a tool, and it’s now going open source. From the Wikimedia blog: Enter WikiBhasha, formerly known as WikiBABEL, which launches today as both a MediaWiki extension […]

Hi, it is good to know that there are now tools that will help increase the content coverage in other languages. This will be incredible.
I applaud Microsoft for making the client open source. However, I see that WikiBhasha is a Microsoft service whose users must agree to Microsoft’s TOU. Who owns the user-contributed content: Microsoft or Wikipedia? Moreover, a Microsoft WikiBhasha team member confirmed that the translation database will be used to improve Microsoft’s commercial products.
Could you please clarify these issues? Thank you.

Excellent tool indeed! Very intuitive and efficient. It only takes English as a source language, though.

Note: It came to our notice after the launch yesterday that there is a Terms of Use (ToU) http://www.wikibhasha.com/wikibhasha/terms.htm that users must agree to when using WikiBhasha, which includes terms at odds with the open source nature of the client software. Wikimedia Foundation was not shown this ToU document before the launch, or we would have pointed out how its ambiguities would likely retard community interest in the project. We have pointed out the problem to A. Kumaran, who is now seeking to rework the ToU to clarify that it applies only to Microsoft’s machine translation engine.

Danese, thank you very much for looking into this matter.
As I understood it from one of the WikiBhasha team members, Microsoft plans to use user translations to improve its translation engine, furthering its commercial products. This seems to be a blatant use of Wikipedia content for commercial gain. The user-generated translation database needs to be kept away from Microsoft’s product use, if not from Microsoft’s servers.

Grammatical error: “…the fact that it’s author…”
* it’s = contraction of “it is”
* its = possessive term
Thus, correctly stated:
“…the fact that its author…”
Come one! The Wikipedia Chief Technical Officer’s message shouldn’t contain errors in basic grammar!

@Truewell (#6): Fixed, thanks for the heads-up! Everyone makes typos every now and then, your comment had one too, after all! 😉 (“Come one!”)

Hats off to Microsoft for putting aside commercial concerns for once and doing something positive for society.

[…] the Wikimedia Foundation considers this initiative an interesting tool for growing the content of the […]

The JavaScript they added is “Copyright (C) Microsoft. All rights reserved.” without any further permission. I wonder if the person who added that JavaScript code to Wikipedia has the right to release it under a free license.
http://en.wikipedia.org/w/index.php?title=User:WikiBhasha.MSR/WikiBhasha.js

Fantastic! Its flagship project, Wikipedia, ranks as the best in the world for developing society at large.

@Taskado, Microsoft is doing this for two reasons: 1. to help increase the content in other languages. 2. Capture user corrections to their machine translated text and improve their commercial translation products. They are getting free samples and testing done for their products.
The commercial concerns are a huge part of this contribution.

Hmm interesting. It means of course Microsoft are competing with Google in this area [http://googleblog.blogspot.com/2010/07/translating-wikipedia.html] except I don’t think Google agreed to release any of their code. Will they now?

@Nil Einne: What Microsoft has released under Apache or GPLv2 is the client code. The client code, either as a bookmarklet or a MediaWiki extension, talks to their translation service or their CTF service. While it is good that Microsoft has released this code, I really don’t know how useful it is for people to have code that only talks to their backend service. The translation and the collaboration take place on the server side, where the real meat is, and that code is not released. It may be an attempt to mislead people into thinking that they are contributing to an open service.…

A translating machine will never work as well as a human. Anyone who thinks it will probably is monolingual or they have never studied linguistics. Edith Grossman is helpful here, so I will copy and paste something she said (from the Wikipedia entry about her): “Fidelity is surely our highest aim, but a translation is not made with tracing paper. It is an act of critical interpretation. Let me insist on the obvious: Languages trail immense, individual histories behind them, and no two languages, with all their accretions of tradition and culture, ever dovetail perfectly. They can be linked by…