Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

WikiBhasha

Folks over at Microsoft Research have been thinking about ways to improve content translation between language editions of Wikipedia.  For example, today the largest collection of articles is at English Wikipedia (more than 3,000,000).  Compare that number with the collection at Hindi Wikipedia (which as of July 31 of this year had 55,716 articles).  One proven way to increase the number of articles in Hindi is machine translation, but such translations still need human review and often subtle editing to make them elegantly readable.

Enter WikiBhasha, formerly known as WikiBABEL, which launches today as both a MediaWiki extension project and a bookmarklet.  WikiBhasha takes content from a targeted Wikipedia page and displays it side by side with a machine translation into a second language.  Users can edit, add to or delete the translated content, preview their work and then submit it to the second-language Wikipedia.

What’s especially interesting to me about this project is the fact that its author, researcher A. Kumaran, has tirelessly persuaded Microsoft to allow him to open source the client.  The code has been checked into the MediaWiki code tree under the Apache License 2.0, which means that the powerful side-by-side editing tools developed by Mr. Kumaran can potentially be used in other MediaWiki projects.  I’m very pleased to see Microsoft take this step, and I hope you will join me in welcoming WikiBhasha.

Danese Cooper, Chief Technical Officer

16 Responses to “WikiBhasha”

  1. Joe G. says:

    A translating machine will never work as well as a human. Anyone who thinks it will is probably monolingual or has never studied linguistics. Edith Grossman is helpful here, so I will copy and paste something she said (from the Wikipedia entry about her):

    “Fidelity is surely our highest aim, but a translation is not made with tracing paper. It is an act of critical interpretation. Let me insist on the obvious: Languages trail immense, individual histories behind them, and no two languages, with all their accretions of tradition and culture, ever dovetail perfectly. They can be linked by translation, as a photograph can link movement and stasis, but it is disingenuous to assume that either translation or photography, or acting for that matter, are representational in any narrow sense of the term. Fidelity is our noble purpose, but it does not have much, if anything, to do with what is called literal meaning. A translation can be faithful to tone and intention, to meaning. It can rarely be faithful to words or syntax, for these are peculiar to specific languages and are not transferable.”

  2. Vivek Nirkhe says:

    @Nil Einne: What Microsoft has released under Apache or GPLv2 is the client code. The client code, either as a bookmarklet or a MediaWiki extension, talks to their translation service or their CTF service. While it is good that Microsoft has released this code, I really don’t know how useful it is for people to have the code that talks to their backend service. The translation and the collaboration take place on the server side, which is where the real meat is, and that code is not released.

    It may be an attempt to mislead people into thinking that they are contributing to an open service. They are definitely getting people to contribute samples, for free, towards their commercial translation products.

  3. Nil Einne says:

    Hmm interesting. It means of course Microsoft are competing with Google in this area [http://googleblog.blogspot.com/2010/07/translating-wikipedia.html] except I don’t think Google agreed to release any of their code. Will they now?

  4. Vivek Nirkhe says:

    @Taskado, Microsoft is doing this for two reasons: 1. To help increase the content in other languages. 2. To capture user corrections to their machine-translated text and improve their commercial translation products. They are getting free samples and testing done for their products.

    The commercial concerns are a huge part of this contribution.

  5. Fantastic - Its flagship project, Wikipedia, ranks as the best in the world for developing society at large.

  6. Osama Khalid says:

    The JavaScript they added is “Copyright (C) Microsoft. All rights reserved.” without any further permission. I wonder if the person who added that JavaScript code to Wikipedia has the right to release it under a free license.

    http://en.wikipedia.org/w/index.php?title=User:WikiBhasha.MSR/WikiBhasha.js

  7. Taskado says:

    Hats off to Microsoft for putting aside commercial concerns for once and doing something positive for society.

  8. Casey Brown says:

    @Truewell (#6): Fixed, thanks for the heads-up! Everyone makes typos every now and then; your comment had one too, after all! ;-) (“Come one!”)

  9. Truewell says:

    Grammatical error: “…the fact that it’s author…”

    * it’s = contraction of “it is”
    * its = possessive term

    Thus, correctly stated:
    “…the fact that its author…”

    Come one! The Wikipedia Chief Technical Officer’s message shouldn’t contain errors in basic grammar!

  10. Vivek Nirkhe says:

    Danese, thank you very much for looking into this matter.

    As I understood it from one of the WikiBhasha team members, Microsoft plans to use user translations to improve its translation engine, furthering its commercial product. This seems to be a blatant use of Wikipedia content for commercial gain. The user-generated translation database needs to be kept away from Microsoft’s product use, if not from Microsoft’s servers.

  11. Danese says:

    Note: It came to our notice after the launch yesterday that there is a Terms of Use (ToU) http://www.wikibhasha.com/wikibhasha/terms.htm that users must agree to when using WikiBhasha, which includes terms at odds with the open source nature of the client software. Wikimedia Foundation was not shown this ToU document before the launch, or we would have pointed out how its ambiguities would likely retard community interest in the project. We have pointed out the problem to A. Kumaran, who is now seeking to rework the ToU to clarify that it applies only to Microsoft’s machine translation engine.

  12. Excellent tool indeed! Very intuitive and efficient. It only takes English as a source language, though.

  13. Vivek Nirkhe says:

    Hi, it is good to know that there are now tools that will help increase the content coverage in other languages. This will be incredible.

    I applaud Microsoft for making the client open source. However, I see that WikiBhasha is a Microsoft service, with users agreeing to Microsoft’s ToU. Who owns the user-contributed content – is it Microsoft or Wikipedia? On the other hand, a Microsoft WikiBhasha team member confirmed that the translation database will be used to improve Microsoft’s commercial products.

    Could you please clarify these issues? Thank you.