On September 5, 2014, the Language Engineering team hosted an online round-table with editors of the Catalan Wikipedia to discuss the Content Translation tool. Besides the translation editor and tools, the first release of Content Translation supported machine translation from Spanish to Catalan. This helped the editors work efficiently and explore the tool more deeply.

The initial feedback from editors was greatly encouraging. They liked the tool and were pleased with its ease of use. After a month of extensive use, during which more than 160 articles were created and contributed to the Catalan Wikipedia, the team wanted to find out more about how editors were using the tool in their day-to-day editing workflows, as well as the gaps that the tool leaves unaddressed. The conversation resulted in valuable feedback from the editors, some of which is presented below.

Screenshot of the Content Translation tool that shows the user a warning about a large amount of machine translated content in the translated article.

(Content-Translation-Warning.png, includes text from en:Tree, by Wikipedia contributors, under CC-BY-SA 3.0, and es:Árbol, by Wikipedia contributors, under CC-BY-SA 3.0)

Faster Editing: The editors unequivocally agreed that the tool provided an overall improvement in their workflow. They were able to create new articles faster, and the high-quality machine-translated drafts often needed very few corrections. Editor Xavi Bosch felt that he could create articles in approximately 30% of the time he needed before the tool was available. With the extra time gained, the editors could focus on fine-tuning the article, for instance by adding more references.

Machine Translation: Content Translation uses Apertium as the machine translation engine. The editors expressed their satisfaction with the overall quality of translation provided by the tool. However, they suggested adding more checks to identify articles that were left largely unchanged from the machine translation. Presently, the user is warned when the tool detects that not much has been changed from the original translation. Pau Giner suggested exploring community best practices from the Catalan Wikipedia to create additional baselines for articles published using Content Translation.
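To make the idea of such a check concrete, here is a minimal sketch in Python of how the share of unchanged machine-translated text could be estimated. It is not how Content Translation implements its warning; the function names and the 0.9 threshold are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def unmodified_fraction(machine_text: str, edited_text: str) -> float:
    """Return the fraction of machine-translated words kept in the edited text."""
    machine_words = machine_text.split()
    edited_words = edited_text.split()
    if not machine_words:
        # Nothing was machine translated, so nothing can be "unchanged".
        return 0.0
    matcher = SequenceMatcher(None, machine_words, edited_words, autojunk=False)
    # Sum the sizes of all word runs that appear, in order, in both texts.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(machine_words)


def should_warn(machine_text: str, edited_text: str, threshold: float = 0.9) -> bool:
    """Hypothetical rule: warn when at least `threshold` of the draft is untouched."""
    return unmodified_fraction(machine_text, edited_text) >= threshold
```

A word-level comparison like this only measures how much of the draft survives; it says nothing about whether the surviving text is correct, which is why community-defined baselines would still be needed.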

Category Adaptation: After creating an article, the current setup on beta labs requires users to publish the article manually on the Catalan Wikipedia. This allows the editors to review the articles and prepare them for publication. The editors highlighted that adding categories is a major part of these reviews and that a feature to adapt categories would be a major benefit. Category adaptation is a feature planned for development. The editors suggested:

  • inserting the translated equivalents of the categories in the original article, and
  • a feature to add new categories (similar to HotCat)
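The first suggestion can be pictured as a simple lookup. The sketch below, in Python, assumes a mapping from source-language categories to their target-language equivalents is already available; the function name and the sample category titles are illustrative only, and this is not the planned implementation.

```python
def adapt_categories(source_categories, category_links):
    """Split source categories into translated equivalents and leftovers.

    `category_links` is a hypothetical mapping from a source-language
    category title to its equivalent in the target language.
    """
    adapted, missing = [], []
    for category in source_categories:
        target = category_links.get(category)
        if target:
            adapted.append(target)
        else:
            # No known equivalent: leave it for the editor to handle by hand.
            missing.append(category)
    return adapted, missing


# Illustrative sample data, not taken from either wiki.
links = {"Categoría:Árboles": "Categoria:Arbres"}
adapted, missing = adapt_categories(
    ["Categoría:Árboles", "Categoría:Botánica"], links
)
```

Categories without a known equivalent are returned separately so that an editor could add them manually, which is where a HotCat-like feature would fit.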

Article continuity through red links: At present, links to articles that exist in the source language but not in the target language are not marked in the translated text. In wiki pages, these are marked as red links. Editors suggested that a similar indicator should be displayed in the published article. This would be especially helpful when creating closely linked articles, like the ones recently created on the Catalan Wikipedia for the Fields Medal awardees.

Complementing the current tools: The Catalan Wikipedia editors also use several tools for typo correction and other editing aids. They suggested exploring the possibility of integrating these tools to complement the services currently provided through Content Translation. Editor B25es highlighted some long-standing minor errors in the Apertium translation service that were being carried into Content Translation as well. The editors recommended extending Content Translation to learn from these known issues and provide corrections.

Issues while publishing articles: On several occasions the editors were unable to save a translated article. While some of this was due to the technical instability of the beta labs environment where the tool is currently hosted, the editors found some patterns and types of content for which the error recurred. Articles with more visual content or complex templates (like football results) were often problematic. In a few cases where the article was not saved, for instance in the articles about Cédric Villani and Stanislav Smirnov, it was noticed that the sequence in which the paragraphs had been translated was similar. The development team has begun investigating these issues.

To learn more, watch the recording of the conversation and read about the features of the upcoming release. If you haven’t tried the tool yet, please do so using these instructions. We would love to hear your feedback.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation