Many readers of this blog know about the Content Translation initiative. This project, developed by the Language Engineering team of the Wikimedia Foundation, brings together machine translation and rich text editing to provide a quick method to create Wikipedia articles by translating them from another language.
Content Translation uses Apertium as its machine translation back-end. Apertium is a freely licensed open source project and was our first choice for this stage of development. The first version of Content Translation focused on the Spanish-Catalan language pair, and one of the reasons for this choice was the maturity of Apertium’s machine translation for those languages.
However, as newer versions of Content Translation needed to support more language pairs, it became essential that the machine translation continue to be reliable, and that the back-end be stable and up-to-date. To ensure this stability, we needed to use the latest updates released by the Apertium upstream project maintainers, and we needed to run Apertium as a separate service. Prior to this set-up, the Apertium service was being provided from within the Content Translation server (cxserver).
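To illustrate what running Apertium as a separate service means for a client such as cxserver, here is a minimal sketch in Python. It is not the cxserver implementation. The host name and port are hypothetical, and the `/translate` endpoint with `langpair` and `q` parameters and the `responseData.translatedText` response field are assumptions modelled on Apertium's APy HTTP service, so check them against the service you actually deploy.

```python
import json
from urllib.parse import urlencode

# Hypothetical address of the standalone translation service (assumption).
APERTIUM_BASE_URL = "http://apertium.example.org:2737"


def build_translate_url(text, source, target, base_url=APERTIUM_BASE_URL):
    """Build the request URL for one translation call.

    Assumes an APy-style endpoint: /translate?langpair=src|dst&q=text
    """
    query = urlencode({"langpair": f"{source}|{target}", "q": text})
    return f"{base_url}/translate?{query}"


def extract_translation(response_body):
    """Pull the translated text out of a JSON response body.

    Assumes the response shape {"responseData": {"translatedText": ...}}.
    """
    payload = json.loads(response_body)
    return payload["responseData"]["translatedText"]
```

Because the client only needs a URL and a JSON parser, the translation engine can be upgraded, restarted or moved to another host without touching the Content Translation server itself.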
The Content Translation tool is currently hosted on Wikimedia’s beta servers. To set up the independent Apertium service, it was important to use the latest released stable packages from Apertium, but they were not available for the current versions of Ubuntu and Debian. This became a significant blocker, because the use of third-party package repositories is not recommended for Wikimedia’s server environments.
After discussion with Wikimedia’s Operations team and the Apertium project maintainers, it was decided that the Apertium packages would be built for the Wikimedia repository. In addition to the Apertium base packages, individual packages for supporting the language pairs and other service packages were built, tested and included in the Wikimedia repository. Alexandros Kosiaris (from the Wikimedia Operations team) reviewed and merged these packages and the patches for their inclusion in the repository. The Apertium service was then puppetized for easy configuration and management on the Wikimedia beta cluster.
Meanwhile, to make Apertium more accessible for Ubuntu and Debian users, Kartik Mistry (from the Wikimedia Language Engineering team) also started working closely with the Apertium project maintainers to make sure that the Debian packages were up-to-date in the main repository. Going forward, once the updated packages are included in Ubuntu’s next Long Term Support (LTS) version, we plan to remove these packages from the internal Wikimedia repository.
The Content Translation tool has since been updated and now supports Catalan, Portuguese and Spanish machine translation, using the updated Apertium service through cxserver. We hope our users will benefit from the faster and more reliable translation experience.
We would like to thank Tino Didriksen, Francis Tyers and Kevin Brubeck Unhammer from the Apertium project, and Alexandros Kosiaris and Antoine Musso from the Wikimedia Operations and Release Engineering teams respectively, for their continued support and guidance.