Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘input methods’

Language Engineering Events – Language Summit, Fall 2013

The Wikimedia Language Engineering team, along with Red Hat, organised the Fall edition of the Open Source Language Summit in Pune, India on November 18 and 19, 2013.

Members from the Language Engineering, Mobile, VisualEditor, and Design teams of the Wikimedia Foundation joined participants from Red Hat, Google, Adobe, Microsoft Research, Indic language projects, Open Source Projects (Fedora, Debian) and Wikipedians from various Indian languages. Google Summer of Code interns for Wikimedia Language projects were also present. The 2-day event was organised as work-sessions, focussed on fonts, input tools, content translation and language support on desktop, web and mobile platforms.

Participants at the Open Source Language Summit, Pune, India

The Fontbook project, started during the Language Summit earlier this year, is slated to be extended to 8 more Indian languages. The project aims to create a technical specification for Indic fonts based on the OpenType 1.6 specification. Pravin Satpute and Sneha Kore of Red Hat presented their work on the next version of the Lohit font family, which is based on the same specification and uses Harfbuzz-ng. This effort is expected to complement the planned outcome of the Fontbook project.

The other font sessions included a walkthrough of the Autonym font created by Santhosh Thottingal, a Q&A session by Behdad Esfahbod about the state of Indic font rendering through Harfbuzz-ng, and a session to package webfonts for Debian and Fedora for native support. Learn more about the font sessions.

Improving the tools for multilingual input in the VisualEditor was discussed extensively. David Chan walked through the event logger system built for capturing IME input events. It is being used as an automated IME testing framework to build a library of similar events across IMEs, OSs and languages.

Santhosh Thottingal stepped through several tough use cases of handling multilingual input, to support the VisualEditor’s inherent need to provide non-native support for handling language content blocks within the contentEditable surface. Wikipedians from various Indic languages also provided their inputs. On-screen keyboards, mobile input methods like LiteratIM and predictive typing methods like ibus-typing-booster (available for Fedora) were also discussed. Read more about the input method sessions.

The Language Coverage Matrix Dashboard (LCMD), which displays language support status for all languages in Wikimedia projects, was showcased. The Fedora Internationalization team, which currently provides resources for fewer languages than the Wikimedia projects, will identify the gap using the LCMD data and assess the resources that can be leveraged to enhance support on desktops. Dr. Kalika Bali from Microsoft Research Labs presented on leveraging content translation platforms for Indian languages and highlighted that machine translation (MT) for Indic languages could be improved significantly by using web-scale content like Wikipedia.

Learn more about the sessions, accomplishments and next steps for these projects from the Event Report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Report from the Spring 2013 Open Source Language Summit

Fortuna i forti aiuta, e i timidi rifiuta — an Italian proverb

The Wikimedia Foundation and Red Hat jointly organized the Second Open Source Language Summit on February 12th and 13th, 2013. The summit was held at the Red Hat engineering center in Pune, India. Similar to the previous summit, this face-to-face work session was focused on internationalization (i18n) and localization (l10n) features, font support, input method tools, language search, i18n testing methods and standards. The sessions were work sprints, each with special focus on a key area. Participants included core contributors from the Wikimedia Foundation, Red Hat (including Fedora SIG members), KDE, FUEL, Google and C-DAC. Below is a summary of what was accomplished during these two days.

During the summit, teams from different organizations came together to discuss language-related challenges, and worked together on features and tools to address them.

Input Methods

Parag Nemade and Santhosh Thottingal worked on making additional input methods available for the jQuery.IME library. 60 input methods, covering languages like Assamese, Esperanto, Russian, Greek and Hebrew, were added, bringing the total to 144. IMEs from the m17n library that are missing from the jQuery.IME library were also identified.

Translation Tools & FUEL Sprint

Siebrand Mazeland and Niklas Laxström, together with Ankit Patel, Rajesh Ranjan and Red Hat language maintainers, worked to identify more tools that could be used as translation aids in a translation system. The FUEL project aims to standardize translations for frequently used terms, translation style and assessment methodology. Until now it has focused mostly on languages of India. The FUEL project can now be translated in [...]. Pau Giner, joining remotely from Spain, demonstrated new designs for the translation editor and terminology usage.

Language Coverage Matrix

To better evaluate the needs for enabling support for languages, a matrix detailing the requirements and availability of basic and extended features is being drawn up. With 285 languages currently supported in Wikimedia and more than 100 in Fedora, this document will be instrumental in bridging the gaps and porting features across projects and platforms. Key areas of evaluation include input methods, fonts, translation aids like glossaries and spell-checkers, testing and validation methods, etc. A preliminary draft was created during the summit by Alolita Sharma, Runa Bhattacharjee and Amir E. Aharoni.
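As a rough illustration of how such a matrix could be used to find gaps between projects, here is a minimal sketch. The language entries, the feature names and the `coverage_gaps` helper are all hypothetical and are not taken from the actual matrix; the sketch only shows the idea of comparing per-language feature sets.

```python
# Hypothetical coverage data: language code -> set of supported features.
# These entries are illustrative only and do not reflect real support status.
wikimedia = {
    "mr": {"input_method", "font", "glossary"},
    "as": {"input_method", "font"},
    "he": {"input_method", "font", "spell_checker"},
}
fedora = {
    "mr": {"input_method", "font", "spell_checker"},
    "he": {"font", "spell_checker"},
}


def coverage_gaps(source, target):
    """Return, per language, the features present in `source` but absent in `target`."""
    gaps = {}
    for lang, features in source.items():
        # A language missing from `target` counts as having no features there.
        missing = features - target.get(lang, set())
        if missing:
            gaps[lang] = sorted(missing)
    return gaps


# Features that could be ported from the first project to the second.
gaps = coverage_gaps(wikimedia, fedora)
for lang, missing in sorted(gaps.items()):
    print(lang, "is missing:", ", ".join(missing))
```

Calling the helper with the arguments swapped lists the features that could be ported in the other direction.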

Fonts, WebFonts

An initiative to document the technical aspects of fonts for the scripts of languages spoken in India started during the language summit. For each script, a reference font will be chosen and documented in detail against the OpenType font specification, which serves as the standard. The document aims to act as a reference for any typographer working on Indian language fonts. An initial draft and outline of this document was prepared during the second day of the language summit, mainly by Santhosh Thottingal and Pravin Satpute.

Testing Internationalization Tools

Finding suitable methods for testing internationalized components and content was the major focus of this sprint, with the Fedora Localization Testing Group (FLTG) and Wikimedia’s Language Engineering team sharing details of their testing methods. The FLTG conducts Test Days prior to Fedora beta releases with a test matrix targeted at specific core components, and Wikimedia uses unit tests for frequent testing of its features under development. The FLTG showed its plans to integrate the screenshot comparison method for testing localized interfaces. This method will be useful for Wikimedia too. Extending the method to web-based applications and to Wikimedia’s language requirements (e.g. right-to-left) was identified as an area for collaboration.
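The FLTG's actual tooling is not described in this report, so the following is only a sketch of the general idea behind screenshot comparison: a baseline image and a new image are compared pixel by pixel, and the test fails if too many pixels differ. The `diff_ratio` function and the tiny in-memory "screenshots" are assumptions made for illustration; a real setup would load image files with an imaging library.

```python
def diff_ratio(baseline, candidate, tolerance=0):
    """Fraction of pixels with any colour channel differing by more than `tolerance`.

    Each screenshot is a list of rows, and each pixel is an (R, G, B) tuple.
    """
    same_shape = len(baseline) == len(candidate) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(baseline, candidate)
    )
    if not same_shape:
        raise ValueError("screenshots must have the same dimensions")

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    changed = sum(
        1
        for row_a, row_b in zip(baseline, candidate)
        for px_a, px_b in zip(row_a, row_b)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(px_a, px_b))
    )
    return changed / total


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
baseline = [[WHITE, WHITE], [BLACK, WHITE]]
localized = [[WHITE, WHITE], [BLACK, BLACK]]  # one of four pixels changed

ratio = diff_ratio(baseline, localized)
print("changed pixels:", ratio)
```

The `tolerance` parameter is one way to ignore small rendering differences such as anti-aliasing; choosing a sensible threshold for localized interfaces would need experimentation.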

More news from the Language Summit can be found in the tweets, the session notes and the full report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

OpenSource Language Summit

The Wikimedia Foundation and Red Hat co-organized an Open Source Language Summit in Pune, India on November 6-7, 2012. The summit focused on language tools and technology development to support languages on Wikipedia, the Web, Linux and other Open Source platforms.

Santhosh Thottingal presenting his talk on jquery.ime

In total, 45 core language technology developers, open source contributors, typographers and technology evangelists from the Wikimedia Language Engineering and Mobile teams, Red Hat, Mozilla Foundation, KDE, GNOME, and other open source projects participated in sessions and work sprints on internationalization and localization features supporting various open source projects on the web and Linux. After brief introductory talks, we focused our work on font support, input method tools, language search, and web and localisation standards.


The event had short talks on the following topics:

Selected achievements

The following people won prizes for their code contributions during the event:

  • Anish Patil ported the Universal Language Selector’s cross-language search algorithm to GNOME language search.
  • Aravinda VK wrote a set of FontForge Python wrappers to make changes to fonts programmatically. Aravinda also fixed a few bugs in the Kannada Gubbi font for the Harfbuzz rendering engine and wrote a Kannada KGP keymap for jquery.ime.
  • G Karunakar added a Hindi InScript keyboard layout to Firefox OS Gaia.

Other accomplishments included:

  • Kushal Das added patches to deploy the Universal Language Selector on [...] and also a patch for a bug on the Mozilla localization platform.
  • Alolita, Sankarshan, Runa and Satish discussed APIs for various translation workflows and put together an initial specification.
  • Rajeesh Nambiar, Hussain KH, Ani Peter, Praveen A and Pravin Satpute fixed and filed upstream bugs for Malayalam, Kannada, Gujarati and Punjabi fonts with Harfbuzz.
  • Parag Nemade added InScript2 keyboards for Sanskrit, Nepali, Marathi and Konkani to jquery.ime.
  • Ankit Gadgil wrote over 200 unit tests for Marathi and Hindi input methods in jquery.ime.
  • Yuvaraj Pandian, Pau Giner, Arun Ganesh and Siebrand Mazeland developed an initial version of an Android-native app for translation reviews.
  • Pau Giner conducted user testing with new translation prototypes with translators. Arun Ganesh created an icon for gnome-transliteration.

You can browse through tweets and more notes from the event. Happy reading!

Srikanth Lakshmanan
Internationalisation/Localisation Outreach / QA Engineer

Universal Language Selector now has Input Methods

The Language Engineering team at the Wikimedia Foundation works on a set of tasks every two weeks. This post is about the team’s accomplishments over the past two weeks.

Have you ever sat at a computer in a foreign country, and wondered how you were going to enter text in your language using a keyboard with a different alphabet?

“Input methods” are interfaces that allow users to enter text in a script different from the one used on their keyboard. On some Wikipedia versions (like wikis in Indic languages), such a tool has been available through the Narayam extension.
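To make the idea concrete, here is a minimal sketch of a rule-based input method. It is not the Narayam or jquery.ime implementation. It assumes a simple rule table that is checked against the end of the text after every keystroke, and it uses the Esperanto "x-system" convention (a letter followed by "x" stands for the accented letter) as its example. The names `X_SYSTEM`, `press_key` and `type_keys` are hypothetical.

```python
# Illustrative rule table for the Esperanto x-system (lower case only).
X_SYSTEM = {
    "cx": "ĉ",
    "gx": "ĝ",
    "hx": "ĥ",
    "jx": "ĵ",
    "sx": "ŝ",
    "ux": "ŭ",
}


def press_key(text, key, rules):
    """Append one keystroke, then rewrite the longest rule that matches the tail."""
    text += key
    longest = max((len(pattern) for pattern in rules), default=0)
    for length in range(longest, 0, -1):
        tail = text[-length:]
        # Guard against texts shorter than the pattern being tried.
        if len(tail) == length and tail in rules:
            return text[:-length] + rules[tail]
    return text


def type_keys(keys, rules):
    """Simulate typing `keys` one at a time into an empty text field."""
    text = ""
    for key in keys:
        text = press_key(text, key, rules)
    return text


print(type_keys("sxangxi", X_SYSTEM))
```

Because the rules are applied after each keystroke rather than to the finished string, the user sees the converted character as soon as the sequence is complete, which is how an input method feels in practice.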

As part of Project Milkshake, this feature has recently been exported to a JavaScript library (a bundle of code, called jquery.ime) so that it could be reused by other web developers.

Another language-related tool, the Universal Language Selector (ULS), allows readers of Wikipedia and its sister sites to easily pick the language of their choice for the website’s interface.

Over the last two weeks, we’ve integrated the input methods’ functionality directly into the Universal Language Selector: it now comes with a large set of input tools that users can use to enter text in non-Latin scripts.

The integration of the two tools makes the interface more consistent and usable when it comes to choosing languages in which to read (“display”) and to write (“input”) on the site: both settings are located in the same dialog of the Universal Language Selector.

When selecting a language in which to write, it’s possible to set an accompanying preferred input method for that language, if available. When input methods have been assigned to different writing languages, switching between languages in the menu will automatically change to the preferred input method for that language.
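The behaviour described above can be pictured with a small sketch. This is not the ULS code. The `InputPreferences` class is hypothetical and the input method identifiers are illustrative; the sketch only assumes that one preferred input method is remembered per writing language and looked up again when the language is switched.

```python
class InputPreferences:
    """Remembers one preferred input method per writing language."""

    def __init__(self, available):
        # available: language code -> list of input method ids offered for it
        self.available = available
        self.preferred = {}
        self.language = None

    def set_preferred(self, language, method):
        """Store `method` as the preferred input method for `language`."""
        if method not in self.available.get(language, []):
            raise ValueError(f"{method!r} is not available for {language!r}")
        self.preferred[language] = method

    def switch_language(self, language):
        """Select a writing language and return the input method to activate.

        Returns None when no preference is stored, meaning the keyboard's
        native input is used.
        """
        self.language = language
        return self.preferred.get(language)


# Illustrative identifiers only.
prefs = InputPreferences({
    "hi": ["hi-transliteration", "hi-inscript"],
    "eo": ["eo-x"],
    "en": [],
})
prefs.set_preferred("hi", "hi-inscript")
prefs.set_preferred("eo", "eo-x")

print(prefs.switch_language("hi"))
print(prefs.switch_language("en"))
```

Keeping the preference keyed by language is what lets a user move between writing languages without reselecting an input method each time.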

Other language engineering news in brief:

  • The Language Engineering team will be in India during the second week of November to participate in the OpenSource Language Summit in Pune, and the Wikimedia DevCamp in Bangalore. For new volunteers who want to get started contributing to our tools, we’ve prepared a list of bugs that you can work on at these events with our support.
  • We’ve also worked on finalizing the development plan and features for Translate UX improvements, which were identified by user testing with volunteer translators to improve translation efficiency.
  • We’ve worked on how to get metrics on the impact of our tools through URL-based usage data gathering. Feedback is welcome.
  • We’ve fixed some bugs related to the ULS and gender support in MediaWiki and MediaWiki extensions.
  • The Narayam and Webfonts extensions were deployed to Wikimedia sites in Marathi; Narayam was also deployed to sites in Amharic.
  • An early stable version of ULS was deployed on Wikidata; this first use on a production site revealed a few bugs that were fixed. It will be updated to the latest stable version periodically.

Srikanth Lakshmanan
Internationalisation/Localisation Outreach / QA Engineer