Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘language support’

After the slush, the flood

after the slush, the flush

When new code does not find its way into production for quite some time, it tends to pile up. It is like snow: when the thaw comes, it starts with a trickle, the trickles become streams and all the streams rush down the mountain.

At the WMF Localisation team we worked on our documentation, our help system and our tests. We went to conferences in Belgium and India. And we worked on many small iterative improvements. We rolled out webfonts to more wikis. Input methods were improved and deployed as requested. We have had our translation memory working on translatewiki.net for ages and now it is configured for use on the WMF wikis that use the Translate extension. Actually, we first experimented with a new algorithm and we configured one of the labs systems as a host for the memory of all the fine work we did and do.

Over time a lot of work went into things like plural rules. As the number of languages increases and as we support not only PHP but now also JavaScript, we are optimising our code and we are checking it again. We frequently find that a refactoring is in order. It makes the code more elegant and easier to maintain. With added documentation and tests we ensure that we know it will work well.
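To give an idea of what a plural rule is, here is a minimal sketch in Python. It is not the MediaWiki code (which is PHP and JavaScript); the function names and the message table are made up for illustration, and the rules for English and Russian are the integer rules as commonly described in CLDR.

```python
# Illustrative only: hypothetical names, not MediaWiki's implementation.

def plural_category_en(n: int) -> str:
    # English has two forms for whole numbers: singular and everything else.
    return "one" if n == 1 else "other"


def plural_category_ru(n: int) -> str:
    # Russian picks a form from the last digits of the number.
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Hypothetical message table: one form per plural category and language.
FORMS = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "ru": {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"},
}


def format_count(lang: str, n: int) -> str:
    category = RULES[lang](n)
    return FORMS[lang][category].format(n=n)


if __name__ == "__main__":
    for n in (1, 3, 5, 11, 21):
        print(format_count("en", n), "|", format_count("ru", n))
```

The point of the sketch is that the rule (which category a number falls in) is kept apart from the messages (the forms a translator supplies), so the same rule can be shared by every message in that language.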

Another fine project waiting to get to the stage where it will flow into our codebase is an updated Easy Timeline. The functionality has always been broken when used in many of the “other” languages, languages written in a different direction or a different script. The updated Easy Timeline has been given a revamp; it uses SVG to create the image and you can test it at the translatewiki sandbox. Amir welcomes bug reports and LOVES to hear your comments.

As you know, we use mingle for our project management (user guest, password guest). In it we have stories that explain the functionality that we are going to develop. Story 532 is one such:

As a potential translator, I want to be able to tell translation administrators in a structured way that I am interested in translating to one or more languages and at the same time provide them with some data about me and preferences on how and how often I would like to be contacted, so that translation administrators can more effectively and efficiently target translators.

Together with the acceptance criteria a narrative like this enables the developer to develop and the finished product to be accepted by our product manager. A story comes with tasks and once you have read the stories and the tasks you have a clue of what goes into getting you new functionality.

The conferences were great; we learned a lot from meeting so many wonderful people. Many tests are deployed and they run regularly. The documentation, including user documentation, is written and we would love you to translate it into your language. We feel really pumped up to get cracking and provide you with more functionality in the next sprint.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Getting ready for when the freeze is done

When you look at the “sprint backlog” in mingle (guest, guest), you may notice that even though we have been slowed down because of the slush, the feature freeze because of the imminent MediaWiki release, we are not sitting on our hands. Documentation, testing, code review and outreach are on our agenda.

Because of the way we are planning, it is apparent how much code review actually gets done. This sprint we added a review of the ArticleFeedback extension for its internationalization and localization aspects. This is a logical development considering that, with 280+ languages, we are not developing for one language. Our objective for this job is: “As a user I can use the functionality of the ArticleFeedbackv5 so that nothing looks odd in my language from an internationalization and localization perspective”. Reviews like this have been performed informally in the past by translatewiki.net staff. This review, however, will be done during Wikimedia hours and reported through Wikimedia channels.

One old open bug is about EasyTimeline. It started its life in 2005 and it is finally getting the attention it deserves. The bug explains the lack of support for languages like Arabic, Hebrew and Farsi that are written from right to left. The software has Ploticus as a dependency and for a long time we waited for a version of this software that does support RtL languages. We are not waiting any longer and you can read in our story 230 about the complexities involved.

You could say that implementing a translation memory for page translation is a bit more adventurous; it is however debatable if that functionality is new; a translation memory has for a long time been functional at translatewiki.net. It is also very much a feature that makes people more productive. Our team has always had the goal of making life easy and productive for our editors and translators.
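For those who wonder what a translation memory does: it remembers earlier translations and offers them when a similar text comes along. A minimal sketch in Python follows. It is an assumption-laden illustration, not the algorithm used at translatewiki.net; the class name, the threshold and the Dutch example strings are made up, and the similarity score comes from Python's standard `difflib`.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores pairs, suggests by string similarity."""

    def __init__(self):
        self._entries = []  # list of (source text, translation)

    def add(self, source: str, translation: str) -> None:
        self._entries.append((source, translation))

    def suggest(self, text: str, threshold: float = 0.7):
        """Return (score, source, translation) tuples, best match first."""
        scored = []
        for source, translation in self._entries:
            # ratio() is 1.0 for identical strings and lower for weaker matches.
            score = SequenceMatcher(None, text.lower(), source.lower()).ratio()
            if score >= threshold:
                scored.append((score, source, translation))
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored


if __name__ == "__main__":
    memory = TranslationMemory()
    memory.add("Save page", "Pagina opslaan")
    memory.add("Delete page", "Pagina verwijderen")
    for score, source, translation in memory.suggest("Save the page"):
        print(f"{score:.2f}  {source!r} -> {translation!r}")
```

A real system would index the texts rather than compare against every entry, but the idea of "suggest what was translated before, ranked by similarity" is the same.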

The “grammar” functionality for JavaScript is part and parcel of the i18n tooling for our developers. It was not ready before the “slush” and it does make our lives difficult not having it available in the code. When you are building tests for “gender” and “plural”, it is so obvious to create them for “grammar” as well. In this sprint, “grammar” will be included in the code for all these good reasons.

This is the first time that there is a story for outreach. We are reaching out to all the Wikipedia language communities to have their own language support team. It will make a difference when all our language communities have been asked to provide their expertise to us. We already have found that many people show an interest and issues do get raised as a result.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

 

The localisation team sprints into the new year

WebFonts is the first extension that gets user documentation served from MediaWiki.org. At the time of writing, the documentation has been written, it serves people help text about WebFonts, and it is ready for translation. People looking for help will be served help in the language of their user interface if there is a translation.

WebFonts drop down on or.wikipedia.org

In a way it seems like a minor thing, but consider:

  • MediaWiki can serve help texts for its functionality
  • this help text may differ based on the language of the user
  • the help text can be translated
  • a new community for MediaWiki help text translation is needed
  • functionality like Narayam will surely get its user documentation in the near future
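The idea of serving help in the language of the user interface, when a translation exists, can be sketched as a simple lookup with a fallback. This is an illustration in Python, not how MediaWiki does it; the function name, the fallback table and the placeholder texts are all assumptions.

```python
def pick_help_text(translations, ui_language, fallbacks=None, default="en"):
    """Return (language code, text) for the best available help text.

    translations: dict mapping language code -> help text
    fallbacks:    hypothetical dict mapping a language code to codes to try next
    """
    chain = [ui_language]
    chain += list((fallbacks or {}).get(ui_language, []))
    chain.append(default)
    for code in chain:
        if code in translations:
            return code, translations[code]
    raise KeyError(f"no help text available for {ui_language!r} or {default!r}")


if __name__ == "__main__":
    help_texts = {
        "en": "<English help text>",
        "or": "<Oriya help text>",
    }
    print(pick_help_text(help_texts, "or"))  # a translation exists
    print(pick_help_text(help_texts, "ta"))  # no translation yet: English
```

When a translation is added to the table, readers with that interface language get it without any change to the lookup.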

It will be a challenge to other developers and developer teams to adopt and refine the way assistance to our users is provided. We learned at translatewiki.net that documentation did improve the quality of the localisations. We hope that user documentation will reduce confusion and make for happy editors and readers.

The WebFonts user documentation was deployed last Tuesday. This and some other changes can be found in the deploy list. As the holiday season is in full swing, sprint 6 has started; it will run into the new year.

In this sprint stories will be developed that will make “Translation review” feature complete. When this is implemented, it will help translators and localisers review each other's work and assign a status to their work for further consideration. As you can imagine, the different statuses themselves will become available for translation; card 326 defines this and will make this possible. This is just one of many stories that make up this feature.

For the localisers of the MediaWiki software a long-held ambition will be realised; card 206 will see “plural” support implemented for JavaScript. When this functionality is deployed, it will lead to a long list of changes to the actual messages.

The new year will bring many new challenges to us and opportunities to the many, many language communities. The Wikimedia Localisation team will work hard to provide you with the tools to be efficient in any language, to get our message out and to provide information in any language. For some of us the new year starts at a different moment so it will be very much business as usual; we welcome you to have a look at our sprint backlog (user: guest, password: guest) and bug us in bugzilla with whatever needs fixing.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

 

Localisation team sprint 5 update II

Probably the most interesting highlight of today’s i18n deployment is the configuration of the Translate extension on MediaWiki.org. We have observed that on some wikis special pages exist that explain, in the language of the wiki, functionality like Narayam or WebFonts. Such documentation is welcome on all MediaWiki installations where the functionality is used by people using the same language for their user interface.

For writing the documentation MediaWiki.org is the obvious platform. With the deployment of Translate we have the basis for writing and translating user documentation in a structured and organised way.

Narayam and WebFonts have been updated to the latest versions that have been tested on translatewiki.net. As Narayam and WebFonts are still very much a work in progress, we invite anyone to continue their testing at translatewiki.net. The changes are:

  • menu appears only on click, not when hovering
  • menu positions are now correct for RTL languages and do not go off screen any more
  • Narayam and WebFonts support the Kannada script for the Tulu language on the Incubator

There are also some smaller fixes, among them the change of the autonym for the Veps language to “Vepsän kel”. The full details of all the changes are at revision 106667.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

 

Localisation team sprint 5 update

With a new sprint, new functionality for MediaWiki is identified to be deployed in two weeks' time. There is room for dealing with issues to do with Narayam and WebFonts. Many of the new activities have to do with documentation, translation and feedback.

The sprint backlog in Mingle (user: guest password: guest)

What we hope for is that the feedback functionality that is now part of MediaWiki can be used to ask for feedback on MediaWiki features. It is obvious that the Wikimedia Localisation team cannot support all the 300+ languages that have their projects or exist in the incubator. What we can do is process the information we get from our language support teams. Figuring out how to do this is one of the goals for this sprint.

The use of Narayam and WebFonts will be helped a lot by documentation; “where to find that character on this keyboard mapping” or “what does an international keyboard look like” are questions looking for an answer. Determining how to document and what to translate is not all that obvious. With keyboard maps and fonts distributed as part of MediaWiki, documenting on “the” wiki does not scale to other Wikimedia wikis, and MediaWiki wikis outside the Wikimedia Foundation are just as much in need of documentation. When people start using MediaWiki because of such language support features, we accomplish real support for a language.

For this sprint, these questions are looking for an answer and in the meantime the Translate extension will gain these new features:

  • Documents that need translation can be grouped together; for instance all the Fundraiser messages or Wikimedia reports
  • Documents can be marked as no longer needing translation
  • Changes to the state of documents and translations will be logged and the log will be available for viewing
  • Depending on the state of a document or a translation, attention can be drawn when there is a need for activity
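The four features listed above (grouping, marking, logging and drawing attention) fit together naturally. The sketch below shows one way to model them in Python; it is only an illustration, not the design of the Translate extension. The state names, class names and example documents are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical states; the real extension may use different ones.
STATES = ("new", "in progress", "needs review", "ready", "no longer needed")
ATTENTION_STATES = ("new", "needs review")


@dataclass
class Document:
    title: str
    state: str = "new"


@dataclass
class DocumentGroup:
    """A named group of documents, e.g. all Fundraiser messages."""
    name: str
    documents: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def set_state(self, title: str, new_state: str, user: str) -> None:
        if new_state not in STATES:
            raise ValueError(f"unknown state: {new_state!r}")
        for doc in self.documents:
            if doc.title == title:
                # Every state change is recorded so it can be viewed later.
                self.log.append(
                    (datetime.now(timezone.utc), user, title, doc.state, new_state)
                )
                doc.state = new_state
                return
        raise KeyError(title)

    def needing_attention(self) -> list:
        return [d.title for d in self.documents if d.state in ATTENTION_STATES]


if __name__ == "__main__":
    group = DocumentGroup("Fundraiser", [Document("Appeal"), Document("Thank you")])
    group.set_state("Appeal", "ready", user="ExampleUser")
    group.set_state("Thank you", "no longer needed", user="ExampleUser")
    print(group.needing_attention())
    for when, user, title, old, new in group.log:
        print(f"{user} changed {title!r}: {old} -> {new}")
```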

User documentation needs translation and hopefully many of the algorithms used for the localisation of MediaWiki at translatewiki.net will equally apply for user documentation. Life will become a lot easier for all those people who administer MediaWiki and have only a basic understanding of English. We hope to deliver this in one of our future sprints.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Ready for the WebFonts launch

After months of preparation, demonstrating the latest versions in person and on-line, going through tons of feedback and implementing the resulting modifications, we are ready for the launch of WebFonts. Web fonts is a technology that ensures that the readers of our wikis will always see the intended characters on their screen. Many devices do not provide the necessary fonts that allow people to read their mother tongue.

When people do not even see what we aim to provide to them, we fail. According to the Wikipedia article, web fonts are considered “controversial” because the licenses of many fonts prevent them from being used as web fonts. There is no such controversy when freely licensed fonts are used and we are really happy with our collaboration with the producers of such fonts.  We learned that fonts working on one platform do not necessarily work as well on another platform / operating system.

Enabling people to read and write their language is at this time our prime objective, and people are happy when they find they can, as they did at the localisation sprint in Pune. Being able to type Marathi or Punjabi, Hindi or Tamil on a thin client put a smile on many faces. They used the latest software at translatewiki.net and the feedback we got from them and others has resulted in many technical and usability improvements.

The launch of WebFonts together with the Narayam improvements on Monday 12 December represents significant progress in helping enable Indic language contributions to our projects; it consists of a large amount of code, it will be implemented on a selected range of wikis and it affects many communities. It will affect the wikis in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil and Telugu. All these communities have been involved in testing the evolving functionality at translatewiki and the comments and bug reports we received were essential for what we are now proud to present. With the launch more people will experience the WebFonts technology for the first time. We are eager to improve on what we have because we believe that the web fonts technology is crucial for the emancipation of many languages and scripts in this digital age.

Thanks,

Gerard Meijssen

Internationalization / Localization outreach consultant

The Mumbai hackathon was sweet

When a hackathon is organised, it is wonderful when the reality of the results exceeds expectations. The reality was that some of India’s best and brightest attended the hackathon. They represented many of the languages  of India, and it showed.

Seven Indians and a German created an input method for their language. A Russian keyboard method is promised for the next day. There was a jQuery wizard who created a wonderful and necessary addition to the Narayam extension: a visual cue to where the characters are on the keyboard. This information comes directly from the Narayam definitions and the best part is that the visual cue actually works as well.

The WebFonts extension got its reality check. WebFonts provides default fonts in order to ensure that nobody sees the infamous Unicode squares and numbers instead of the desired characters. The MediaWiki software is exclusively open source, and consequently the fonts we deliver through the WebFonts extension need to be freely licensed, too. The default font we use for the Indic languages is the Lohit font produced by Red Hat. It was quite astonishing to learn that some of the characters do not look the way they should. Bugs have been filed for this at Red Hat and more work will be done.

We are going to roll out the WebFonts extension on December 12th. Our aim is to install it on the Indic projects. When we have freely licensed fonts that show languages correctly, we will finally be able to provide readable content to everyone. We will be working towards resolving the issues identified at the hackathon.

The Mumbai hackathon has also been good for the Kiwix off-line reader; not only was the software localised into several languages, new developers also familiarized themselves with the software itself to implement further improvements. This is quite important because many Indian people have no or intermittent access to the Internet. In addition to Wikipedia content, there are many projects in India to transcribe books that are in the public domain; as the Kiwix software gets ready to support this content, it will help more and more people get access to India’s rich cultural heritage.

Mobile support was the third centre of gravity; many first-time Wikimedia hackers teamed up with seasoned Wikimedia developers and this produced great results. This included work on a mobile landing page for India, as well as a gateway that allows users to receive Wikipedia articles over SMS and the carrier-specific USSD technology. To appreciate this, consider that many people do not have access to the Internet and consequently to our content. Work also continued on the “Wikipedia Zero” project, which aims to bring Wikipedia and other Wikimedia content to millions of users without data charges.

We also saw an interesting connection with the October 2011 Coding Challenge. Developer Yuvipanda implemented Android 2.2 support for one of the coding challenge submissions, the “Share with Wikimedia Commons” Android app (as well as for the official Wikipedia Android app).

All this will get some review and maybe some polishing, but we are quite eager to bring this functionality to you.

Many of the hackers were new to MediaWiki. With an introduction by Erik and private tutoring by Sumana, Tomasz, Patrick, and others, several people really got into the swing of things to the extent that some bugs were smashed.  The hackathon proved as always that when you bring great people together special things can and do happen.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Interview with Wikimedia’s Amir Aharoni

If there is one thing that makes the Localisation team special, then it is that all the team members were collaborating before they were hired by the Wikimedia Foundation. Continuing in this spirit of “never change a winning team” we are happy with the addition of Amir Aharoni as a developer to the team. We know him well; we worked well together in the past. As they say in insurance: results from the past do not predict future results, but this is about people and Amir is a great guy.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

You are a specialist on RtL languages. How many people start writing on the right and does this not smear the ink when you move to the left while writing?

These are actually two questions :)

I’ll start with the second: no, it doesn’t smear the ink. I don’t remember that it ever bothered me, and to make sure i just tested it on a piece of paper. Several times i read the claim that that was the reason the Greeks switched to writing left-to-right when they adapted the Phoenician alphabet (the older version of Hebrew) to their language, but unless I am missing something that just doesn’t seem to be a good reason.

How many people start writing on the right? It’s very hard to answer that question precisely. The biggest languages that are written right-to-left are Arabic, Urdu, Pashto, Persian, Sindhi, Kashmiri, Azeri – all written with different varieties of the Arabic alphabet; Hebrew and Yiddish, written with the Hebrew alphabet; and Mandinka and Dyula, written with the N’Ko alphabet. (Other important right-to-left languages are Syriac and Divehi, but they have relatively few speakers.)

If you sum up the number of the speakers of all these languages, you’ll arrive at about 400 million people. That’s a large number, but it’s also a very, very rough estimation. First, i didn’t count all the relevant languages; second, many people who speak these languages live in countries with low literacy rates; and finally, many people speak and write some of them as their second language. 

Are there benefits to writing from right to left?

To tell the truth – no, not really. But there is a benefit in the fact that it exists. I like the general idea of diversity. Direction of writing is just one of those things that shows that very few things in life can be taken for granted, like electricity plug shape, time zone, sexual orientation, taste in food, appearance of things to colour-blind people and, well, almost anything else. And that’s a mighty good thing.

In both Arabic and Hebrew it is possible to indicate what vowels are used. What is the rationale for not including them as standard?

The simplest answer is “People’s customs”.

When people started writing Arabic and Hebrew, they didn’t write as much as we do today and the variety of words was not so great, so they could easily guess the needed vowels just by looking at the consonants. Our writing today is much more varied and it makes guessing the vowels harder, but the custom to omit them is still there.

I don’t know about Arabic, but there were suggestions to write Hebrew always with the vowels. The most notable suggestion to do this was made in the 1930s by Hayim Nahman Bialik, the most prominent Hebrew poet in the twentieth century and the president of the Hebrew Language Committee. Despite Bialik’s status, this proposal was never implemented, among other things because of the technical challenges in printing books with so many diacritics. And, as much as i love them, i must also admit that writing them all slows down hand writing quite considerably.

There is also the problem of the many differences between the vowel marks and the actual spoken language. Modern Hebrew has five vowel sounds, but over ten vowel marks. In Arabic it goes the other way: it has only three vowel marks, but there are more than three vowel sounds in the spoken language, and it also changes from region to region. So very often a lot of people don’t really know which vowel marks they should write even when they want to write them. This also creates an opportunity for patronizing: knowing the right grammar makes one feel smarter than others and unfortunately some people exploit it in ways that are not very constructive.

There are, however, scripts, the structure of which is quite similar to that of Arabic and Hebrew, and which do indicate all the vowels in writing. The most prominent example is Divehi, whose script is a derivative of Arabic. The scripts of India and Southeast Asia are somewhat comparable as well. My guess (and hope) is that in the near future the modern technology will make writing Arabic and Hebrew with vowels easier, if not universal.

You are a member of the Wikimedia Israel board. Do you think working for the WMF creates a conflict of interest?

I find it hard to think about anything that can create a conflict of interest here. I discussed it with the other Board members and they couldn’t think of anything either.

If anybody does think that it is a problem, i’ll be very glad to hear about it.

Do all mobile phones sold in Israel support the Hebrew script and is combining it with the Latin script possible?

Most phones sold in Israel do support it; they also support right-to-left display and even the support for mixing Hebrew and English in one SMS message is reasonable. I don’t know whether the regulations require it or whether it’s just a matter of demand.

Some people buy themselves fancy smartphones abroad and these don’t always support the Hebrew script.

How do you cope when the Ivrit script is not supported?

Writing Hebrew in Latin transliteration is not very common and most Israelis know at least some English, so a person who cannot write in Hebrew for any reason would probably just use English. That includes myself; using transliteration would be better in principle, but a lot of people would find it harder to read.

I should also add that when i bought my first mobile phone in the year 2000, Hebrew support in cellphones was still new and uncommon. I could pick between two models – one with Hebrew and one without. I picked the one with Hebrew, even though it cost about $100 more and it was long before i cared about software localization as much as i do today. I did it simply because it made much more sense to write names of friends in the contact list in the Hebrew script.

You have a competency in many languages because of your study as a linguist. Can you indicate how different the five language families are?

The differences i notice are mostly in the grammar features, some of which are very prominent in some languages and hardly existing in others. For example, in Russian, when you say that you read a book, it’s essential to say whether you finished it or only read a part of it. In Hebrew and Arabic the root of a word is an abstract unpronounceable sequence of consonants and the actual words are created by inserting vowel sounds between them – this concept sounds quite crazy to speakers of European languages.

In Hebrew and Arabic there’s a strong formal distinction between verbs that describe things that a person does to oneself and things done to others. In Romance languages, like Italian and Catalan, the subjunctive mood is very prominent – it’s essential to indicate whether a person  did something or would do it; this distinction is less essential in English and it hardly exists in Hebrew. That, i’d say, is not just a matter of grammar, but also a way to think about things, but that’s a hard philosophical issue that is very hard to test. Finally, Malayalam has a word order that is logical in itself, but very unusual to somebody who speaks a European language. 

You can write in at least five scripts. What script do you consider the most usable?

I suppose that the five scripts you refer to are Cyrillic, Latin, Hebrew, Arabic and Malayalam. (I can read Ethiopic, too. And, well, Greek, but that’s really not a big deal.)

The most usable is Cyrillic, of course, closely followed by Latin. It’s hard to be objective in such a case, because Russian is my native language, but i really think that it has the best balance between simplicity, size (slightly over 30 letters), completeness and being fit for the languages it is supposed to represent. (I’ll try to balance my natural bias towards Russian by saying that the Russian orthography is actually relatively outdated and relatively harder than the orthography of other languages using Cyrillic, like Belarusian or Kyrgyz.)

Latin is a close second, because in general it is very similar to Cyrillic, but its actual pronunciation and usage, as well as the names of letters, differ wildly between languages.

I love Hebrew, but i’ll be the first to acknowledge its disadvantages. Malayalam, though beautiful to behold, is rather hard to grasp, but once you get the hang of it, it does convey all the needed sounds well.

How do you want to put your stamp on the Localisation team?

Apart from my expertise in Middle Eastern scripts, i hope to influence it in the areas of usability and testing. I don’t claim to be a usability expert, but i care very strongly about it and i want to know that the users of the software i create are actually able to use it. I also believe that all features of software localization must be thoroughly tested; it’s costly and challenging, but important and that’s why i hope to find the time to formulate localization testing policy.

One other and somewhat more personal thing that i hope to achieve through my work in this team is spreading the word about the Software Localization Paradox. 

Your wife is also a dancer. Did you ever come across dancewriting?

I asked her and she says that it’s a cool idea, but too complicated to learn, and that in the age when phones come with reasonable video cameras it’s easier to just film the moves.

Dancing is her (very important) hobby and her main fields of work are Neuroscience and Physics. That involves a lot of math formulas and in my perspective, what she says about DanceWriting could well be said about math.

–  Amir

The Localisation team brings you input methods

Sanskrit Wiktionary with the Chrome browser demonstrates issues for the Localisation team.

This #Wiktionary screen shot shows the first iteration of the language support that will be brought to MediaWiki by the Wikimedia Localisation team. It makes it possible for people with a standard US keyboard to emulate a keyboard appropriate for their language.

Narayam, the MediaWiki extension, was originally conceived by JunaidPV and has been further developed to provide keyboards for many more languages. Particularly the people who use the languages from India will benefit. Many different scripts are in use but many computers do not have an appropriate keyboard for the many different languages.
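For the curious, the principle behind such an input method can be sketched in a few lines. The example below is in Python and is not Narayam's code (Narayam is a JavaScript extension with its own rule format); the rule table is a tiny made-up Latin-to-Cyrillic mapping and the function name is hypothetical. It replaces what is typed on a US keyboard, always trying the longest rule first.

```python
# Hypothetical, deliberately tiny rule table: typed keys -> intended characters.
RULES = {
    "sh": "ш",
    "s": "с",
    "h": "х",
    "a": "а",
}


def transliterate(text: str, rules: dict = RULES) -> str:
    """Replace typed sequences using the longest matching rule at each position."""
    longest = max(len(key) for key in rules)
    output = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                output.append(rules[chunk])
                i += size
                break
        else:
            # No rule applies: keep the character as typed.
            output.append(text[i])
            i += 1
    return "".join(output)


if __name__ == "__main__":
    print(transliterate("sasha"))
```

Trying the longest rule first is what lets "sh" produce one letter instead of two; a real input method also has to deal with typing character by character, which this sketch leaves out.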

Now that Narayam is live on some wikis, we will gain the experience necessary before it will go live on other projects and for other languages. When it works well, external tools like the ones shown on the Sanskrit Wiktionary can be phased out as well.

At translatewiki.net you will find keyboard methods for many more languages. Please try them out and, when you cannot find a keyboard method for your language, you may discuss within your community if Narayam can be beneficial for your language.

Thanks,

Gerard Meijssen
Internationalization / Localization outreach consultant