Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘CLDR’

Primary data about languages

For MediaWiki, the CLDR, or Common Locale Data Repository, is a primary source of information. What is most relevant to us is the information about languages that Unicode maintains in this standard. For each language it registers the name in English and the autonym (the name of the language in the language itself), along with information such as what dates and numbers look like, the script or scripts used to write the language, and the names of other languages in that language.
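To make the kinds of data listed above concrete, here is a minimal sketch of how one language's record could be modelled. The `LocaleInfo` class, its field names and the `format_number` helper are hypothetical and chosen for illustration only; they are not the CLDR's actual file format or MediaWiki's API. The Dutch values are just a small sample.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocaleInfo:
    """Hypothetical container for a few of the facts the CLDR records."""
    code: str
    english_name: str            # name of the language in English
    autonym: str                 # name of the language in the language itself
    scripts: List[str]           # script or scripts used to write it
    decimal_separator: str
    group_separator: str
    # Names of other languages, written in this language.
    language_names: Dict[str, str] = field(default_factory=dict)

    def format_number(self, value: float) -> str:
        # Start from Python's English-style grouping (1,234,567.5) and
        # swap both separators in a single pass.
        swap = str.maketrans({",": self.group_separator,
                              ".": self.decimal_separator})
        return f"{value:,}".translate(swap)


dutch = LocaleInfo(
    code="nl",
    english_name="Dutch",
    autonym="Nederlands",
    scripts=["Latn"],
    decimal_separator=",",
    group_separator=".",
    language_names={"en": "Engels", "de": "Duits"},
)

print(dutch.autonym, dutch.format_number(1234567.5))
```

A record like this is what a volunteer would effectively be verifying: is the autonym right, are the separators right, and are the other languages named correctly.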

We prefer to use standardised information, not only because it is stable and reliable, but because we do not have to collect the data ourselves and also because the data is used by many other organisations and in many other applications. We love the CLDR and we want it to be even better. To make it better we need your help.

Many of the languages that have a Wikipedia, and many of the languages that want to have a Wikipedia, are not represented in the CLDR. Many Wikipedians know their language really well. They can provide the information about their language and they can verify that the existing information is correct. When something needs to change, you will need to create a user account.

When a language is not yet supported, you will have to request that the new locale or language be added. You are expected to provide at least the core data when you make your request, and to complete at least the minimal data required. One of the questions asks where the language is official; it may be that a language does not have any official status. This does not prevent people from reading or writing that language, and it does not mean that information about such a language is not important to us.

When a language is already supported, we want you to verify that the names for other languages exist and are written correctly. There can be issues in any language, including English; using the Araucanian name for the Mapudungun language, for example, is considered an insult.

When you are able and happy to help us in this way, you may be interested in joining our “language support team.” Because of your interest you belong to the group of people we first want to turn to when we have questions about supporting your language. More structured information and room for your reports can be found here. When there are any issues, do not hesitate to report them.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Addressing the many

When you have a message, you use the appropriate language and tools to address multiple people. We do not use our eyes to see how many people we address, and we do not use a bullhorn to be heard. Our MediaWiki software knows the numbers involved, and a plural-enabled message is formed according to the rules of the language.
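As a rough illustration of what "formed according to the rules of the language" means, the sketch below picks a plural category for a number and then selects the matching message form. It is not MediaWiki's code: the function names and the sample messages are hypothetical, and the rules are simplified, integer-only approximations of CLDR-style rules for English and Polish.

```python
# Illustrative sketch only; names and messages are hypothetical.
# Rules are simplified to whole numbers. The CLDR is the authoritative source.

def english_category(n: int) -> str:
    return "one" if n == 1 else "other"


def polish_category(n: int) -> str:
    if n == 1:
        return "one"
    # 2-4, 22-24, 32-34 ... but not 12-14
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


RULES = {"en": english_category, "pl": polish_category}

# Hypothetical message forms, keyed by plural category.
MESSAGES = {
    "en": {"one": "{n} page", "other": "{n} pages"},
    "pl": {"one": "{n} strona", "few": "{n} strony", "many": "{n} stron"},
}


def format_message(lang: str, n: int) -> str:
    """Pick the plural category for n, then fill in the matching form."""
    category = RULES[lang](n)
    return MESSAGES[lang][category].format(n=n)


for count in (1, 2, 5, 12, 22):
    print(format_message("en", count), "|", format_message("pl", count))
```

English needs only two forms, while Polish needs a separate form for numbers such as 2, 3, 4 and 22. This is why the number of forms, and the rule that selects them, has to be defined per language.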

When we implemented plural support for JavaScript, we checked the new implementation against our PHP implementation and against the standard for such things, the CLDR.

The Localisation team does not know the language rules for the 280+ languages that have a Wikipedia. We prefer to implement what the standard tells us, but we support more languages than the CLDR does. We want to channel our need for support through “Language Support teams”, and we want them to help us understand and fix the inconsistencies and add the missing information to the CLDR.

Inconsistencies with the CLDR
  • Belarusian – ‘other’ form missing in MediaWiki
  • Belarusian-tarask – ‘other’ form missing in MediaWiki
  • Bosnian – ‘other’ form missing in MediaWiki
  • Manx – CLDR has 3, MediaWiki has 4 forms
  • Hebrew – CLDR has 2, MediaWiki has 3 forms
  • Croatian – ‘other’ form missing in MediaWiki
  • Ripuarian / Colognian – order of forms differs. CLDR says zero, one, other. MediaWiki says one, other, zero
  • Latvian – CLDR defines zero, one and other forms. MediaWiki has only two forms: one for 1, 21, 31, 41, 51, 61… and another for all other numbers
  • Macedonian – CLDR defines forms[0] for n != 11. MediaWiki defines forms[0] for n % 100 != 11
  • Polish – ‘other’ form is not defined in MediaWiki
  • Russian – CLDR defines 4 plural forms. The form for numbers with decimals is missing
  • Slovenian – MediaWiki defines a zero form which is not present in CLDR
Missing in CLDR
  • Church Slavonic
  • Lower Sorbian
  • Scottish Gaelic
  • Upper Sorbian
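
Differences like the Macedonian one in the list of inconsistencies can be found mechanically by running two rule implementations over a range of numbers and reporting where they disagree. The sketch below does that for the two conditions quoted in that entry. It is an assumption that both rules share the "ends in 1" test (n % 10 == 1); the function names are hypothetical and neither function is the actual CLDR or MediaWiki code.

```python
# Illustrative sketch only; not the real CLDR or MediaWiki rules.
# Assumption: both rules require n % 10 == 1 and differ only in how they
# exclude 11, as the Macedonian entry describes.

def cldr_like_first_form(n: int) -> bool:
    return n % 10 == 1 and n != 11


def mediawiki_like_first_form(n: int) -> bool:
    return n % 10 == 1 and n % 100 != 11


def find_disagreements(limit: int) -> list:
    """Return every n below limit on which the two rules disagree."""
    return [
        n for n in range(limit)
        if cldr_like_first_form(n) != mediawiki_like_first_form(n)
    ]


print(find_disagreements(300))  # [111, 211]
```

Under these assumptions the two rules agree for every number below 100 and first diverge at 111, which is the kind of edge case a speaker of the language is best placed to settle.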

Please make a difference to the support for your language and join the Language Support team.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant