Wikimedia blog

News from inside the Wikimedia Foundation

Posts by ArielGlenn

XML dumps resumed

Folks who use the XML dumps of our projects will know that the dump process has been stalled while we investigated bug 23264. In the meantime we have been running individual project dumps manually and asking people to inspect them carefully. We have just started the automated dumps up again, and various code fixes should be checked in shortly. Thanks to all for your assistance and your patience.

If you are working with the XML dumps of the English language Wikipedia containing all page revisions (pages-meta-history), please note the following issues with the two completed runs.

The January 30 run is missing the text for a large number of old revisions of articles, primarily revisions created between January 1, 2005 and May 14, 2005. This was due to bug 20757, which has since been fixed. If you are doing analysis using the text data, you can retrieve the missing text by extracting it from an earlier file; see the archives.
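For anyone who wants to script that recovery, here is a minimal sketch in Python of pulling revision text out of an earlier dump by revision id. It is an illustration, not the tooling we use: the function names are made up for this example, and it assumes the usual export layout in which each `<revision>` element has a direct `<id>` child and a `<text>` child.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}revision'."""
    return tag.rsplit("}", 1)[-1]


def collect_revision_text(stream, wanted_ids):
    """Return {revision_id: text} for the wanted revisions found in stream.

    Hypothetical helper: assumes <revision> has direct <id> and <text> children.
    """
    wanted = set(wanted_ids)
    found = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "revision":
            continue
        rev_id = None
        text = None
        # Only direct children are inspected, so the contributor's own
        # <id> (nested one level deeper) is not mistaken for the revision id.
        for child in elem:
            name = local_name(child.tag)
            if name == "id":
                rev_id = int(child.text)
            elif name == "text":
                text = child.text or ""
        if rev_id in wanted and text:
            found[rev_id] = text
        elem.clear()  # drop the revision's contents; full-history dumps are huge
    return found


if __name__ == "__main__":
    # Tiny made-up sample standing in for an earlier dump file.
    sample = (
        b"<mediawiki><page><title>A</title><id>1</id>"
        b"<revision><id>10</id><text>old text</text></revision>"
        b"</page></mediawiki>"
    )
    print(collect_revision_text(io.BytesIO(sample), [10]))
```

For a real file you would pass an open, decompressed stream instead of the sample, for example `bz2.open(path, "rb")` if you have the bzip2 version of the earlier dump; the path is yours to fill in.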

The March 12 run is incomplete; it is missing about the last third of the revisions, due to early termination during the compression step.
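If you want to check a downloaded file for this kind of truncation yourself, the following is a rough sketch in Python. It assumes you have a bzip2-compressed copy and that a complete dump ends with a closing `</mediawiki>` tag; the function name and the tail-window size are choices made for this example only.

```python
import bz2
import io


def dump_looks_complete(fileobj, closing_tag=b"</mediawiki>", chunk_size=1 << 20):
    """Return True if the bzip2 stream decompresses fully and ends with closing_tag.

    Hypothetical helper: fileobj may be a path or a binary file object.
    """
    tail = b""
    try:
        with bz2.open(fileobj, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                # Keep only the last few bytes: enough for the tag plus
                # a little trailing whitespace.
                tail = (tail + chunk)[-len(closing_tag) - 16:]
    except (EOFError, OSError):
        # EOFError: the compressed data stopped before the end-of-stream marker.
        # OSError: the data is not a valid bzip2 stream.
        return False
    return tail.rstrip().endswith(closing_tag)


if __name__ == "__main__":
    # Made-up sample data, compressed in memory and then cut short.
    whole = bz2.compress(b"<mediawiki><page/></mediawiki>\n")
    print(dump_looks_complete(io.BytesIO(whole)))
    print(dump_looks_complete(io.BytesIO(whole[:-10])))
```

This reads the whole file once, so on a full-history dump it will take a while; it only tells you whether the file is cut short, not which revisions are missing.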

The stubs files and the current page dumps appear to be fine, so statistical or other analyses that only use these files should not be affected. The MySQL table dumps are also unaffected.

We apologize for the inconvenience and are working on getting out a set of complete full history dumps with all revision text intact.

Open Translation Tools 2009 report

View of the towers of De Waag, Amsterdam

With six projects in over 250 languages, multilingual communication and content translation are big priorities for us. That’s one reason I was excited to go to the Open Translation Tools 2009 conference and be in the same room with 80 other translators, content providers and developers all working in the open translation space. Another reason is that the conference was held in Amsterdam in the old city center, in a beautiful venue right by one of the canals.

We have some amazing opportunities to collaborate with folks on other projects, from translation memory based systems like that in use by the World Wide Lexicon to source code string repository interfaces like Transifex. As one person put it, the perfect testbed for crowd-sourced translation is Wikipedia; if we can’t make it work there, where can it work? I also had a chance to talk with Gerard Meijssen and Siebrand Mazeland about new ways to facilitate tighter integration with translatewiki.net and to encourage more projects to make use of the translatewiki facilities. It should be a really productive year.

Folks told me to go visit the Van Gogh Museum, so I was dismayed to find that they don’t allow photography. However, the Wiki Loves Art NL project, organized by the NL Wikimedia chapter, had reached an agreement with the museum to allow two small groups in for photographs, during the week I happened to be there! So, come Tuesday morning, I was one of 20 lucky Wikimedia community members and photojournalists to be given private access to the Van Gogh collection. Some photos from the group are already available in the Flickr group, from which they will be uploaded to the Commons.

Right after the conference I went to the first two days of the OTT book sprint, which had as its goal the production of a comprehensive manual for beginner volunteer translators of open content with open tools. Once again we were in an awesome venue (see the picture; we were in one of the turrets!) and under the expert guidance of Adam Hyde we got a huge amount of content generated in just a few days.

On the last day I skipped town to go visit a colleague on one of the Wikimedia projects; we’ve worked closely together for over two years and had never met face to face. Perhaps that was the most important part of the whole trip: bringing our virtual community into the real world one person at a time.

Wikimedia wants you… to be our Office IT Support Lead

Wikimedia is hiring! If you enjoy keeping an office full of tech gear and hardworking staff running smoothly, we want to hear from you. We’re a mostly MacOS and Ubuntu shop with a bit o’ Windows thrown in. We have the standard backup, VPN and phone issues of any smallish office, and we have a satellite space that you’d be supporting as well.

Think this job has your name on it? Then read the full job description and application details here at the WMF site, and get your cover letter and resume in to us by May 11.