Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Swedish Wikipedia surpasses 1 million articles with aid of article creation bot

On June 15, 2013, Swedish Wikipedia hit one million articles, joining the club of English, Dutch, German, French, Italian, Russian and Spanish Wikipedias. The article that broke the barrier was about the butterfly species Erysichton elaborata. There is, however, one fact that separates this million-article milestone from almost all others.

The one millionth article was not manually created by a human, but written by a piece of software (a “bot”). The bot in this case, Lsjbot, collects data from different sources and then compiles the information into a format that fits Wikipedia. Lsjbot has to date created about 454,000 articles, almost half of the articles on Swedish Wikipedia.

Lsj, Sverker Johansson, who runs Lsjbot

Bot-created articles have led to some debate, both before Lsjbot started its run, and currently. First, there was a lengthy discussion on Swedish Wikipedia after the initial proposal by Lsjbot’s operator, science teacher Sverker Johansson. The Swedish Wikipedia community was wary, having learned the lessons from previous conflicts about article-creating bots, including rambot in 2002. But there was also curiosity, so a series of test runs was made to make sure that the articles were acceptable.

After review, the Swedish Wikipedia editor community said okay. Lsjbot started by creating articles about different species of animals and plants – articles that are largely uncontroversial and that can have a similar format without feeling mechanical.

Subsequent criticism has come from prolific article writer Achim Raschka on German Wikipedia’s Kurier. Here the main complaint was that the article is short: only four sentences long. This is a valid complaint. Even if longer articles are not always better, they tend to contain more information.

Therein lies the rub. The bots use as many datasets as their operators can find, but many sources are behind paywalls or are incomplete across an entire taxon (covering only selected species). The upside is that each statement in articles created by bots is supported by references, something that doesn’t happen in many other articles. This means that more references are added to Wikipedia by bots than by humans. This is of course not in itself a sign of quality, but it is a starting point for human contributors to search for more information. As with any article in Wikipedia, readers can also help make bot-created articles better.

Is this the future for Wikipedia, to let software create articles? With Wikidata, it is certainly becoming easier to use software to create articles, something that can benefit the smaller Wikipedias. But we still need more humans to help determine which sources are high quality, what information is presented correctly and what qualifies as clear writing.

So far, bots have shown that they are much quicker to create articles. In that respect, I, for one, bow to our robot overlords.

Lennart Guldbrandsson, Swedish Wikipedia editor

9 Responses to “Swedish Wikipedia surpasses 1 million articles with aid of article creation bot”

  1. I think eventually, most things online will be automated so there is less need for human intervention.

  2. Targaryen says:

    Bots are much better at creating articles. Humans need not apply.

    Wikipedia Netherlands was the first Wikipedia to break the 1 million article barrier through the use of bots.

  3. Paracel63 says:

    @Bjarne. That issue is discussed on the article’s discussion page. So far we have not got to the point where we think the new naming (made in 2010) has been embraced by the scientific sources. I think we have to wait and see, and in the meantime this article has gained a little weight.
    @Bennylin. What Lsjbot is doing is exactly that, making a Wikispecies inside a Wikipedia edition. Personally I think this is good, as Wikipedia allows for more extensive cross-linking and categorising and makes Wikisource-type material available for editing to a larger Swedish-speaking community. I’m not alien to the idea of incorporating other Wikimedia projects inside the svwp/Wikipedia walls, and I think Wiktionary (svwi has a small user base and very seldom uses references in their editing process) is a suitable project to at least start thinking about.

  4. bennylin says:

    Before that we saw the bot run in Cebuano and then Waray-Waray Wikipedia. I say these articles would be better placed at Wikispecies, making Wikispecies a multilingual site. I wonder if Swedish Wikipedians have considered the thought.

  5. Bjarne says:

    From http://de.wikipedia.org/wiki/Wikipedia_Diskussion:Kurier#Schweden_feiert_1_Million

    “[[sv:Erysichton palmyra]] is just as much outdated nonsense as [[sv:Erysichton]]. The genus was likewise split up 3 years ago. The Swedes’ millionth article would have been better created under Jameela palmyra.”

    In short: the article is outdated. We celebrate masses of articles nobody can maintain or correct.

  6. Erik Zachte says:

    As the author of Wikistats, I always took a keen interest in how Wikimedians judged or participated in the friendly rivalry between language versions of our projects. After all, I helped to make the ranking between versions more visible. Too often for my taste I witnessed on-wiki conversations noting that some language project had been surpassed in number of articles by another, and even a few times that “a bot will fix this”. So I became a little wary about mass article creation by bots, and the true motive of their makers. When several bots on the Dutch Wikipedia started to ‘flood’ the wiki with taxonomy stubs [1] I was really disappointed. What good would that do? Who is going to extend these articles, or even just visit them, other than via ‘Random Article’?

    A few months ago I helped an acquaintance upload images to Commons. For many decades she has had a passion for one particular subclass of slime moulds (‘myxomycetes’) and she has shot many wonderful pictures, some of which are also stored in the archives of Leiden University (Herbarium). A small selection of these pictures has now also found a safe haven on Commons: [2]. We spent several sessions mastering the intricacies of Windows, the browser, and the pitfalls of Commons (for a computer novice even Upload Wizard can be daunting). As she is of very respectable age, I admire her persistence. It would have been nice if we could have finished by adding some of these pictures to the proper articles about these very species. And then I discovered that in many or all cases these did not yet exist, and I regretted it. Even a stub with just a taxonomy info-box would have been a great placeholder. Although she could create some of those articles herself knowledge-wise, I don’t see her mastering wiki syntax yet, and she is also far too modest to just ‘be bold’. So this personal experience made me change my mind: an abundance of well organized and formatted stubs can serve a purpose, and who knows how many articles will be extended by other bots from other sources, and some day nature lovers will turn to Wikipedia in much larger numbers and manually build on the rudimentary framework. In its earliest years Wikipedia proved an unbelievable success; it was a miracle. A few years later, when Steve Coast expressed his vision for a free worldwide street map, it already seemed ‘tough but doable’. So who can say what will happen with up to 8 million taxonomy stubs in another decade?

    The potential editor base that could add to these taxonomy articles is huge. In a Dutch context: 1 in 100 Dutch people (152K in 2011) is a member of a society for the protection of birds, and 6 other nature clubs are even larger, up to 5-fold. The club for nature guides/educators has 20K members. So if we can spread the word more effectively and solicit support from a tiny fraction of those nature lovers, a fair share of those taxonomy stubs could start to blossom in the coming years. How about a Wiki Loves Nature picture contest?

    [1] 0.9 million bot-created articles are now on Dutch Wikipedia (57%); for all Wikipedias, see: http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm
    [2] http://commons.wikimedia.org/wiki/User:HelenGinger

  7. Rudolf Olah says:

    Bot-created articles are being used elsewhere, such as for sports news stories where much of the article is a listing of stats intermixed with quotes from players and coaches.

    I like this idea because it at least creates more meaningful stubs. If the articles created by the bot are too shallow, they at least provide a starting point for someone to edit. It can be hard to get started editing a wiki and this might encourage more people to do it.