Photo by João Silas, freely licensed under public domain/CC0.

Mississippi. That’s four Ss, four Is, two Ps, and an M. Can you spell it right every time?

Or take Czechoslovakia. Saskatchewan. Abkhazia. Kyrgystan.

Did you notice that Kyrgyzstan was spelled wrong?

The English language borrows a large number of place names and other words from other languages and cultures, making its spelling difficult for anyone to master (native speakers or not), and English is far from the only language that has hard-to-spell words.

That’s one reason why the Wikimedia Foundation has developed a completion suggester for the Wikimedia projects to help you even when you can’t remember how to spell the word you’re looking for. This tool can autocorrect for typos and missing words, like “the” and “of” from The Lord of the Rings, and incorporates data like pageviews, so pages viewed more often are placed higher in search results.

This feature is now live on every Wikimedia wiki in all 292 languages, with the sole exception of Wikidata.[1] The completion suggester, along with other recent updates from the Discovery Department (did you see the wikipedia.org search portal update last week?), is part of a steady stream of improvements to search across the Wikimedia sites.

A rollout of this size was a large undertaking by the Discovery Department, but perhaps the most notable aspect is something that might appear minor at first glance: typo detection. “When a person is browsing on their mobile phone, it’s easy to get one or two characters wrong,” Discovery’s lead product manager Dan Garry told us. “Detecting these typos is critically important as more and more of the world uses a mobile phone as their primary device to access the Internet.”

Searching for ‘David Bowtie’ yielded no suggestions before the completion suggester; with the tool enabled, it suggests ‘David Bowie’—the famed musician who passed away earlier this year. Before and after GIFs by Chris Koerner, CC BY-SA 4.0.

Multiplying the effectiveness of this typo fixing, Garry notes, is “the amount of languages we’re able to offer. With a completion suggester enabled, our users and readers on every Wikimedia site are now able to more easily find and discover content regardless of their preferred languages.” With multiple Wikimedia projects available in many different languages, hundreds of Wikipedias, Wiktionaries, Wikisources, and more will use this tool.

And these are far from the only areas bolstered by the completion suggester. Readers will also see its results in the new Wikipedia.org search portal, as well as in the search function of the two Wikipedia mobile apps, including the newly redesigned Wikipedia app for iOS.

The impact of the seemingly simple feature extends beyond regular search queries as well. Discovery expects that some burdens on volunteer editors will be reduced, as the introduction of a completion suggester has made it easier to link to different pages within VisualEditor, a rich-text editor with a less code-heavy interface, and reduced the need for editors to manually create redirects.

Wikimedia Foundation software engineer Erik Bernhardson told the blog that in total, Discovery expects that the completion suggester will be used in about 70 million queries per day. Editors and readers should expect about a 10% reduction in the so-called ‘zero results’ search rate, where you search for a term and don’t get any results—a priority area for Discovery.

In previous years, we relied on editor-created redirects to send people from terms like “Jurasic Park” to “Jurassic Park.” Absent those redirects, the previous suggester would use only a simple prefix search, meaning that typing in “Jurasi” would only bring up items that started with those first six letters.

These results were not ideal, which showed when we analysed our search data last year and found that our zero results rate was a surprisingly high 20–30%. While many were things that should be zero results—one shouldn’t expect a search query of “fmoqnguiwrmcaef,” for example, to return anything—a non-trivial number of those were queries that should yield results.

Now, the completion suggester automatically corrects “Jurasic Park” misspellings. Of course, this is all going on in the background, but you now have more time to find what you need and get back to your reading on everyone’s favorite dinosaur theme park.
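For the curious, the difference between the old prefix search and a typo-tolerant suggester can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the Elasticsearch-based code that runs on the Wikimedia sites: the titles, the pageview numbers, the function names, and the one-edit tolerance are all made up for the example, and the edit-distance check is simply one common way to tolerate typos.

```python
from typing import Dict, List

# Hypothetical titles and pageview counts, invented for this illustration.
PAGEVIEWS: Dict[str, int] = {
    "Jurassic Park": 9000,
    "Jurassic Park (film)": 12000,
    "Jurassic World": 7000,
    "Jura Mountains": 800,
}


def prefix_search(query: str, titles: Dict[str, int]) -> List[str]:
    """Old behaviour: keep only titles that start with the exact query."""
    q = query.lower()
    return [t for t in titles if t.lower().startswith(q)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_prefix_search(
    query: str, titles: Dict[str, int], max_edits: int = 1
) -> List[str]:
    """Typo-tolerant variant (an assumption, not the production algorithm).

    A title matches if one of its prefixes is within max_edits of the query.
    Matches are then ordered by pageviews, most viewed first.
    """
    q = query.lower()
    hits = []
    for title in titles:
        t = title.lower()
        # Only title prefixes of roughly the query's length can be close enough.
        shortest = max(0, len(q) - max_edits)
        longest = min(len(t), len(q) + max_edits)
        if any(edit_distance(q, t[:n]) <= max_edits
               for n in range(shortest, longest + 1)):
            hits.append(title)
    return sorted(hits, key=lambda name: titles[name], reverse=True)


if __name__ == "__main__":
    print(prefix_search("Jurasi", PAGEVIEWS))
    print(fuzzy_prefix_search("Jurasi", PAGEVIEWS))
    print(fuzzy_prefix_search("Jurasic Park", PAGEVIEWS))
```

With this toy data, the plain prefix search finds nothing for “Jurasi,” while the tolerant version suggests the three Jurassic titles, with the most-viewed one first. A real suggester has to do this across millions of titles in a fraction of a second, which is why it relies on a purpose-built search engine rather than a loop like this one.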

Our thanks go out to the team behind the open-source Elasticsearch, which has underpinned the work that has gone into the completion suggester and whose language abilities have enabled us to roll this out to Wikimedia wikis across all languages.

Ed Erhart, Editorial Associate
Wikimedia Foundation

[1] Wikidata, a unique Wikimedia project, uses an entirely different search mechanism that the completion suggester is not compatible with. The Wikidata team is doing their own work to improve search on Wikidata, with advice from WMF Discovery.