Stripping question marks from Wikimedia searches


Photo by Benh LIEU SONG, CC BY-SA 3.0.

When people ask “how old is Tom Cruise?” on Wikipedia, they almost certainly don’t expect the question mark in “cruise?” to match an additional letter. They aren’t looking for the words “cruised”, “cruiser”, or “cruises”, but that’s what they get, and it keeps them from finding the information they are really after.
Search on Wikipedia (and other Wikimedia projects) includes a lot of features that most users don’t know about. Most require special keywords, and some even require specialized knowledge, such as familiarity with regular expressions. It’s pretty difficult to invoke these special features by accident.
But search also supports two particular bash-style wildcards without any special syntax: asterisks (*) will match any number of characters, and question marks (?) will match exactly one. Asterisks do come up from time to time, but people use question marks all the time—they like to ask questions!
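To make the wildcard behavior concrete, here is a minimal sketch that uses Python's standard fnmatch module as a stand-in for shell-style matching. It is not the search engine's own code, and the word list is made up for illustration. It shows why “cruise?” matches words with one extra letter but not “cruise” itself.

```python
from fnmatch import fnmatchcase

# Illustrative word list (an assumption for this sketch, not real index data).
words = ["cruise", "cruised", "cruiser", "cruises", "cruising"]


def wildcard_matches(pattern, candidates):
    """Return the candidates that the shell-style pattern matches in full."""
    return [w for w in candidates if fnmatchcase(w, pattern)]


# "?" stands for exactly one character, so the bare word "cruise" is excluded.
question_mark_hits = wildcard_matches("cruise?", words)

# "*" stands for any number of characters, including none.
asterisk_hits = wildcard_matches("cruis*", words)

print(question_mark_hits)
print(asterisk_hits)
```

With this list, the question mark pattern picks up only the three seven-letter words, while the asterisk pattern matches every entry.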
A recent review of query-string features called out quotes and question marks as the two largest-impact predictors of unsuccessful queries on Wikipedia. A follow-up survey of queries with question marks in six of the top ten Wikipedias (by search volume) found that most question marks were being used to ask questions (the other four of the top ten were not reviewed).
In all ten of the top ten Wikipedias, stripping final question marks dramatically decreased the number of ?-final queries that got either no results or very few results (i.e., fewer than three). The improvement was around 10–45% for ?-final queries, depending on the wiki. The overall impact is much more modest (less than 0.5%) because queries with question marks are not terribly common.
As a result of this analysis, we’ve implemented a change to search that will, by default, replace question marks with spaces (to preserve the word boundaries they imply in queries like “how? why?”). This setting can be changed on a per-wiki basis (see $wgCirrusSearchStripQuestionMarks). The other options are (i) stripping question marks only at a clear word boundary (such as before a space), (ii) stripping question marks only at the end of the query, and (iii) leaving question marks alone.
For the rarer users who do use question marks as a one-letter wildcard, question marks can be escaped with a backslash (e.g., wiki\?edia) when question mark stripping is enabled, which preserves their original wildcard meaning. Power searchers who use insource: won’t need to do anything special; queries with insource: will not be modified.
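The stripping options and the two exceptions described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual search implementation: the function name and the mode labels ("all", "break", "final", "none") are hypothetical, the end of the query is assumed to count as a word boundary, and runs of whitespace are collapsed for readability.

```python
import re

# Hypothetical mode labels for this sketch; they are not confirmed to be
# the real values accepted by $wgCirrusSearchStripQuestionMarks.
MODES = ("all", "break", "final", "none")

# Each pattern skips question marks escaped with a backslash.
PATTERNS = {
    "all": r"(?<!\\)\?",              # every unescaped question mark
    "break": r"(?<!\\)\?(?=\s|$)",    # only before whitespace or end of query
    "final": r"(?<!\\)\?\s*$",        # only at the very end of the query
}


def strip_question_marks(query: str, mode: str = "all") -> str:
    """Replace question marks with spaces according to the chosen mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    # Queries using insource: are left untouched, as is everything in "none" mode.
    if mode == "none" or "insource:" in query:
        return query
    stripped = re.sub(PATTERNS[mode], " ", query)
    # Collapse the whitespace left behind (a simplification for this sketch).
    return " ".join(stripped.split())


print(strip_question_marks("how old is Tom Cruise?"))
print(strip_question_marks("how? why?"))
print(strip_question_marks("wiki\\?edia"))
print(strip_question_marks("how? why?", mode="final"))
```

Replacing with a space rather than deleting the character keeps “how?why” style input from fusing into a single word. The escaped form is simply passed through here; how it is later interpreted as a wildcard is not modeled.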
Below is a screenshot of the former question mark behavior, where it is treated as a wildcard. Note that “living?” only matches the name “Livings”, leading to two very unsatisfactory results.
Screenshot, CC BY-SA 3.0.

Below is a screenshot of the new question mark behavior, where it is ignored. Now the question and part of the answer can be seen in the snippet for the very first result, and all of the top three results seem relevant.
Screenshot, CC BY-SA 3.0.

Trey Jones, Software Engineer, Discovery
Wikimedia Foundation

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.
