On a daily basis, millions of terms are entered into the Wikipedia search engine. What comes back when people search for those terms is largely due to the work of the Discovery team, which aims to "make the wealth of knowledge and content in the Wikimedia projects easily discoverable."
The Discovery team is responsible for ensuring that visitors searching for terms in different languages wind up on the correct results page, and for continually improving the ways in which search results are displayed.
Dan Garry leads the Backend Search team, which maintains and enhances search features and APIs and improves search result relevance for Wikimedia wikis. He and his team have a public dashboard where they can monitor and analyze the impact of their efforts. Yet they do much of their work without knowing who is searching for what: Wikipedia collects very little information about users, and doesn't connect search data to other data like page views or browsing habits.
Dan and I talked about how the search team improves search without knowing this information, and how different groups of people on Wikipedia use search differently. An edited version of our conversation is below.
---
Mel: You mentioned in an earlier conversation we had that power editors use Wikipedia's search in a completely different way than readers. What are some of the ways that power editors use search?
Dan: Power users use search as a workflow management tool. For example, they might see a typo that annoys them or a word in an article that is misused a lot, or be looking for old bits of code that need to be changed, and then search for that to see if corrections can be made. In that case, unlike your average user, they're actually hoping for zero results from their query, because it means the typo isn't present anywhere.
Another way that power users might use search is to look for their usernames, because they might want to find places where they've been mentioned in discussion, and they want to "sort pages by recency" so that they can see the most recent times they've been mentioned.
That represents a divergence from someone who simply wants to find an article. Our power users aren't always trying to find an article; they're trying to find pages that meet certain criteria so they can perform an action on those pages. They're interested in the whole results set, rather than the top one or two results.
---
Mel: It sounds like power editors don't always want or need relevancy. (Although I'm sure sometimes they do.)
Dan: That's right. It's something we'd like to study more in depth. We prioritize relevancy for readers, but editors, and even some kinds of readers, might need something completely different.
---
Mel: There are a lot of ways to search Wikipedia. Off the top of my head, I can think of searching through search engines, through wikipedia.org, through an individual article page, and then on the mobile apps. Do you notice differences between all of these different pathways into the site?
Dan: Occasionally we do. I used to be a product manager for mobile and I was focusing a lot on search. I was interested in search as an entry point for the mobile app.
But we found that a lot of people were having trouble with things like finding the search tool. We had made an assumption that keeping a search query in the search bar would be useful for the end user, but people thought that was the title of the page, and they were really confused.
When we realized that this could be an issue, we did a lot of qualitative user studies with people, and asked staff who weren't on the product team what they thought. It was helpful to get perspectives on this feature in the app from outside the dev team, from actual users.
We decided to change the way that search appeared in the app once a page loaded. When people navigated to that page, we deleted their search phrase from the search box, which helped people know where to look to start searching again.
We've also thought quite a bit about images and their relationship to search. We considered adding images to search results, and we found that doing so changed user behavior quite a bit. Instead of clicking on the first link, which may or may not have been the most relevant result, people would almost always prefer articles with pictures, even if those articles were further down the search results page. We asked why, and people said that they felt that the result was more comprehensive or complete.
It's funny how changing something small can immediately have a huge effect. When we made the picture change, we also saw a small drop in people clicking through to the articles. This alarmed us, because we thought we were enhancing things for the end user, and we were worried that by adding the pictures we may have inadvertently caused them to not get the information they needed. But we did some digging and found it was the opposite: for some queries, the answer was given right in the search results, so people didn't need to go to the article. We were meeting their needs earlier in the search process, which was fantastic.
You really need both quantitative and qualitative data to truly understand all the ways users use your product. Having only one or the other can paint an unclear picture.
---
Mel: What kinds of things do you think about when thinking about relevancy?
Dan: This is a tricky topic. The fundamental approach assumes that you can break down relevance into an equation that aggregates different factors and then produces results that are "the most relevant." That's clearly not always going to be the case. If I search for "Kennedy," I could be looking for the airport, or the President, or I might be looking for John Jr. or Ted. There is no single correct "most relevant result" for that query.
There's a multitude of different factors. We used to use something called tf-idf to figure out what to surface in what order. tf-idf stands for "term frequency-inverse document frequency"; it combines a measure of how often a word appears in one article with a measure of how often it appears across the whole site.
So say I were to search for "Sochi Olympics". The word "Sochi" is relatively rare, but the word "Olympics" is much more common, so the algorithm knows the "Sochi" part of the query is probably the more important one. That's how it finds the 2014 Winter Olympics article, as opposed to other articles about the Olympics.
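As a rough illustration (not the Discovery team's actual implementation, and using a made-up three-document corpus), the classic tf-idf weighting Dan describes can be sketched in a few lines of Python:

```python
import math

# Toy corpus: each "document" is a short article, tokenized by whitespace.
docs = {
    "2014 Winter Olympics": "sochi olympics winter games russia olympics",
    "Summer Olympics":      "olympics summer games athletics",
    "Olympic Games":        "olympics history ancient games",
}

def tf_idf(term, doc_tokens, all_docs):
    """Classic tf-idf: term frequency within this document, times the
    inverse document frequency of the term across the corpus."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for text in all_docs.values() if term in text.split())
    if df == 0:
        return 0.0
    idf = math.log(len(all_docs) / df)
    return tf * idf

def score(query, title):
    """Score a document against a query by summing per-term tf-idf."""
    tokens = docs[title].split()
    return sum(tf_idf(term, tokens, docs) for term in query.split())

# "olympics" appears in every document, so its idf is log(3/3) = 0 and it
# contributes nothing; the rare word "sochi" dominates the ranking.
ranked = sorted(docs, key=lambda t: score("sochi olympics", t), reverse=True)
print(ranked[0])  # 2014 Winter Olympics
```

This captures the behavior from the "Sochi Olympics" example: the common term carries almost no weight, so the document containing the rare term wins.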
---
Mel: It sounds like that would be challenging for words that have multiple meanings.
Dan: That's true, and something we think about a lot. If you go to Wikidata and you search for "life" on the search page, you get search results like Life Sciences, the Encyclopedia of Life, IUBMB Life, Cellular and Molecular Life Sciences, the phrase slice of life, the video game Half-Life… but you don't get the item on the concept of something being living.
And that's because of term frequency and inverse document frequency. A lot of the pages I just mentioned have the term "life" in them many times. And, by coincidence, the item about life itself doesn't actually contain the word "life" very often. That means the actual result for "life" ends up far down the list, because it doesn't seem as important as the others, even though it is!
---
Mel: I imagine there must be ways to mitigate that.
Dan: We've switched from tf-idf to a newer algorithm called Okapi BM25. (BM stands for Best Match.) Basically, what BM25 says is that there isn't a huge difference between a term being mentioned 1,000,000 times and a term being mentioned 10,000 times. Using the new algorithm, and switching to a more precise way of storing data about articles, helped with the Kennedy problem a lot, because it pays less attention to sheer repetition of the word "Kennedy" within a page. Before, John Fitzgerald Kennedy was on the second page of results; now he's about 7th or 8th.
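The saturation Dan describes comes from BM25's term-frequency component. The sketch below uses the textbook BM25 formula with the commonly cited default parameters k1 = 1.2 and b = 0.75 (an illustration of the general technique, not the specific configuration running on Wikimedia's search cluster); document length is held at the average to isolate the term-frequency effect:

```python
def bm25_tf(tf, k1=1.2, b=0.75, dl=1.0, avgdl=1.0):
    """BM25's term-frequency component. Unlike raw term frequency, which
    grows without bound, this score saturates toward k1 + 1 as tf grows."""
    return (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avgdl))

# Raw tf would treat 1,000,000 mentions as 100x "stronger" than 10,000;
# BM25 scores them almost identically.
print(bm25_tf(10_000))     # ~2.1997
print(bm25_tf(1_000_000))  # ~2.2000
```

That flattening is why a page that merely repeats "Kennedy" hundreds of times no longer drowns out the John F. Kennedy article itself.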
---
Mel: Does the site use BM25 everywhere?
Dan: We use BM25 on every Wikipedia that is not in Chinese, Thai, Japanese, or another language where the words in a sentence don't have spaces between them. When we tested BM25 on those spaceless languages, it caused a massive drop in the zero-results rate due to a bug in the way words are broken up, or tokenized. We learned the algorithm wasn't working on those languages, so we deployed it everywhere else. We're hopeful that we can fix that problem for spaceless languages in the future.
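To see why spaceless languages need special handling, consider what a naive whitespace tokenizer does to each (a minimal sketch; real search engines use language-specific analyzers rather than a plain split):

```python
# Whitespace tokenization works for English, where spaces delimit words.
english = "winter olympics in sochi"
print(english.split())   # ['winter', 'olympics', 'in', 'sochi']

# Japanese writes "Sochi Olympics" as one unbroken string, so splitting
# on spaces yields a single giant "token" instead of separate words.
japanese = "ソチオリンピック"
print(japanese.split())  # ['ソチオリンピック']
```

With the whole phrase treated as one token, per-term statistics like document frequency become meaningless, which is the kind of tokenization failure that broke BM25 on those wikis.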
---
Mel: What has been the most unexpected thing youâve learned through search?
Dan: There is a surprisingly long tail when it comes to the frequency of searches.
One of the first things we were asked by our community members is "Why don't you make a list of the most popular queries that give zero search results, so editors can make redirects or find articles that need to be written?"
The data is not that useful, as it turns out. In our analysis of the problem, some of the most popular zero-result searches were "{searchTerms}" and "search_suggest_query", which we think come from bugs in certain browsers or automated search systems.
We also found that a lot of people were searching for DOIs, the digital object identifiers used by academic researchers. Most of the searches for those got zero results. We had to ask ourselves, "What are people doing?" And we found there was a tool that let researchers put in a DOI to see whether their paper was cited in Wikipedia. Of course, most papers that people search for aren't in Wikipedia, so it's actually correct to give them zero results!
When I started in search, we believed that users should never get zero results when searching. But it turns out that a lot of people were searching for things we don't have, and it's correct to give them zero results.
---
Mel: I know that Wikipedia has a very strict privacy policy and tracks hardly anything. What do we collect?
Dan: We do track some info. We have event logging that says "This user with this IP clicked on the 4th result, it took us this long to give them results", and so on. But it's Wikimedia's policy to delete all personally identifying information after 90 days. That is a very intentional decision we made to protect user privacy.
If you don't want information about users to be revealed, the only thing you can do is not record it. If we get subpoenas, we are legally required to comply. But if we don't have that information, we obviously can't give it out! So it's the safest way to keep users' privacy protected. We can figure out some things by language, but not geography.
But it's tricky sometimes. A good example of that within the Latin alphabet is the search term "paris". What language is that in? Is it English? French? If I search for "cologne", it's a city in Germany but also a perfume in English. And that's an example of relevance: is a user who searches for "cologne" searching for a fragrance or a city? These things make delivering good search results really hard, but we keep trying, and keep making them a little better every day.
Melody Kramer, Senior Audience Development Manager, Communications
Dan Garry, Lead Product Manager, Discovery Product and Analysis
Wikimedia Foundation