Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts Tagged ‘search’

Search restored after leap second bug

At midnight UTC on July 1, Wikimedia’s search cluster stopped working. A “leap second” inserted by the NTP daemon at that time caused Java processes to lock up, including our Lucene search system. The same bug affected many other websites. Our engineers restored service in less than two hours.

Leap seconds are added to our clocks once every few years so that the sun will be directly overhead of the Royal Observatory in Greenwich at precisely 12:00. Some people believe that the desire to keep these two time standards synchronised is anachronistic, and that it would be better to let them drift apart for 600 years and then add a single “leap hour”. I’m sure many computer engineers would breathe a sigh of relief if such a change were implemented.

Tim Starling, Lead Platform Architect

Usability: Why Did We Move The Search Box?

On May 13th, we changed the default appearance of the English Wikipedia to use the new look developed as part of the Wikimedia Usability Initiative. On June 9th, we unveiled the new look in the remaining top 9 languages (by access volume). Other languages will follow in the coming weeks.

The key elements of the new design had been in public beta testing for many months, and hundreds of thousands of users had already adopted the new look. But, nothing compares to the real thing, and we tried to make the switch as painless as possible — by offering a quick way back to the old layout, by explaining our reasoning, observing and listening to comments carefully, fixing bugs and implementing changes quickly.

The single most frequently expressed concern about the changes we’ve made is the relocation of the search box from the left sidebar to the top right corner. This blog post will give an extended explanation of why we made the change, the other changes we made to the search, and what we’re planning to do next.

The old search box location

The default location of the search box in MediaWiki, the software used by Wikipedia, is below the “navigation” box in the top left corner. This was also the location in the English language Wikipedia, as well as many other language editions. Some language editions, including the German one, had customized the location of the search box, and moved it directly below the logo.

What do we know about search usability?

There are essentially three factors that influenced our decision to relocate the search box:

  • common user expectations regarding the placement of the search box on web pages, as determined by the preexisting body of usability research;
  • usability research regarding ideal search box width, and implications for the search box placement in our layout;
  • ability of our test subjects to locate and use the Wikipedia search box, as determined by Wikimedia usability tests in a research lab.

There are several scientific studies that have examined the ideal placement of common objects on web pages. One early study, conducted by Michael Bernard in 2001, surveyed participants about the expected placement of web objects such as internal links, external links, and search. It found that both new and experienced web users “generally expected internal search engines to be located in the upper and bottom-center of a web page. A smaller number expected it to be located at the top right of the page.”

This study was followed up five years later by A. Dawn Shaikh and Keisi Lenz (“Where’s the Search? Re-examining User Expectations of Web Objects”) in a survey of 142 participants. The study found that expectations had changed significantly, especially regarding the placement of the site search engine. The figure below illustrates the areas where participants expected the search to be found:

Expected location of site search engine

As the authors speculate and as seems intuitively plausible, early expectations of the placement of the search box were likely driven by the fact that search was commonly associated only with search engines of the time like AltaVista, not with site-specific searches. As more and more sites developed internal search functions, those were increasingly placed in slightly less exclusive screen real estate than the top center, shifting users’ expectations to look for search features in the top right corner.

Another factor that may have influenced user expectations is the common placement of search engine features in the top right corner of the web browser window.

There are practical advantages of positioning the search in the top right. As summarized in this research paper, several usability studies have pointed out a key advantage of navigational elements being placed on the right: it gives immediate access to the browser scrollbar. This is particularly valuable when a) scrolling up and down a list of search results, b) scrolling up and down an article you’ve just called up for information.

Search box width, and placement implications

 

A separate body of research examines the question of what width makes a search box user-friendly. A search box that is too narrow obscures the user’s query while typing, inhibiting their ability to complete their search quickly. Usability luminary Jakob Nielsen recommends an ideal width of 27 characters.

The old search box is approximately 20 characters wide, while the new search box accommodates 24 characters. More importantly, due to the placement of the old search box in the sidebar of the layout, widening the search was impossible without either relocating it or widening the sidebar.

The search box placement in the top right allows us to maintain a fixed standard width from one page to the next, while giving us maximum flexibility as to what that width should be. To make it even easier for users, we are experimenting with an expandable search, which is currently deployed in our sandbox 3. When you click the box, it will expand significantly to the left.  We may or may not end up deploying this feature as we continue to look at ways to make search more accessible and user-friendly.

Our own research

In the course of the usability and user experience work since last year, we have so far completed a total of three usability studies, all of which are documented on the usability wiki:

These studies included both remote and San Francisco based participants. While the primary focus of our studies was the obstacles people encountered when editing, finding search in the navigation was clearly one of them, and our test subjects tended to resort to common web search engines to navigate Wikipedia instead of using the site’s own search. With the new search box placement, users’ ability to find and use the site search was markedly improved. One user intuitively used the search box in its new location and then consciously realized that it had been moved. To see videos of the other subjects finding and using the search box with ease, please see here.

For those unfamiliar with usability testing, it’s important to note that small samples and agile, iterative tests are commonly understood to be an effective method for discovering most key user interface issues. Our sample sizes were actually larger than strictly necessary, and more diverse than typical due to our use of remote testing methods.

With that said, we didn’t test the English Wikipedia against other languages which had placed the search box directly below the logo, and we recognize that this alternative placement is already an improvement to match user expectations. However, based on the cited research above, as well as the design reasons for moving the search box to the top right, we still believe that the overall case for moving the search is compelling even for those languages, if slightly less so.

So .. why did you move the search box? I liked it where it was!

In sum, we moved the search box to a) match web practices and user expectations, b) make it possible to widen it consistent with common usability recommendations, and c) respond to actual problems we observed when test subjects used the old search.

We also recognize that millions of Wikipedia users had adjusted to the old placement, and will now have to re-adjust to the new placement. However, Wikipedia’s global audience grows by tens of millions of users every year (it is currently at 375 million unique visitors/month world-wide), and we hope to grow it by hundreds of millions in this decade. That will require that we adapt to common user expectations, rather than expecting every new user to adapt to us.

This will unfortunately inconvenience those who have adapted to the old placement. Do we absolutely know that to be the correct decision? No, but the fact that existing users are temporarily inconvenienced by it does not, by itself, show that the decision is wrong.

Other search changes we made

It’s worth noting that the search box placement isn’t the only thing we changed about the search function. Perhaps most notably, the old search had two buttons (“Go” and “Search” in English). If you asked even an experienced user what the difference between those buttons was, you would get wildly different answers, and bug 577 had been open since 2004 because of this.

To answer the mystery: the “Go” button attempts to find an article with the same title as the entered search term and, if it fails, runs a full-text search of all articles.  “Search” will always run the full-text search.  “Search” is necessary where you want to search for a word instead of displaying the article of that title (say, you want to search for instances of “George W. Bush” all across Wikipedia).
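The difference between the two buttons can be sketched in a few lines of Python. This is a toy model rather than MediaWiki's actual code: the `ARTICLES` dictionary and the function names `go`, `search`, and `full_text_search` are made up for illustration, and "same title" is simplified to an exact string match.

```python
# Hypothetical in-memory "wiki": title -> article text.
ARTICLES = {
    "George W. Bush": "George W. Bush was the 43rd president of the United States.",
    "Texas": "Texas is a US state. George W. Bush served as its governor.",
}


def full_text_search(term):
    """Return the titles of all articles whose title or text contains the term."""
    needle = term.lower()
    return sorted(
        title
        for title, body in ARTICLES.items()
        if needle in title.lower() or needle in body.lower()
    )


def go(term):
    """'Go': show the article with that exact title, else fall back to full-text search."""
    if term in ARTICLES:
        return ("article", term)
    return ("results", full_text_search(term))


def search(term):
    """'Search': always run the full-text search."""
    return ("results", full_text_search(term))


print(go("George W. Bush"))      # ('article', 'George W. Bush')
print(search("George W. Bush"))  # ('results', ['George W. Bush', 'Texas'])
print(go("governor"))            # ('results', ['Texas'])
```

The first two calls show the case described above: the same term either opens the article directly or lists every page that mentions it.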

In the new design, the less common case (search all across Wikipedia for a phrase, regardless of exact match) can be accessed using the “containing …” option in the drop-down menu. We believe this is both a more discoverable implementation, and it reduces overall clutter and complexity of the search.

Measures and coming changes

We are monitoring overall search volume. In the first week since the deployment, we have observed neither a statistically significant increase nor a decrease in search volume, but it’s too early to draw conclusions. There are also confounding variables. As noted above, the search box has changed not just in placement, but also in appearance and behavior. Finally, search volume isn’t the only interesting metric: search convenience (how long does it take users to, on average, find the search) is another one.

We’ll try to get our hands on solid metrics, but we’ll also continue to look for ways to make search more user-friendly (such as the auto-expansion), fix bugs, and so forth. In continuing our efforts to improve the user experience of all our projects, both for new and experienced users,  we’ll try to share our thoughts with you frequently, and work with you to figure out the right answer. And, if you just can’t get used to the new search — you can always switch back to the old layout, which will continue to be there for you.

Warmly,

The User Experience Team

Simplified Search for Vector

The Vector skin differs in several distinct ways from its predecessor Monobook. One of the most prominent distinctions is the location of search controls, now in the top-right of the screen. User-testing has shown that this change has improved users’ ability to find and access the search controls. But there still remained some obvious areas for improvement. Our work to ensure that projects running on MediaWiki can be accessed from a wide range of devices brought us to conclude that the search controls were too space-consuming, specifically taking up too much horizontal space that is otherwise used for displaying top-level menu items. Meanwhile, in several of our user experience studies, we found that the search controls were generally confusing to users; specifically, they did not understand the distinction between “Go” and “Search”.

Simplified search interface for the Vector skin

To solve these problems, we’ve developed a simplified search interface for Vector. The “Search” and “Go” buttons are gone, but their functionality lives on. As you type, search suggestions are offered and accessible via the mouse or keyboard using the up and down arrow keys. “Go” is still the default action, executed by pressing the enter key on the keyboard. To perform a full-text search, users can click on the “containing” option within the search suggestions or press the up arrow key on the keyboard. Also, the new search uses less horizontal screen real-estate, making more room for top-level menu items.

You can experience the new search interface by visiting our prototype site.

Trevor Parscal, Lead Features Engineer

OER Search Discovery – not just another TLA

I’ve spent today in sunny Cambridge, MA attending the OER Search Discovery 2009 workshop at Harvard’s Berkman Center. But what’s it all about?

First off, what’s OER?

Open Educational Resources are a little tough to really define to everyone’s satisfaction, but we can defer the details. :) We’re generally talking about pedagogical materials (something that could be put to use in the classroom to teach students) available under some sort of open content license.

Secondly, what’s OER search?

Creative Commons’ ccLearn project has put together DiscoverEd, a prototype search engine which includes some relevant metadata (subject matter, language, target age range, license) as well as metadata about which collection of resource links it came from. This is rather clever, allowing teachers or students to limit their searches to what’s relevant as well as what’s trusted.

Third, what’s OER search discovery?

Traditionally, most electronic educational resource collections have been walled silos. Even if the materials themselves are open and redistributable, the collections’ searches are separate, and often there’s been confusion over the openness of the metadata as well, which has held back federated searches on a larger scale.

With major search engines like Google and Yahoo now starting to index metadata embedded in web pages (RDFa and/or microformats) and make them available for searches, this is a great time to start pushing more active and integrated semantic search data. (Note: here we’re talking about metadata about the actual materials, not about the subject of the materials. That’s a matter for another day!) Content creators, if enabled by content management tool developers, can start actually getting some concrete benefit from embedding semantic data into their web sites. These’ll be picked up by the general search crawlers, but will also be available to targeted repositories collecting links and metadata about educational materials on the web.

How can we benefit?

There are two sides of this which Wikimedia can work at:

  • On the content creation side, we can provide more ways to add useful metadata to our pages, making it easier for teachers and students searching through educational-themed portals to find them. MediaWiki already provides basic language and license information, but projects like WikiBooks and Wikiversity (as well as other MediaWiki users like WikiEducator) could definitely benefit from a consistent way to specify the subject and target audience of lesson modules.
  • On the consumer side, we want to be able to find and use free/open media resources from elsewhere on the web to supplement the ones we already have on Wikimedia Commons. The in-development Add Media Wizard can currently search and fetch from a few hardcoded repositories like Archive.org and Flickr, but editors could benefit a lot from having either broader (whole internet search like Google Images with license limits) or narrower sources (a particular educational resource repository desired by a given site or community).

How can we help?

  • We’ll want to find a good, clean, maintainable, and easy to use way for wiki page authors to add resource metadata to their pages, which can be exposed to spiders and repository crawlers. RDFa vs microformats vs XHTML vs HTML 5 needs some resolution on the output format, but more interesting is making sure we have a clean user interface/workflow in the edit window without cluttering up the wiki markup.
  • If/when folks standardize on a search query format as well, we can make it absurdly easy to add specific repositories to the MediaWiki media picker. In the meantime, we can target some whole-net search engines that index license and subject metadata such as Yahoo’s SearchMonkey, which will provide relevant indexing of web sites which have provided for metadata autodiscovery with embedded RDFa etc.
  • We might also think about acting as a repository ourselves. Wikipedia and our sister projects are full of references to excellent resources both online and off. Can we record what we know about them and make that searchable internally and externally?

Folks at the workshop are also hoping we can agitate for similar moves in other tools… I know I would benefit from a free media picker for WordPress!

Brion Vibber, Lead Software Architect

Chinese-language search fixes for MediaWiki

Search is an important part of any web app like a wiki, but search is harder than it looks — especially in a multilingual environment.  MediaWiki has to support not just your standard Western languages like English and Spanish, but many more with special requirements:

  • Some can be written in multiple scripts (such as Serbian in Cyrillic or Latin), and searches should match text written either way.
  • Some languages don’t use word spacing, like Chinese and Japanese. To let the search index know where word boundaries are, we have to internally insert spaces between some characters:

维基百科 -> 维 基 百 科

Then to add insult to injury, we need to fudge the Unicode characters to ensure things work reliably with older and newer versions of MySQL:

维 基 百 科 -> u8e7bbb4 u8e59fba u8e799be u8e7a791
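The two steps above can be sketched in a few lines of Python. This is an illustration only, not MediaWiki's actual PHP implementation: the function names `is_cjk`, `segment_cjk`, and `encode_for_index` are hypothetical, and the check for "is this a CJK character" is simplified to the basic CJK Unified Ideographs block. The encoding rule (the prefix `u8` followed by the hex of each character's UTF-8 bytes) is inferred from the example above.

```python
def is_cjk(ch):
    """Simplified check: only the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def segment_cjk(text):
    """Put each CJK character into its own space-separated token."""
    out = []
    for ch in text:
        out.append(f" {ch} " if is_cjk(ch) else ch)
    # Collapse the runs of whitespace created above into single spaces.
    return " ".join("".join(out).split())


def encode_for_index(segmented):
    """Replace each CJK character with 'u8' + the hex of its UTF-8 bytes."""
    return "".join(
        "u8" + ch.encode("utf-8").hex() if is_cjk(ch) else ch
        for ch in segmented
    )


segmented = segment_cjk("维基百科")
print(segmented)                    # 维 基 百 科
print(encode_for_index(segmented))  # u8e7bbb4 u8e59fba u8e799be u8e7a791
```

The encoded tokens contain only ASCII letters and digits, which is presumably what lets the index behave the same way regardless of how a given MySQL version handles Unicode.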

For a long time, this word segmentation wasn’t being handled correctly for Chinese in our default MySQL search backend, so searching for a multi-character word often gave false matches where the characters were all present, but not together.

This is now fixed for MediaWiki 1.16; the intermediate query representation passed to the search backend now internally treats your multi-character Chinese input as a phrase, which will only match actual adjacent characters:

维基百科 -> +"u8e7bbb4 u8e59fba u8e799be u8e7a791"
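To see why the phrase form matters, here is a toy comparison in Python. It does not touch MySQL at all: `tokens`, `matches_all_terms`, and `matches_phrase` are hypothetical helpers that mimic, very roughly, an "every character must be present" query versus a quoted phrase query, and the two sample documents are invented for the demonstration.

```python
def tokens(text):
    """Split text into one token per character, ignoring whitespace."""
    return [ch for ch in text if not ch.isspace()]


def matches_all_terms(query, document):
    """Old behaviour (roughly): every query token appears somewhere."""
    doc = set(tokens(document))
    return all(tok in doc for tok in tokens(query))


def matches_phrase(query, document):
    """New behaviour (roughly): query tokens appear adjacent and in order."""
    q, d = tokens(query), tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


query = "维基百科"
scattered = "百科全书的维护基于志愿者"  # all four characters, but not together
adjacent = "欢迎来到维基百科"          # the four characters as one word

print(matches_all_terms(query, scattered))  # True  (a false match)
print(matches_phrase(query, scattered))     # False
print(matches_phrase(query, adjacent))      # True
```

The first result is the kind of false match described above; treating the input as a phrase rules it out while still finding the real occurrence.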

Note that Wikimedia’s sites such as Wikipedia run on a fancier, but more demanding, search backend with a separate Java-based engine built around Apache Lucene. Sometimes we have to remind ourselves that third-party users will mostly be using the MySQL-based default, and oh boy it still needs some lovin’! :)