Photo by Vaido Otsar, CC BY-SA 4.0.

It has been five years since it first became possible to download a full text version of English Wikipedia through the OpenZIM format. In the intervening years, there have been many additional performance improvements to Wikipedia to improve the experience of low-bandwidth users. In this chat, Melody Kramer talks to several members of the Wikimedia Foundation’s Audiences, Global Reach, and Performance teams about ways to improve access for users with low or limited bandwidth.

———

  1. I’ve seen a fair bit of chatter on Twitter about the need for news and information websites that are designed for low-bandwidth users (primarily for use during disasters like hurricanes, cyclones, or floods, when lots of people lose electricity). There are a few examples: in the US, both CNN and NPR have text-only sites, The Age in Australia has a similar setup, and Twitter and Google News have both released low-bandwidth versions of their sites. But most news and social sites have lots of pictures, JavaScript add-ons, and ads.

    Your team has done a lot of research into how to design sites for low-bandwidth users—whether they’re accessing content during a natural disaster or whether they routinely have limited access to the Internet based on cost or other factors. Can you share a little bit about what you’ve learned about these use cases?

Dan Foy:  A few years ago, we created a service to provide access to Wikipedia articles through a combination of SMS and USSD technologies.  It worked by allowing users to name a search term, then presented a short list of choices for disambiguation and then another list of sections within the article.

Once chosen, the content was sent via concatenated SMS, which allowed about a screenful of content to be received, with the option to continue reading. We conducted a two-month pilot with Airtel in Kenya, which allowed us to test performance and gauge interest in the service. We reached over 80,000 sessions within two months, and saw a lot of repeat usage. On average, each unique subscriber used the service around 9 times per month. Even after the pilot officially ended, usage continued to grow organically past 100,000 total uses just by word of mouth.
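As an illustration of how concatenated SMS carries a longer passage, here is a minimal Python sketch. It is not the pilot's actual code, which the interview does not describe. It assumes the text uses only the basic GSM 7-bit alphabet, where a single message holds 160 characters and each part of a concatenated message holds 153, because a header that links the parts takes up the rest. The function name and the sample text are hypothetical.

```python
from typing import List

SINGLE_SMS_LIMIT = 160      # characters in one stand-alone GSM 7-bit message
CONCAT_SEGMENT_LIMIT = 153  # characters per part once a concatenation header is added


def split_into_sms_segments(text: str) -> List[str]:
    """Split text into the parts a concatenated SMS would carry.

    Simplification (assumption): every character counts as one GSM 7-bit
    character. Extended or non-GSM characters would reduce the limits.
    """
    if len(text) <= SINGLE_SMS_LIMIT:
        return [text]
    return [
        text[start:start + CONCAT_SEGMENT_LIMIT]
        for start in range(0, len(text), CONCAT_SEGMENT_LIMIT)
    ]


# Hypothetical article excerpt of 400 characters.
excerpt = "a" * 400
segments = split_into_sms_segments(excerpt)
print([len(segment) for segment in segments])
```

A 400-character excerpt therefore needs three parts, which is roughly the "screenful" of content described above.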

Jorge Vargas: We also conducted a USSD-only pilot in Argentina. Through this pilot, we realized that this was a complicated method to use because it required the user to have some digital skills (and a lot of patience) to navigate through the platform. Despite the very poor UX that USSD offers, the fact that it can be used on any kind of phone without the need for any data is a huge value when facing access issues. We learned that with a proper marketing campaign and some sort of capacity building or usage manuals, we can increase usage of this tool.

———

  1. What kinds of performance improvements has Wikipedia made to make the site accessible for low-bandwidth users?

Anne Gomez: We have improved the site loading time for low-bandwidth users in a number of ways recently, including showing users the first section of an article at the top and improving the way photographs load on mobile to reduce data usage.

Gilles Dubuc: Performance is an important topic, and the Foundation has a dedicated team for it. Our Performance Team constantly releases performance improvements to the existing platform, which benefit low-bandwidth users in greater absolute terms than people with fast internet connections. When we make the wikis faster by 5 percent, it might not be felt by people with fast connections, but for someone with a slow internet connection, it might represent seconds of waiting time saved per pageview.
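To make the arithmetic behind that point concrete, here is a small Python sketch. The page load times are hypothetical figures chosen only for illustration; they are not Wikimedia measurements.

```python
def seconds_saved(load_time_s: float, improvement: float = 0.05) -> float:
    """Absolute time saved when a page load becomes `improvement` (a fraction) faster."""
    return load_time_s * improvement


# Hypothetical page load times (assumptions, not measured data).
fast_connection_s = 1.0
slow_connection_s = 40.0

fast_saving = seconds_saved(fast_connection_s)
slow_saving = seconds_saved(slow_connection_s)
print(f"fast connection: {fast_saving:.2f} s saved per pageview")
print(f"slow connection: {slow_saving:.2f} s saved per pageview")
```

The same relative gain is a few hundredths of a second in one case and whole seconds in the other.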

Our synthetic testing platform, currently based on WebPageTest, simulates low bandwidth conditions, which allows us to keep an eye on the evolution of performance for low-bandwidth users. This helps us both to catch performance regressions when they happen and to quantify the impact of improvements we make.

We are also in the process of opening our first caching data center in Asia to improve the performance for users in that region. While bandwidth is an important factor, poor performance can also come from high latency. Since we’re limited by the speed of light and the physical distance between users and our servers, we have to get closer to them. The decision to open this new data center is directly derived from performance data collected with probes in those regions. With this real-world data, our Technical Operations team was able to identify the best physical location possible to achieve maximum impact. This new location is expected to open in late 2017/early 2018 and we’ve already set up additional performance metric measurements focused on Asia in order to assess the before/after impact of this big change.
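The speed-of-light limit can be sketched as a rough lower bound on round-trip time. The Python below assumes that light in optical fibre travels at about two thirds of its speed in a vacuum, and that the cable follows the shortest path, which real routes do not. The distances are hypothetical and are not the locations of any Wikimedia data center.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3               # assumption: approximate slowdown in optical fibre


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fibre, ignoring routing and processing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR)
    return 2 * one_way_s * 1000


# Hypothetical distances between a user and the nearest caching server.
for distance_km in (10_000, 1_000):
    print(f"{distance_km} km: at least {min_round_trip_ms(distance_km):.1f} ms per round trip")
```

Loading a page usually takes several round trips (connection setup, encryption handshake, then the requests themselves), so the difference between a distant and a nearby server is multiplied accordingly.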

As for past achievements, it’s best to look at the trend of our core performance metrics over long periods of time. While we sometimes get big wins with big visible changes, such as the effect transitioning to HHVM has had on article save time (cutting it in half), hundreds of small performance improvements over a long period of time have had an even bigger impact. While HHVM brought us from an average of 6 seconds to save an article edit to 3 seconds in 2014, we have since been able to reduce the average edit save time to less than 900 milliseconds. This is the result of a constant focus on performance at the Foundation. This culture, applied to many small individual engineering decisions, leads to tremendous performance improvements over time.

The long-term impact we’ve had on front end performance is less clear. Last year we fixed a number of issues that were previously skewing those metrics and we’re in the process of overhauling our real-user metrics. We know it hasn’t worsened, but we can’t claim it has improved. But maintaining current performance is a challenge in itself, as the wikis are more and more feature-rich. We work with all teams releasing new software, as well as volunteers, to ensure that feature releases don’t impact performance negatively. This is critical for users with bad internet connections, who would be disproportionately affected by performance regressions. So far, the dozens of performance regressions that the Performance Team has caught since its inception, often the result of unforeseeable side effects, have all been fixed quickly.

Measuring performance as users experience it, and interpreting the data correctly, is a significant challenge in itself. You need to have been consistently good at it for a long period of time in order to claim performance gains with absolute certainty. This is even more true for users with bad internet access. The classic example is that for some users, the connection is so bad that a given request ends before completion and never reaches the stage where it can send us data about its performance. In essence, the worst experience can’t be measured. And when we improve performance to the point that those users can start having a working experience, it might still be a slow one, which can make it look like performance has worsened on average (because we start getting metrics for these users, but they’re slower than average). In practice, those users went from having an experience so bad it didn’t work at all to having a slow, albeit working, experience, which was obviously an improvement. Thankfully, web performance is a very active field and the companies developing browsers are constantly releasing new performance-related APIs, which we leverage whenever we can to understand performance better.
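The measurement effect described above can be shown with a few made-up numbers. In this Python sketch, `None` stands for a session whose request never completed and so never reported a metric. All figures are hypothetical and exist only to illustrate the effect.

```python
from statistics import mean
from typing import List, Optional


def reported_mean(load_times_s: List[Optional[float]]) -> float:
    """Average load time over the sessions that managed to report a metric."""
    reported = [t for t in load_times_s if t is not None]
    return mean(reported)


# Before: four sessions report, two fail outright and report nothing.
before = [1.0, 1.0, 2.0, 2.0, None, None]
# After an improvement: everyone is faster, and the two failing sessions now
# complete, slowly, and start reporting.
after = [0.9, 0.9, 1.8, 1.8, 8.0, 10.0]

print(f"reported average before: {reported_mean(before):.2f} s")
print(f"reported average after:  {reported_mean(after):.2f} s")
```

Every session improved, yet the reported average rises, because the slowest sessions were previously invisible.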

———

  1. If organizations wanted to make their sites accessible for low-bandwidth users, where do you recommend they begin?

Anne Gomez: A lot of people who are cost-conscious with their data use proxy browsers such as UC and Opera Mini. These browsers strip out most of the data-heavy content and features, including removing JavaScript, which is essential for most modern sites to operate. Without getting too deep into the technical ways that they do this, it’s important for brands with a global presence to make sure that their sites work well in these browsers. Even if the functionality is limited, relative to the full site, users of these browsers shouldn’t have a broken experience.

Jorge Vargas: Having a no-pictures version of Wikipedia was something we had with Wikipedia Zero in the initial stages. I think it could be an accessible way to get to Wikipedia for low-bandwidth users—perhaps involving an opt-in or out option. That said, I’m not sure there would be a huge difference, as articles are usually heavier on the text side. There are definitely pros and cons for this.

Olga Vasileva: As Anne pointed out, we implemented lazy loading of images on the mobile website. This means that images load as the user scrolls down the page. If a user only views the initial sections of the page, they do not download the data for images below the fold. For many websites where users are not likely to read the entire page, lazily loading images or other content is an efficient way of saving data for their users.
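The data saving from lazy loading can be modelled with a short Python sketch. This is a simplified model, not the mobile website's implementation, which the interview does not describe. It assumes an image is downloaded once its top edge is at or above the lowest point the reader has scrolled to; real implementations usually start loading slightly earlier. The page layout and image sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Image:
    top_px: int      # vertical position of the image on the page
    size_bytes: int  # download size of the image


def bytes_downloaded(images: List[Image], scrolled_to_px: int, lazy: bool) -> int:
    """Total image bytes fetched for a reader who scrolled down to `scrolled_to_px`."""
    if not lazy:
        # Eager loading: every image is fetched regardless of scrolling.
        return sum(img.size_bytes for img in images)
    # Lazy loading (simplified): only images the reader has reached are fetched.
    return sum(img.size_bytes for img in images if img.top_px <= scrolled_to_px)


# Hypothetical article with four images at increasing depth.
page = [Image(200, 40_000), Image(1_500, 60_000), Image(4_000, 80_000), Image(9_000, 120_000)]

eager_total = bytes_downloaded(page, scrolled_to_px=2_000, lazy=False)
lazy_total = bytes_downloaded(page, scrolled_to_px=2_000, lazy=True)
print(f"eager: {eager_total} bytes, lazy: {lazy_total} bytes")
```

A reader who stops after the first two images downloads a third of the image data in this example.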

Gilles Dubuc: We have to measure the performance as experienced by those low-bandwidth users accurately first. We’ve seen examples in the industry where things started with a good intent (e.g. making a text-only version of the website), but the execution was poor, with the “light” website loading megabytes of unnecessary JavaScript libraries because that’s what their developers were used to working with.

Developing a website focused on low-bandwidth users requires a drastically different approach than developing a website focused on being feature-rich. Those two objectives are not incompatible, but performance and lightness are difficult to achieve after the fact by retrofitting an existing website. It has to be a core concern from day one and requires great discipline that goes beyond just getting things to work. This is why you usually see that those projects are separate websites, because it’s easier to achieve when starting from scratch. The ideal, of course, is having a single website that does what’s best for low-bandwidth users by adapting the experience for them. And much like accessibility, improving the experience for low-bandwidth conditions usually makes the experience more pleasant for users with high-bandwidth internet as well.

———

  1. I realize we’re talking about websites, but there are also ways to think about USSD and SMS. How have you thought about those platforms when thinking about conveying information to the end user?

Jack Rabah: We are currently exploring a partnership to offer free Wikipedia via SMS and voice with a global mobile service company. This collaboration will deliver Wikipedia content to MNO subscribers, free of charge, through the interactive SMS and voice capabilities of their platform. We are exploring this as a pilot in order to learn more about how well this works in practice. From the lessons we learn from this pilot, we hope to eventually make this service widely available to reach the billions of people who have mobile phones, but cannot afford access to the internet.

Jorge Vargas: USSD is an interesting way to bring information to the end user. It works with very low bandwidth, and there is no need for a smartphone. The problems are that the amount of text is tightly limited (just two or three lines are displayed), a timeout forces the user to reconnect after a certain time, and the UX is not very friendly. Facebook and Twitter also have USSD platforms. It’s a very small audience, but a very specific one that could be served.

———

  1. What about preloading content on mobile? What kinds of things can be done technically?

Jorge Vargas: We can preload the Wikipedia app on smartphones and tablets. With the app, we can also preload a file with an offline version of a Wikipedia (ZIM files, built by Kiwix). Ideally, we would be able to preload curated “packages” or “collections”, but this content curation is yet to be explored. We could have packages with information on responding to natural disasters, for example. The only ZIM files that are content-specific are the ones for Wikimed.

Anne Gomez: To build on what Jorge said, we’re learning from our initial research around that feature that people are looking for small, specific content packages to be on their devices, which is something we aren’t currently able to offer. You can see that research linked here under “Research findings.” We’d love to be able to create and offer smaller, more focused packages of files based on a topic or area of interest, in any language, and we are investigating what that might look like and how we could support our readers and editing community in building exactly what they need.

———

  1. I know you’re also investigating the possibility of changes to the mobile website to support intermittent connection. Can you talk a little bit about how to support users with intermittent connections?

Anne Gomez: Users with intermittent connections exist all over the world. Even in the most connected cities, there are still gaps in coverage. It’s really frustrating when you’re browsing the web waiting for some part of the site to load, the connection drops, and the entire page disappears. Beyond that, we know from our research that some people who are cost-conscious about their data usage open browser tabs when they’re on wifi to read later when they don’t want to pay for internet. We want to support those people.

Olga Vasileva: We have recently begun a project that will allow users to access the mobile website even during intermittent connections. For example, if their connection is spotty or they leave a wifi zone, they will still be able to read articles within the website: they can hit the back button and access the articles they read before or, tentatively, save articles that they would like to read when offline. We will also be improving the messaging for users in these circumstances, so that they can see which portions of the content they can access and which portions are unavailable while they are offline. The project will also aim to make the website more cost-friendly to users by focusing on using less data when loading a page.
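The idea of keeping previously read articles available offline can be sketched as a small cache with a network fallback. This Python model is only an illustration under stated assumptions: the project's actual design is not described in the interview, and on a real website this kind of logic would typically live in the browser (for example in a service worker). The class, the `fetch` callable, and the fake network are all hypothetical.

```python
from typing import Callable, Dict, Optional


class OfflineReader:
    """Keeps a copy of every article read so it can be shown again while offline."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                # hypothetical network call: title -> article text
        self._cache: Dict[str, str] = {}   # articles already read

    def read(self, title: str) -> Optional[str]:
        try:
            text = self._fetch(title)
        except ConnectionError:
            # Offline: fall back to the saved copy, or None if never read before.
            return self._cache.get(title)
        self._cache[title] = text          # online: refresh the saved copy
        return text


# Hypothetical network that can be switched off to simulate a dropped connection.
network = {"online": True}


def fake_fetch(title: str) -> str:
    if not network["online"]:
        raise ConnectionError("no connection")
    return f"Text of the article about {title}"


reader = OfflineReader(fake_fetch)
reader.read("Cyclone")          # read while online, so a copy is saved
network["online"] = False
print(reader.read("Cyclone"))   # still readable from the saved copy
print(reader.read("Flood"))     # never read before, so unavailable offline
```

Returning `None` for unread articles is what would let the interface tell the reader which content is unavailable while offline.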

———

  1. Where can web devs go to learn more about this or stay abreast of what your team is up to?

Jorge Vargas: They can always reach us at globalreach[at]wikimedia[dot]org to learn more about our work and what our team is up to.

Olga Vasileva: Our current projects are listed in the Reading Web Team project page.

Interview by Melody Kramer, Senior Audience Development Manager, Communications
Wikimedia Foundation