Photo by USGS Bee Inventory and Monitoring Lab, public domain.

Late last month, Jon Robson, a software developer at the Wikimedia Foundation, pinged me about a performance improvement that his team had noticed. On our mobile site, large wiki articles appeared to load faster in our testing environment, according to WebPageTest, a web performance tool that we use to measure how fast above-the-fold content is displayed.
You can see the way the improvement looked in the image below, which shows a sudden drop in page load time. (This is called the SpeedIndex, where a lower number indicates a faster load time.)

Pat Meenan, the author of WebPageTest, describes SpeedIndex as the “average time at which visible parts of the page are displayed. It is expressed in milliseconds and dependent on size of the view port.”
The actual mathematical definition is a bit more complicated, but essentially it captures how fast an end user sees content above-the-fold for a given screen size and internet connection speed. We run different profiles in WebPageTest representing different kinds of devices and internet speeds, because sometimes performance changes only affect some devices or some types of connection speeds or some types of wiki pages, even.
In the case of the WebPageTest test spotted by Robson, the drop in page load time was taking place on mobile phone-sized screens using a simulated 3G network.
It was time for an internal investigation: why was this happening? What was the root cause?
As we do for every suspected performance improvement or regression from an unknown cause, we filed a task for it, tagged the Performance-Team, and began digging a little deeper. (If you ever come across something odd related to performance, you can create a task directly and tag the Performance-Team—which will get our attention.)

Comparing synthetic testing and real user metrics

When a change like this happens in a testing environment, we first verify whether or not a similar change can be seen in our real user metrics. Specifically, we look at Navigation Timing in Grafana, which indicates the loading milestones the browser reports.
This is because we can’t use SpeedIndex to measure the page load time of real users. Our real user metrics are limited by the JavaScript APIs available in browser to measure page load speed, which are very basic compared to what WepPageTest can do in a testing environment. There’s no way for us to tell, for example, when people can see all images and text above the fold from the client-side code.
We are able, however, to understand when a web browser starts publishing anything on a page. To do this, we use firstPaint, a simple metric reported by some browsers that tells us the point in time when the browser starts painting the page. Though firstPaint measures a different metric than SpeedIndex, there’s an important overlap in the way they measure page statuses: they use the same timeline. In other words, they start measuring what happens on a page at the same start time. This also means it’s common for any a SpeedIndex change in the testing environment to come with a simultaneous variation in real user metrics like firstPaint. When this happens, it makes our investigation easier because we know it’s not an issue in our telemetry, but a real effect. (When there’s no correlation between synthetic testing metrics and real user metrics, we try different tactics.)
This fundamental difference also means that some performance improvements can improve the site loading time on SpeedIndex, while not changing any metrics on firstPaint or any Navigation Timing metrics. In these cases, we know performance has improved, but we can’t measure how much site load time improved in the real world for people browsing Wikipedia.
This is exactly what happened in this mysterious incident: SpeedIndex metrics improved, but real user metrics didn’t. That doesn’t mean that the site didn’t load faster for our real users—but it’s necessary to understand that Navigation Timing, which measures some milestones in the page load, is only a partial view of performance, and that we can’t always measure performance changes using real user data.

Comparing WebPageTest runs

The next logical step in our investigation was to compare how the page performed on WebPageTest both before and after the performance change. You can see our synthetic tests, which run continuously on our public WebPageTest instance. Here are the steps:
First you want to click on the test history section, which brings you to this view:

Next, click on the show tests from all users checkbox. You should now see all our test runs:

We continuously test a number of pages for the desktop and mobile site, using various simulated internet connection speeds and other metrics. Finding the tests you’re interested with in this historical view requires some manual labour, because you need to manually search for the labels you’re interested in, as the search box only applies to the URL.
WebPageTest does supports a great feature to compare different runs from the history view, but won’t get into that here, though, as the difference in speed is visible from the screenshot of the runs alone. After combing through the history view, I found two runs of the same test. It was for loading the Sweden article on the English Wikipedia, while browsing the mobile site on Chrome with a simulated 3G connection before and after the SpeedIndex drop.
Here’s how it looked before:

And here’s how it looked after:

Notice any difference?
It’s obvious that the content above the fold has changed. The new version displays mostly text above the fold, whereas the old version also contains images. This explains the SpeedIndex improvement: it’s faster to load text than an image, which means that users get content they can consume above-the-fold faster. This is more dramatic on slow connections, which is why this performance improvement showed up on our synthetic testing that simulated a 3G connection.

But was this a deliberate or accidental change?

The next part of the investigation was to determine whether that was an accidental change, or a deliberate one. The first place we examined was the Wikimedia Server Admin Log. Whenever changes are deployed to Wikimedia’s production servers, log entries are added there. Deployments can be individual patches or our weekly deployment train. This part of the investigation is simple: we simply went through the log, looking for anything that happened around the time the performance change happened.
And sure enough, we found this log entry around the time of the performance change:

18:31 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: mobileFrontend: Move first paragraph before infobox T150325 (duration: 00m 41s)

The task quoted in that log entry, T150325: Move first paragraph before infobox on stable, is a deliberate change to improve the user experience by showing the first section of an article at the top rather than the infobox. While making this change, Foundation engineer Sam Smith (@phuedx) also improved the performance for users on slow internet connections. They will now see the first section of an article above the fold, which they can start reading, instead of a mostly empty infobox whose images are still loading.
So our mystery has been solved, and users on slower connections saw a performance improvement, as well. As for me, well, I put down my deerstalker, calabash pipe, and magnifying glass until the next whodunit and headed back to improving performance on Wikimedia projects.
Gilles Dubuc, Senior Software Engineer (Contractor), Performance
Wikimedia Foundation

Want to work on cool projects like this? See our current openings.

Can you help us translate this article?

In order for this article to reach as many people as possible we would like your help. Can you translate this article to get the message out?

1 Comment

Inline Feedbacks

View all comments

Sharun

6 years ago

#26675

This is interesting. I have been making changes to the kiwix-html5 project for offline wikipedia access. If I want to performance benchmark it periodically, to see performance across various platforms and a standard set of pages, what would be a simple(recommended) way to set this up?

Diff

Investigating a mysterious performance improvement

Comparing synthetic testing and real user metrics

Comparing WebPageTest runs

But was this a deliberate or accidental change?

Can you help us translate this article?

Related

Welcome to Diff

Subscribe to Diff via Email

Wiki Loves Africa – Africa Creates

#EveryBookItsReader

Election for the Universal Code of Conduct Coordinating Committee (U4C)

Wikimedia Foundation News

Wikimedia Technology Blog

Down the Rabbit Hole

	This comment is spam
	This comment is a violation of the Code of Conduct
	Other