MIT develops a new technique to load webpages faster

Webpage bloat is a growing problem, so it is no surprise that techniques to speed up page loads have often focused on data compression to try to shrink the number of milliseconds it takes for a website to heave into view.

But researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have taken a different tack to try to take some of the tedium out of web browsing — and the result is a tool, called Polaris, which they say can reduce page load times by as much as 34 per cent.

Their technique focuses on mapping the connections (aka ‘dependencies’) between different objects on a page in order to dynamically figure out the most efficient route for a browser to load the various interdependent elements.
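To make the idea concrete, here is a minimal Python sketch of how a dependency graph can drive load order. It is not Polaris's code: the object names and the graph are hypothetical, and the scheduling relies on Python's standard graphlib module. Objects are grouped into waves, and everything in a wave could be fetched in parallel because all of its dependencies are already loaded.

```python
from graphlib import TopologicalSorter

# Hypothetical page: each object maps to the set of objects it depends on.
# These names and edges are invented for illustration only.
page_graph = {
    "index.html": set(),
    "a.js": {"index.html"},
    "b.js": {"a.js"},          # b.js reads state that a.js writes
    "c.css": {"index.html"},
    "d.png": {"index.html"},
}

def load_waves(graph):
    """Group objects into waves; each wave only depends on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = load_waves(page_graph)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

In this toy graph, c.css and d.png do not wait for a.js even if the HTML lists them after it, which is the kind of reordering the researchers describe.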

And while they note there have been prior attempts to do “dependency-tracking”, they claim theirs is a more “fine grained” mapping of these relationships. Other methods, they say, have focused on comparing lexical relationships via HTML tags and have thus failed to capture “more subtle dependencies”.

“What prior tools have done with their dependency graphs between the objects on the page is made them with respect to how browsers today load the pages,” explains PhD student Ravi Netravali, discussing how CSAIL’s approach differs. “So when you load a page you first get an HTML file. And that lists a lot of different objects and it lists them in a specific order. And that order is what defines how these prior tools view dependencies — so if one object is listed before another then it implies that the browser should fetch that object before it fetches the next object.

“What we found is that’s not necessarily true. That doesn’t actually capture the real dependencies between these objects. And so what we were able to do with Polaris is track, at a finer granularity, how these objects interact. So is one object writing some data that another then reads? Ok well then it’s a dependency. But if they’re totally doing separate things, and they don’t have any shared state, then you should be able to fetch them and handle them in parallel because they don’t depend on one another.”
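The rule Netravali describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: the object names and their read and write sets are hypothetical, and treating write/write and read-then-write overlaps as dependencies (in addition to the write-then-read case he mentions) is an assumption of this sketch.

```python
# Hypothetical objects in document order, with the shared state each one
# reads and writes. All names here are invented for illustration.
objects = {
    "setup.js": {"reads": set(), "writes": {"config"}},
    "widget.js": {"reads": {"config"}, "writes": {"widgetState"}},
    "banner.js": {"reads": set(), "writes": {"bannerState"}},
}

def shares_state(first, second):
    """True if the later object touches state the earlier one wrote or read."""
    write_then_read = first["writes"] & second["reads"]  # the case in the quote
    # Assumption of this sketch: other overlaps involving a write also count.
    other_conflicts = (first["writes"] & second["writes"]) | (
        first["reads"] & second["writes"]
    )
    return bool(write_then_read or other_conflicts)

def find_dependencies(objs):
    """Return (earlier, later) pairs where the later object must wait."""
    names = list(objs)
    deps = []
    for i, earlier in enumerate(names):
        for later in names[i + 1:]:
            if shares_state(objs[earlier], objs[later]):
                deps.append((earlier, later))
    return deps

deps = find_dependencies(objects)
print(deps)
```

Here only widget.js has to wait for setup.js. banner.js shares no state with either script, so it could be fetched and handled in parallel regardless of where it appears in the HTML.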

So why haven’t others thought of creating a more accurate map of webpage object dependencies to improve the efficiency of the browser-server back-and-forth and speed up page load times? Netravali reckons the shift to mobile computing is what has brought this aspect of page load logic into closer focus now, whereas earlier optimization efforts concentrated on other elements.

“Until a few years ago, a lot of people have targeted improving browsers themselves, or making your JavaScript engines faster, making your HTML processor faster and so on. And so today browsers like Chrome and Firefox are very heavily optimized. But I think now given the rise of mobile the focus is now shifting towards the fact that these delays, these RTTs [round-trip times] on these cell networks, are really making page load times much larger than they should be,” he says.

“When people are primarily browsing on their desktop the cost of going to a server is much lower,” he adds. “On a cell network these times, these magnitudes are quite high — and with the median we’re saving over a second. There’s tonnes of studies in recent years that basically say, from a content provider’s perspective, every millisecond or tens of milliseconds of increases in page load time leads to significant losses in revenue and user-base, of course.”
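A back-of-the-envelope Python sketch shows why round trips matter more on a cell network. The model is deliberately crude and every number in it is a hypothetical placeholder, not a measurement from the research: it assumes network wait is simply the number of sequential round trips multiplied by the RTT, and ignores bandwidth, processing time and connection setup.

```python
def network_wait_ms(sequential_round_trips, rtt_ms):
    """Crude model: total wait is sequential round trips times the RTT."""
    return sequential_round_trips * rtt_ms

# Hypothetical page: 5 round trips when objects are fetched in listed order,
# 3 when independent objects are fetched in parallel.
ORDERED_TRIPS = 5
PARALLEL_TRIPS = 3

# Hypothetical RTTs, chosen only to contrast a fast link with a slow one.
links = {"desktop": 20, "cellular": 100}

savings = {}
for name, rtt in links.items():
    before = network_wait_ms(ORDERED_TRIPS, rtt)
    after = network_wait_ms(PARALLEL_TRIPS, rtt)
    savings[name] = before - after
    print(name, before, after, savings[name])
```

Removing the same two round trips saves five times as much waiting on the slower link in this toy model, which is the intuition behind targeting dependency order rather than raw browser speed.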

Polaris needs to be installed on the server, and includes a tool — called Scout — that loads pages locally and extracts all the various dependencies in order to create the dependency graph that Polaris then uses to optimize how webpages are loaded.

“By the time the client request arrives at the server this graph is already computed. Generating the graph is not user triggered,” Netravali notes, although he also confirms that if substantial changes are made to a website then the mapping process would need to be re-run for the load time acceleration to be maintained.

“Today… when somebody updates their webpage, numerous different indices are updated in their servers to make sure that they have content, they index what they had previously so they can always revert and so on, so I think this type of dependency tracking could just be added to that workflow,” he adds.

The researchers tested Polaris across a range of network conditions on 200 of the top-ranked websites (via the Alexa list). The 34 per cent reduction figure is a median of these tests. Netravali notes the acceleration will vary, depending on the complexity of the webpages. So very simple pages will not benefit much from the technique, but more complex pages, which reflect the ongoing trend, will see larger load time gains.

MIT’s current plan for Polaris is to “eventually” open source it but the more immediate hope is to encourage browser makers to embed the tech.

“A big decision here was to do this in JavaScript,” says Netravali. “It’s a research project but we’re hoping that it gets adopted by some of these more popular commercial browsers. There are perks to them actually putting it in the browser. Because, for example, a browser’s native source code inherently runs faster than JavaScript.

“There’s sort of pros and cons to each, doing JavaScript in the browser. So for now we will open source it but our end goal is absolutely to have one of these major browsers adopt it. And of course web servers doing the same.”