
HTML Parsing Inefficiencies on Uber Eats

Like many others throughout the pandemic, I used Uber Eats to have food delivered safely while dining at restaurants wasn't an option.

One day, I decided to examine what the Uber Eats web application is doing at runtime, and I was surprised by what I found: inefficient, high-frequency HTML parsing!

In this tip, we'll look at what Uber Eats is doing wrong and discuss an optimization that can be applied to improve the user experience.

Reference Traces

I have collected traces of two scenarios, both of which manifest the same issue.

  • A full page load trace of the Uber Eats home page.
  • An in-app navigation trace of clicking on the Filter to Pizza button.

Both traces are available from here.

You can import these traces yourself if you want to follow along with my analysis.

Identifying the Issue

While the Uber Eats web app's CPU utilization is quite high leading up to the critical frame, the specific area that caught my attention was this section of the page load trace:

A screenshot of the Chromium Profiler with Uber Eats section in view

If you import the page load trace, the time slice to scope your profiler to is 6072ms to 6622ms.

I found this by exploring the long Animation Frame Fired task, which sits just before the critical frame is generated:

A screenshot of the Chromium Profiler with Uber Eats section in view

I used my knowledge of flamegraph shapes to find the region of interest below:

A screenshot of the Chromium Profiler with Uber Eats section in view

Within the spikes, we can see what's taking up time:

A screenshot of the Chromium Profiler with Uber Eats section in view

Note: In the in-app navigation (Filter to Pizza) trace, these same spikes appear around 5400ms to 5800ms.

Scoping to the Code

Once I found the spikes of interest, I utilized codepath scoping to see what the function is doing.

What I found is that it's invoking the DOMParser.parseFromString API at high frequency:

A screenshot of the Chromium Profiler with Uber Eats section in view

Understanding the Code

While I don't have access to the Uber Eats source code, I can infer what their code is doing from the minified sources and by using the Chromium Debugger to set breakpoints and logpoints.

What I found is that Uber Eats receives visual metadata from its backend as HTML strings, which the frontend parses to render the cards on the Uber Eats UI.

A screenshot of the Chromium Profiler with Uber Eats section in view

For a standard page load, I see this codepath above invoked around 150 times.

The individual HTML strings are parsed into DOM nodes via parseFromString, which is a standard Browser API.

After a node is parsed, it's added to the DOM via appendChild.
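
While I don't have Uber's actual source, here's a minimal sketch of the pattern described above. The function and variable names are my own hypothetical reconstruction, not Uber's code:

// Hypothetical reconstruction of the observed pattern (names are mine).
function renderCard(cardHtmlString, containerElement) {
    const parser = new DOMParser();

    // Each call parses a small HTML string into a brand new in-memory document.
    const parsedDocument = parser.parseFromString(cardHtmlString, 'text/html');

    // Pull the parsed element out of that document and append it to the live
    // DOM. appendChild adopts nodes from another document in modern browsers.
    containerElement.appendChild(parsedDocument.body.firstElementChild);
}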

The Problem

On the surface, this doesn't sound like anything CPU-intensive:

  1. Receive HTML data from a backend
  2. Parse each entry to a DOM node
  3. Append each to the document via JavaScript

The problem surfaces in how these HTML strings are parsed -- in particular the frequency of invoking parseFromString.

The browser excels at parsing HTML, so why is this case slow?

DOMParser

DOMParser.parseFromString(string, 'text/html') is quite powerful.

It creates a fully structured document object with all HTMLElement nodes included in the tree.

A parsed document resides in-memory -- it's completely separate from the visual document the user is interfacing with.

Consider this snippet:

function parseToDom() {
    // The HTML to parse (at this point it's just a plain string).
    const htmlText = '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>';

    const domParser = new DOMParser();

    // Returns a fully structured, in-memory document containing the parsed nodes.
    return domParser.parseFromString(htmlText, 'text/html');
}

parseToDom();

Observe the fully structured document as output:

Output of the parseToDom() function

Performance Implications

Creating a fully-fledged in-memory document object isn't necessarily cheap, and the cost adds up when parseFromString is invoked at high frequency.

If we zoom into a spike in Uber's runtime trace, the profiler even shows the expensive document readystatechange Events that are dispatched for each invocation of parseFromString:

A screenshot of Uber Eats in the Chromium Profiler

By looking at this flamegraph, it's clear that the actual HTML parsing (the blue blocks) is quite small (i.e. fast), but there are a lot of extra readystatechange events being dispatched that contribute significant overhead.

In fact, the majority of each parseFromString call's execution time for Uber's small HTML text blocks is actually spent dispatching readystatechange events.

You can inspect Chromium's DOMParser source code and observe that it creates a completely new Document. You can also observe the readystatechange event dispatching in that class.
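
You can also see for yourself that each call returns a document that has already run through its full loading lifecycle. A quick sketch (my own, not taken from Uber's code):

const parser = new DOMParser();
const parsedDocument = parser.parseFromString('<span>Hello</span>', 'text/html');

// The return value is a complete Document, not just a node, and it has already
// finished its loading lifecycle, including the readystatechange events
// dispatched along the way.
console.log(parsedDocument instanceof Document); // true
console.log(parsedDocument.readyState);          // "complete"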

A proposed solution

As readystatechange events are dispatched for each parsed HTML document, frequently creating document objects will ultimately lead to wasted work.

We can simulate what Uber's code is doing right now in a simple example:

const htmlTextStrings = [
    '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>',
    '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>',
    '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>',
    // ... N number of nodes.
];

for (const htmlTextString of htmlTextStrings) {
    const parser = new DOMParser();

    // Creates a new document object (expensive and slow!)
    const parsedNode = parser.parseFromString(htmlTextString, 'text/html');

    // Do something with parsedNode...
}

Instead of creating a new document for each HTML text string, I'd suggest joining all of the HTML text that needs to be parsed into a single string, and then sending that joined string to DOMParser.parseFromString in one call.

const htmlTextStrings = [
    '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>',
    '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>',
    '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>',
    // ... N number of nodes.
];

const htmlTextString = htmlTextStrings.join('');

const parser = new DOMParser();

// This only creates one document, and parses all HTML in one go (fast!)
const parsedHtmlDocument = parser.parseFromString(htmlTextString, 'text/html');

for (const parsedNode of parsedHtmlDocument.body.children) {
    // Do something with parsedNode
}

This approach avoids dispatching multiple readystatechange events and avoids creating extraneous document objects. Note that it relies on each string in the array being a complete top-level element, so that the parsed nodes map one-to-one onto parsedHtmlDocument.body.children.

A Live Example

I've set up a live example that parses 1000 HTML text strings using both approaches. You can check out the example here.
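
If you'd like to reproduce the comparison yourself, here's a minimal sketch of a similar measurement. This is my own approximation, not the live example's exact code, and the numbers will vary by machine:

const sampleHtml = '<span style="color:#545454"><img src="https://dkl8of78aprwd.cloudfront.net/ticket@3x.png" width="14" height="14" vertical-align="middle"/></span>';
const htmlTextStrings = new Array(1000).fill(sampleHtml);

// Approach 1: one parseFromString call (and one new document) per string.
let start = performance.now();
for (const htmlTextString of htmlTextStrings) {
    new DOMParser().parseFromString(htmlTextString, 'text/html');
}
console.log(`Per-string parsing: ${(performance.now() - start).toFixed(1)}ms`);

// Approach 2: join everything and parse once, creating a single document.
start = performance.now();
const parsedHtmlDocument = new DOMParser().parseFromString(htmlTextStrings.join(''), 'text/html');
console.log(`Joined parsing: ${(performance.now() - start).toFixed(1)}ms, ` +
    `${parsedHtmlDocument.body.children.length} nodes parsed`);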

On my machine, using multiple calls to parseFromString takes about 338ms:

A screenshot of the Chromium Profiler showing the multiple parseFromString calls approach

On the same machine, joining the HTML text strings and using a single call to parseFromString on the joined string takes about 11ms:

A screenshot of the Chromium Profiler showing the single joined parseFromString call approach

Conclusion

The browser is quite efficient at parsing HTML, but creating document objects at high frequency to facilitate that parsing can lead to performance bottlenecks.

Uber Eats could apply the techniques we've discussed and see a sizeable improvement in their time to generate their critical frame!


That's all for this tip! Thanks for reading!