
Measure What You Impact, Not What You Influence

Written by Harry Roberts on CSS Wizardry.

Table of Contents
  1. Problems When Measuring Performance
  2. Indirect Optimisation
  3. Isolate Your Impact
  4. Signal vs. Noise
  5. Final Thoughts

A thing I see developers do time and time again is make performance-facing
changes to their sites and apps, but mistakes in how they measure them often
lead to incorrect conclusions about the effectiveness of that work. This can go
either way: under- or overestimating the efficacy of those changes. Naturally,
neither is great.

Problems When Measuring Performance

As I see it, there are two main issues when it comes to measuring performance
changes (note, not improvements, but changes) in the lab:

  1. Site-speed is nondeterministic. I can reload the exact same page
    under the exact same network conditions over and over, and I can guarantee
    I will not get the exact same, say, DOMContentLoaded each time. There are
    myriad reasons for this that I won’t cover here.
  2. Most metrics are not atomic: FCP, for example, isn’t a metric we can
    optimise in isolation—it’s a culmination of other, more atomic metrics such
    as connection overhead, TTFB, and more. Poor FCP is the symptom of many
    causes, and it is only these causes that we can actually optimise. This is
    a subtle but significant distinction (see the sketch just after this list).
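
To make that second point concrete, here is a console-ready sketch. The field
names come from the standard Navigation Timing and Paint Timing APIs; nothing
here is specific to any one site. FCP sits on top of a stack of more atomic
phases, each of which we can read individually:

```js
// FCP is not atomic: it sits on top of more granular phases, each
// of which the Navigation Timing API exposes individually.
const [nav] = performance.getEntriesByType('navigation');

console.table({
  redirect:   nav.redirectEnd - nav.redirectStart,
  dns:        nav.domainLookupEnd - nav.domainLookupStart,
  connection: nav.connectEnd - nav.connectStart,
  ttfb:       nav.responseStart - nav.startTime,
});

// FCP itself lands only after all of the above have been paid for.
new PerformanceObserver((list) => {
  for (const entry of list.getEntriesByName('first-contentful-paint')) {
    console.log(`FCP: ${entry.startTime}ms`);
  }
}).observe({ type: 'paint', buffered: true });
```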

In this post, I want to look at ways to help mitigate and work around these
blind spots. We’ll be looking mostly at the latter scenario, but the same
principles will help us with the former. However, in a sentence:

Measure what you impact, not what you influence.

Indirect Optimisation

Something that almost never gets talked about is the indirection involved in
a lot of performance optimisation. For the sake of ease, I’m going to use
Largest Contentful Paint (LCP) as the example.

As noted above, it’s not actually possible to improve certain metrics in their
own right. Instead, we have to optimise some or all of the component parts that
might contribute to a better LCP score, including, but not limited to:

  • redirects;
  • TTFB;
  • the critical path;
  • self-hosting assets;
  • image optimisation.

Improving each of these should hopefully chip away at the timings of more
granular events that precede the LCP milestone, but whenever we’re making these
kinds of indirect optimisation, we need to think much more carefully about how
we measure and benchmark ourselves as we work. Not about the ultimate outcome,
LCP, which is a UX metric, but about the technical metrics that we are impacting
directly.
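
As a small sketch of why that matters (using the standard
largest-contentful-paint entry type), we can watch the LCP milestone itself,
but notice that the entry tells us what painted and when, and nothing about
why; the why lives in those more granular, technical metrics:

```js
// Observe the LCP milestone itself. The entry reports the element
// and the time, but not which underlying lever (TTFB, critical
// path, image weight, etc.) was responsible for that time.
new PerformanceObserver((list) => {
  const entries = list.getEntries();
  const latest = entries[entries.length - 1];
  console.log('LCP candidate:', latest.element, `${latest.startTime}ms`);
}).observe({ type: 'largest-contentful-paint', buffered: true });
```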

We might hypothesise that reducing the amount of render-blocking CSS should help
improve LCP—and that’s a sensible hypothesis!—but this is where my first point
about atomicity comes in. Trying to proxy the impact of reducing our CSS from
our LCP time leaves us open to a lot of variance and nondeterminism. When we
refreshed, perhaps we hit an outlying, huge first-byte time? What if another
file on the critical path had dropped out of cache and needed fetching from the
network? What if we incurred a DNS lookup this time that we hadn’t the previous
time? Working in this manner requires that all things remain equal, and that
just isn’t something we can guarantee. We can take reasonable measures (always
refresh from a cold cache; throttle to a constant network speed), but we can’t
account for everything.
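
One reasonable mitigation is to at least inspect those confounders between
runs. Here is a rough sketch using the standard Resource Timing API; note that
the cache-hit heuristic (a transferSize of zero alongside a non-zero decoded
body) is an approximation, not a guarantee:

```js
// Before trusting a before/after LCP comparison, check whether the
// usual suspects changed underneath us on this particular run.
for (const entry of performance.getEntriesByType('resource')) {
  const dns = entry.domainLookupEnd - entry.domainLookupStart;
  const fromCache = entry.transferSize === 0 && entry.decodedBodySize > 0;
  console.log(entry.name, {
    dnsLookupMs: dns, // non-zero: we paid for a DNS lookup this run
    fromCache,        // true: likely served from the HTTP cache
  });
}
```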

This is why we need to measure what we impact, not what we influence.

Isolate Your Impact

One of the most useful tools for measuring granular changes as we work is the
User Timing API. This allows developers to trivially create high-resolution
timestamps that can be used much closer to the metal to measure specific,
atomic tasks. For example, continuing our task to reduce CSS size:



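Here is a minimal sketch of the idea; the file and mark names are illustrative,
not from any real codebase. We bracket the render-blocking stylesheet with
inline scripts so that the resulting measure captures only the thing we’re
actually changing. Because a classic script that follows a stylesheet waits for
it, the second mark fires only once the CSS has been downloaded and applied:

```html
<!-- Mark names are hypothetical; pick whatever suits your codebase. -->
<script>performance.mark('css-start');</script>

<link rel="stylesheet" href="app.css">

<script>
  performance.mark('css-end');
  performance.measure('css', 'css-start', 'css-end');
</script>
```

Now, regardless of what TTFB, DNS, or any other phase did on a given reload,
`performance.getEntriesByName('css')` reports the cost of just the CSS, and
that is the number to benchmark as we work.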