This is a work in progress. Everything is subject to change. As I finish new sections, I’ll update this page with the latest changes.
What’s in a performance score?
Human vitals vs web vitals
How do you measure someone’s health?
There’s no one metric that can say “this person is healthy”—no ultimate scoring system that paints the full picture. Instead, we have a bunch of different measurements for a bunch of different situations (size, strength, endurance, diet, vision, mental health, etc.).
But when you go to the doctor for a checkup, there are a few quick tests they always run. You know the routine: “Let’s take your temperature.” Then they check your blood pressure, heart rate, breathing…
These are your body’s vital signs:1
- Body Temperature (BT)
- Blood Pressure (BP)
- Heart Rate (HR)
- Respiratory Rate (RR)
Vitals give doctors a baseline—a quick, general sense of where you’re at. They help catch health problems early and show progress over time.
In the same way, there are a ton of different ways we can measure web performance, but we always start with a few basic tests.
These are your web vitals:2
- Largest Contentful Paint (LCP)
- First Input Delay (FID)
- Cumulative Layout Shift (CLS)
These three in particular are considered the Core Web Vitals—a generic baseline for measuring web performance.
We’ll cover other web vitals too (things like Time To First Byte and Total Blocking Time) but these three core vitals are the easiest to start with.
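For reference, Google’s published “good” thresholds for the three core vitals (judged at the 75th percentile of page loads) fit in a few lines:

```javascript
// Google's published "good" thresholds for the Core Web Vitals
const CORE_WEB_VITALS_GOOD = {
  LCP: 2500, // Largest Contentful Paint, in milliseconds
  FID: 100,  // First Input Delay, in milliseconds
  CLS: 0.1,  // Cumulative Layout Shift, a unitless score
};

// A metric counts as "good" when its 75th-percentile value
// is at or below the threshold
function isGood(metric, value) {
  return value <= CORE_WEB_VITALS_GOOD[metric];
}
```

So a page whose 75th-percentile LCP is 2.4 seconds passes that vital; one at 2.6 seconds doesn’t.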
Many people’s first experience with performance scores is Google’s PageSpeed Insights.3
When you type in your website’s URL and click Analyze, you get a big score at the top, with a summary and details below.
People usually notice two things right away:
- Holy crap, my score is terrible!
- Wait, why do I get a different score every time I run it?
Yes, this is normal.
First, Lighthouse (the tool behind PageSpeed Insights) tends to be very conservative with its scoring (and that’s a good thing; more on this later).
Second, web vitals vary just like human vitals. The same way your heart rate and blood pressure vary from moment to moment, so do connection speeds and response times.
Numbers can be confusing and intimidating at first, but if we treat them less as a grade and more as a guide, we can use them to point our efforts in the right direction.
Wikipedia vs eBay
Let’s compare two popular websites: Wikipedia and eBay.
Wikipedia’s score is 99 and eBay’s score is 44, but both pass their Core Web Vitals assessment.
Your score estimates how easily the average person can access your site—someone on an average device with an average connection. If you get a 99 that basically means anyone in the world can have a good experience.
But Core Web Vitals aren’t judged by who might visit your site; you’re judged based on who actually does.4
If people with low-end phones and 2G connections in Central Africa are using Wikipedia, but people with high-end phones and WiFi connections in New York are using eBay, both sites can end up passing their assessment—because both sites performed well enough for their audience.
- Fast site + slow environment = good experience
- Slow site + fast environment = good experience
But if we switch those audiences—if the people using Wikipedia try to use eBay—it’s a totally different story. When someone on a low-end 2G smartphone tries to use eBay it’s extremely difficult.
Everyone can use Wikipedia.
That’s where your performance score is trying to guide you: towards a site that everyone can use.
Lab data vs field data
So we have two different ways of measuring performance: a “score” and an “assessment”.
The score is an indicator, the assessment an observation.
This is the main difference between lab data and field data: lab data estimates how you might perform, field data reports how you actually did.
Lab data is like training in a gym. Field data is when you’re out there performing in the real world.
You never know for sure what the conditions will be like in the real world, but you can prepare for them by adjusting in the gym—by adding more weight, speeding up the treadmill, increasing the incline.
We need both: one to tell us how we’re doing, the other to tell us how to get better.
Lighthouse is like a treadmill, and the Chrome User Experience Report (or CrUX for short) is our track record.
There are other tools and services for collecting performance data, but since these are the most popular ones right now that’s what we’ll focus on first. Just know that the same basic concepts apply to all lab data and field data.
Where does all this data come from?
Let’s start from the ground up.
Sometimes the browser has a built-in API for reporting the exact metric you want (for example, try checking out the window.performance object in your DevTools console).
Other times, when there isn’t an API, you have to derive your metric somehow (like taking a video of the page load and inspecting each frame, or sifting through the timeline to calculate a value).
For example, try visiting any web page in Chrome, then open DevTools and run this code in the console:
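One way, using the browser’s built-in Paint Timing API:

```javascript
// Ask the Performance Timeline for the First Contentful Paint entry
performance.getEntriesByName('first-contentful-paint');
```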
You should see something like [PerformancePaintTiming] show up. If you click into that object, you’ll find the number of milliseconds it took to render the First Contentful Paint for the page.
Awesome, but is this lab data or field data?
Well, we did get the number straight from the browser, but we also only checked it for a single user in a single environment, which makes it lab data.
The key to field data is getting it from real users in real world environments. Anything else—if you control the environment or simulate it in any way—counts as lab data. Any test you run, with or without Lighthouse, counts as lab data.
The only way to get field data is to track it, to collect measurements from other people’s browsers when they visit your website. For example, you could add a script to your page that checks performance.getEntriesByName('first-contentful-paint'), and track the result in a custom event. Then you can run a report to see how quickly your site is actually rendering content for people in the real world.
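As a minimal sketch of that kind of tracking script (the `/analytics` endpoint and the `reportWebVital` helper are hypothetical names, not part of any real service):

```javascript
// Minimal RUM sketch: read a metric from the Performance Timeline and send it
// to a collection endpoint as a custom event.
function reportWebVital(name, send) {
  const [entry] = performance.getEntriesByName(name);
  if (!entry) return false; // the browser hasn't recorded this metric (yet)
  // In a browser, `send` could be (url, body) => navigator.sendBeacon(url, body)
  send('/analytics', JSON.stringify({ metric: name, value: entry.startTime }));
  return true;
}
```

On a real page you might call `reportWebVital('first-contentful-paint', (url, body) => navigator.sendBeacon(url, body))` once the page has loaded.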
But what if you’re Google and you own the browser?
With that level of insight, you could collect performance data for the entire Internet. You could even share that data, safely storing it in a giant database for the rest of the world to use.
That’s what the Chrome User Experience Report is: a giant database of anonymous web vitals data for millions of websites from millions of users.
When you run a PageSpeed Insights report and you see those two sections, Field Data and Lab Data, this is where that data comes from:
- Google’s field data comes from the Chrome User Experience Report.
- Google’s lab data comes from Lighthouse.
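You can even query that database yourself through the CrUX REST API. Here’s a sketch of what a request looks like ('YOUR_API_KEY' is a placeholder for a key you’d create in Google Cloud):

```javascript
// Sketch: building a query against the Chrome UX Report API
const CRUX_ENDPOINT = 'https://chromeuxreport.googleapis.com/v1/records:queryRecord';

function buildCruxRequest(origin, apiKey) {
  return {
    url: `${CRUX_ENDPOINT}?key=${apiKey}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        origin, // e.g. 'https://en.wikipedia.org'
        metrics: ['largest_contentful_paint', 'first_input_delay', 'cumulative_layout_shift'],
      }),
    },
  };
}

// const { url, options } = buildCruxRequest('https://en.wikipedia.org', 'YOUR_API_KEY');
// fetch(url, options).then(res => res.json()).then(data => console.log(data.record.metrics));
```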
When we start looking at other tools, you’ll see some services offer “synthetic monitoring” while others offer “real user monitoring” (or RUM). Synthetic tools give you lab data, RUM tools give you field data.
Field data tells you how you’re doing, lab data tells you how to get better.
Understanding the difference between these can help relieve a lot of stress for teams worried about their performance.
Between the two, lab data is the easiest to start with: you can get it immediately, and it’s more stable. Field data requires a live site with active users, plus a way to collect the data, and often varies based on external factors like network, traffic, user behavior, device, and geography (to name only a few). When you test in a controlled environment, it helps rule out a lot of the randomness from real world data, making it much easier to debug problems and measure improvements.
So for now, let’s start simple: we’ll look more closely at lab data, then work our way up to field data.
Tools like Lighthouse can help you collect lab data, and even offer advice on different ways to get started making improvements.
How Lighthouse works
Lighthouse, ˈlītˌhous (n): a tool containing a beacon light to warn or guide developers.5
Imagine you’re training for a race.
Ideally, you’d practice out in the field or on the track where the race will actually happen, but you can’t always get there. The next best thing is to find some place similar close by, but even then you can’t always count on the weather.
Now say you have a treadmill.
You can practice inside. You can adjust the settings to simulate the race: increase your speed, raise the incline, even stagger the pace. It won’t be exactly the same—there may even be some big differences—but it gives you a good idea of what to expect.
Lighthouse is like a treadmill.
It doesn’t give you exact, 100% accurate measurements—that’s not its goal. Instead, it gives you feedback, an approximation to help point you in the right direction.
Think about real lighthouses: they don’t light the entire ocean or give you a detailed map of the shore. No, they just tell you “watch out!” and you use that light as a reference point to steer your ship to safety.
Lighthouse is a guide towards safe levels of performance.
But really, what does Lighthouse do?
Remember how we said some metrics have browser APIs and others have to be derived? Lighthouse does both, and then takes it a step further.
Lighthouse doesn’t simply report performance metrics, it imagines what they could be.
When you run a test, Lighthouse takes a trace of the page load (basically a log of everything that happened) and uses that data to recreate its own simplified model of the timeline.
Then it plays with that model: it stretches, squeezes, and reorders events to simulate what might happen in a bunch of different scenarios. It calculates the best and worst case outcomes for a given environment, then averages out those numbers to give a final estimate for each metric.
By default, it simulates globally average conditions (someone on an average device with an average connection speed), but it can be configured to test other settings too.
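Those conditions live in Lighthouse’s configuration. As a sketch, here’s roughly what a config with simulated throttling looks like; the numbers mirror Lighthouse’s default mobile (“slow 4G”) profile, and you can adjust them to model other devices and connections:

```javascript
// Sketch of a Lighthouse config using simulated throttling
const config = {
  extends: 'lighthouse:default',
  settings: {
    formFactor: 'mobile',
    throttlingMethod: 'simulate',
    throttling: {
      rttMs: 150,               // simulated network round-trip time
      throughputKbps: 1638.4,   // simulated download speed (~1.6 Mbps)
      cpuSlowdownMultiplier: 4, // simulate a CPU 4x slower than yours
    },
  },
};
```

A config like this can be passed to the `lighthouse()` function from the lighthouse npm package, or handed to the CLI with `--config-path`.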
Again, the goal isn’t to be comprehensive; it’s to give quick, reliable feedback.
By modeling the page load (instead of simply recording it), Lighthouse can test a whole range of conditions in a matter of seconds. It can imagine how fast your page would be on a slower device with a slower connection speed—even though you’re testing on a high-end device with a high-speed connection. You don’t have to wait for a real device to respond in real time. Instead you can make a change, test its impact, and get results immediately.
To get a feel for how Lighthouse works, let’s try running our own test.
Running a Lighthouse test
Outside of PageSpeed Insights, the easiest way to run your own Lighthouse test is with Chrome’s DevTools.
If you’re on a computer, open Chrome and visit any website, then open DevTools. (You can open DevTools by going to the menu bar at the top of the screen and selecting View > Developer > Developer Tools.)
In DevTools, you’ll see several tabs. Look for the one named Lighthouse and select it.
From here the options are pretty simple: you can select which categories of tests you want to run and which device you want to test for. (Easy, right?)
For now, let’s stick with Performance and uncheck the other categories. Then select Mobile for device and click Generate report to run the test.
As the test runs, you’ll see a small popup with a progress bar while the page flickers in and out behind the scenes. Eventually, everything stops and a report appears.
Congrats, you just ran a Lighthouse test!
But wow, look at all these numbers… Depending on the site, you may see a bunch of red, a bunch of green, some orange, or a mix of everything.
Look familiar? It should. This is the same stuff you see in PageSpeed Insights—that number at the top, the colors, all the metrics you normally see under Lab Data—because Lighthouse is what’s reporting those numbers behind the scenes. (We’ll talk later about why your Lighthouse numbers may differ from PageSpeed Insights. For now, just know that different samples from different machines can produce different results.)
So what does this all mean? What are all these metrics, opportunities, and diagnostics?
What should I pay attention to?
What is “web performance”?
If we’re going to make sense of these reports, we need to understand the terms first. For example, what’s a Largest Contentful Paint and how’s that different from a First Contentful Paint? Or what’s the difference between Time To Interactive and Total Blocking Time?
Each metric tells a different story—or rather, a different part of the bigger story about your page experience.
But what is that bigger story? What’s “web performance” trying to measure in the first place?
To learn that, it’s important for us to step back a minute, to zoom out and take a look at the big picture.
If we learn the general idea of what web performance is all about, then it’ll be much, much easier to understand how all of these other pieces fit together.
So let’s take a brief detour and set the stage for the terms and metrics we’ll be using to analyze web performance…
“Performance” literally means “a thing performed”—it’s an action. It’s something you do.
And measuring performance means measuring that thing you do, whatever it is.
But most of the time, we measure for a reason: we don’t just want to know that thing you did, we want to know how well you did it. So in most cases, performance becomes a measure of quality: how well did you do that thing you do?
For example, cars drive. But we don’t just care about them driving, we care about how fast they go and how far they drive. “High performance” cars are the ones that go really fast and drive really far. We measure their speed, acceleration, range, and more.
But performance isn’t always about speed.
It can be about satisfaction too—or efficiency, profitability, popularity, security, endurance, you name it. Performance is all about that thing you do and the qualities that make it good.
Performance is how well something works.
Kinds of performance
But how do you decide what to measure?
How do you decide what makes something “good”?
After all, you can measure the same thing in all sorts of different ways. Each measurement can be good or bad depending on your situation.
Take cars again: driving really fast can be a competitive advantage, a status symbol, or a waste of money. It all depends on context. We could be driving in a race, showing off to friends, or trying to conserve fuel. Each one calls for a different kind of performance: mechanical, social, or economical.
Performance depends on your goals.
Performance isn’t just about the actions you want to measure, it’s about the job you’re trying to accomplish—the qualities that make it good. It’s not just what you do, it’s what you’re trying to do.
Different kinds of performance help us focus on different goals.
We usually imply these goals by describing the kind of performance we want to measure. Financial performance aims at good finances, business performance aims at good business, and so on.
The kind of performance describes the object of your goals.
So what are you trying to make better?
Web performance aims at good websites. But what makes a good website? How do you decide what to measure?
Goals determine metrics. Your goals help you decide what to measure, how to measure, and how much is good enough.
What’s your business?
What does your website do?
You have a website for a reason. What is it? Why do people come to you?
Whatever it is, people come to you for something. They find value in that thing you do. Providing that value to people is your business, and how well you’re able to continue providing it is your business performance.
How does your website provide value to people?
That’s the heart of web performance. That’s what we’re trying to improve: the things that make your website good—anything your website does to provide value to people, that’s what we want to measure.
This is where web performance begins…
What does web performance measure?
Web performance measures how well your website provides value to people, the qualities that make it good.
Traditionally, web performance has focused on the Web’s most common feature: speed. After all, that’s something we know every website does: they all load.
But websites do so much more!
Since those early days, we’ve continued to push the boundaries of what the Web can do, from mobile apps to ebooks, to online payments and real-time communication, to 3D graphics and online gaming, to audio editing and speech recognition, to machine learning and virtual reality, to wearables and robotics, to automotive and blockchain, to the “Web of Things” and smart cities, and beyond.
How do we measure all this?
The trick is measuring what matters most.
Measuring what matters
To be continued…
“The data for the Core Web Vitals report comes from the CrUX report. The CrUX report gathers anonymized metrics about performance times from actual users visiting your URL (called field data).” https://support.google.com/webmasters/answer/9205520 See also: https://developers.google.com/search/blog/2020/05/evaluating-page-experience ↩︎