We Love Speed 2021

Today in Lyon, France, was the We Love Speed conference. Its focus is on everything related to web performance. Even if the conference talks were only in French, I'll do this recap in English, to let more people learn from it. I took a lot of notes while attending the talks, directly in Markdown format, and now I'm editing them, during my 4h30 train ride back home. If you're looking for a more high level overview of the state of webperf in 2021, I wrote about it on the Contentsquare blog. This post is more of a heavy recap of each talk.

How to optimize 40k sites at once

This was a presentation by PagesJaunes, the french version of YellowPages. Their brand used to be a big thing here in France; before the Internet ever existed. Those yellow pages were the only way to find a professional service in your area of living. Now, they've totally embraced the web and have created a spin-off organization called Solocal.

Solocal is a web agency that specialized in helping the online presence of SMBs by offering a package containing the development of a dedicated website, SEO, social media and ad presence as well as some advanced premium features (like a store locator) on demand. Most of their customers have less than 10 employees, are not tech savvy and don't really know how to use a website anyway, but they know that without one, they won't get any customers in today's world.

Most websites created by Solocal follow some dedicated templates (custom design is a premium feature). And because webperf has an impact on SEO, they had to improve the perf of their templates. Every change they made had a direct impact on thousands of websites because of the templating system.

First talk of the day, and nice (albeit a bit long) introduction as to why webperf is important. This really was wetting my appetite to know more about what they do.

Unfortunately, the next part of the talk was supposed to be done by the CTO, which couldn't make it to the conference and recorded a video instead. This was a last minute change to the program, and the conference team didn't had time to properly setup the sound, so it was really hard to understand what he was saying. Anytime someone moved, the floor was creaking louder than the video sound.

I had to leave the room after 10mn of trying to understand the content. I figured my time would be better spent elsewhere, so I went downstairs discussing with a few people I met (hello Alex and Guillaume). I hope the final recording will allow us to know more about the tech impact.

How to create a webperf culture in both dev and product

The second talk was much better; it's ranked my second favorite of the day. It was presented by two people from leboncoin (the french equivalent of craigslist), one from the product team and one from the dev team.

Leboncoin is a pretty large company now, about 1400 employees; 400 of them in the tech team. It grew significantly in the past years, the tech team almost doubling in the last two years. Today, they have about 50 feature teams, handling about 30M unique visitors per month. Scaling the tech team and keeping that many people organised and synchronized is actually one of their main challenges today.

But back to webperfs. Leboncoin started investing a lot in it because of a large perf regression in production they had in 2020. Their homepage was 7s slower than it used to be. They didn't caught it initially (they had no perf monitoring). It's because their own customers and partners started complaining that they realized something was not working properly. And when they saw that it had a direct impact on their revenue, they tackled the issue by setting up a taskforce to remediate the regression.

The taskforce was made of experts from their various domains (search, ad, authentication, product details, etc). They also requested help from Google and Jean-Pierre Vincent (a webperf consultant, also a speaker at We Love Speed).

They extracted a list of 40 things they should work on fixing. As they couldn't fix them all, they knew they had to prioritise them, but where not sure how to do so.

So they started identifying who their median user is, so they could optimize for the median user. Turns out their median user is using a Galaxy S7 on a poor connection with high latency. This was a defining information for them; they knew they had to optimize for mobile display (for a phone that was already 5 years old) on a slow network.

Leboncoin's motto is all about "giving power to people to better live their day to day lives", by buying second hand stuff. So they couldn't really tell their users to "get a better phone". They had to make their website work for slow low end devices. So they took the most important item in their list, deployed a fixed for it, analyzed the performance. Then they went to the second item, deployed a fix for it, and analyzed again. And they went down their list like this until the initial 7s regression was fixed. They even went a bit further.

But they realized that it was a one-shot fix. If they didn't invest in long term performance tracking and fixing, they will have to do it all over again in 6 months. Performance optimization is not a sprint, it's a marathon and you have to continually monitor it. Which is what the did. They started by adding some live performance monitoring, logging the results in Datadog and sending a Slack alert in relevant channels when one metric was above a defined threshold. It did not prevent pushing slow code to production, but at least they had the history and alerts when things went wrong. They monitored only the most important pages (homepage, search results and details), and measured on different devices.

The second step was to be able to catch performance regression before they hit production. They added a check on the bundle size of their JavaScript. This metric is pretty easy to get, and they pluggued this to their GitHub issues, so whenever the bundlesize difference is too large (> 10%) between the PR and the current code; the PR cannot be merged. Again, they tracked the change overtime to have the history.

They also added automated Lighthouse tests in their CI. Lighthouse is not a perfect tool, and its score shouldn't be taken as an absolute truth. Depending on your stack and use case, some metrics are more important than others. Still, it's an invaluable tool to make sure everybody in the team can talk about the same thing. Without this data, it would just have been another opinion. They added thresholds on some of those metrics in the same vein as the bundlesize limit: it it goes too far above a threshold; the PR is blocked. This forced developer, designers and product owners to discuss the decisions, with an objective metric.

The next step was to teach people internally about all those metrics. What they mean and why they are important. They created a set of slides to explain each metric, to each internal audience. For example, they had one talk to explain why the LCP (Largest Contentful Paint) is important to a designer, and how to keep it low. But also the same talk to explain it to a product manager, or developer, with a specific explanation and examples so everybody knows why it's important, even if they don't care about them for the same reasons. That way, all teams had a shared goal and not opposite objectives.

And the last step is to always keep one step ahead. The webperf field evolves more and more rapidly; there are new elements to learn every few months. Browsers ships new features that could help or hinder webperf, and people need to be kept up to date with them.

Overall, this whole production issue turned into a taskforce that finally turned into a whole company-wide shift. Webperf talks are common, both by developers, designers and product people. Their team keeps up to date with the latest news in the webperf world, they closely follow what Google or Vercel is doing, and those metrics became KPIs that everybody can understand.

Still, even with all those progress they know that they are not perfect, and they have to optimize for some metrics instead of others. When making one score better, they might make another worse. They're aware of that, but because they have defined which metrics are more important, they can usually define if the tradeoff is acceptable.

Why you need a markup expert

Jean-Pierre Vincent then presented one of the most technical talks of the day. Jumping right into it, he picked the example of a homepage with a hero image, some text and a CTA and showed how, in 2021, this could be optimized. The goal is to make sure it delivers its message as fast as possible, even on a low end mobile device with a slow connection.

His talk is pretty hard to recap, because of the amount of (French-only) jokes in it, and the way it was swinging from high-level meta considerations into deep browser-specific hacks. The crux is that JavaScript never solves webperf issues. It only creates them. Sure, it comes with solutions to cancel the issues it creates, making it neutral, but it will never make your pages load faster, no matter how optimized it is. If you really want to gain in performance, you have to invest in the underlying fundamental standards: HTML and CSS.

We should strive to develop every single component with a "good enough" version that works without JavaScript. Instead of having either nothing or an empty grey block while waiting for the JS to load, we could at least have a MVP without all the bells and whistles, but that at least look like the full component, and act partly like one. He gave the example of a slideshow (carousel) component. With standard HTML and CSS, it is possible to have something that already works pretty well without a single byte of JavaScript.

In real life, we barely have 1% of our users that navigate without JavaScript. A very small proportion manually disables JavaScript, the majority of those 1% are people either behind a corporate proxy that wrongfully blocks scripts, or people on a poor connection that don't yet have their JavaScript downloaded. If you're a small startup, there is no incentive in optimizing for 1% of your users. But if you're a large company, 1% of your users can be hundred of thousands of dollars of revenue. In any case, forcing yourself to build for those 1% without JavaScript will only make you create faster components.

It doesn't mean you need to build for the smaller common denominator and serve this pure HTML/CSS version to all your users. But it should at least be the version they can interact with while your JavaScript is loading. Take for example a datepicker. Datepickers are incredibly useful components that many site needs. But the amount of JavaScript required for it to work properly (handling date formatting in itself is a complex topic) is often quite large. What about using the default, standard, HTML datepicker provided by the browser. And load the full fledged datepicker only when the user will need it (for example on focus of the actual field). That way the initial page load is fast, and the datepicker is required only when it is needed.

Jean-Pierre then moved onto explaining the best ways to load an image if we want it to be displayed quickly. An image is worth a thousand words, as the saying goes, and it is even truer on homepage where nobody will read your text. So you need to have your main image displayed as fast as possible. He warned us about not using background-image in CSS for that (even if it comes with the nifty background-cover properties) because some browser will put background images at the bottom of their download priority list, preferring regular <img> tags instead. Modern browsers now have object-fit that is similar to backround-cover but for real image tags. For older browsers, you can hack your way around by adding a fake <img> tag referencing the same image as the one in background-image and adding a display:none.

On Chrome the problem is even more complex as the image download priority is calculated based on the image visibility. If it is not in the viewport, it will be put at the bottom of the priority list. This seems pretty clever, but the drawback is that the browsers needs to know the image position in the page before downloading them, so it needs to download the CSS before it downloads the images. The suggested way around this limitation if you really need to download one image as fast as possible is to add a <link rel="preload"> tag for this image. Preloading is a very interesting concept, but once again we have to be careful not to use it for everything. If we mark all our images for preloading it's like we're not preloading anything.

Once we know how to download an image as soon as possible, we have to make sure to download the smallest viable image. The srcset attribute allow us to define (in addition to the default src attribute) specific images to load based on the current image display size. The syntax is very verbose and can quickly turn pretty complex to maintain as picking the right image depends on three factors: the current screen resolution, the Device Pixel Ratio (retina or not) and the relative size of the image compared to the page. The last two are tricky because screens are getting higher and higher pixel ratio (3 or more) and the relative size of the image is linked to your RWD breakpoints. This creates a larger and larger number of combinations, making this whole syntax harder and harder to write manually.

Still, because this is a standard syntax, directly at the HTML level, it will work on every browser (eventually) and will be much better than any JavaScript-based solution (or even CSS-based solution for that matter).

The last advice he gave us on images was to make sure we are not uselessly downloading an image that is not going to be display (because we don't display them on mobile for example). As usual, the fastest way to transfer bytes on the wire is to not transfer them at all, so if an asset is not going to be used, it should not be sent. But if you have a lot of images to be displayed, you need to ensure you're giving them width and height dimensions in your HTML markup, so at least their respective space in the layout is reserved and the page does not jump as images are downloaded.

Speaking of lazy loading images, there is no clear answer if we should be using gray placeholders, blurry version of the images, a spinner or a brand logo while waiting for an image to load. There is no one size fits all solution, it all depends on the use case, the specific page and the other images around. This question needs to be answered by the design team, not the devs.

There was a lot of content packed into this talk, I would highly suggest you have a look at the recording and the slides (or even book a private consultant gig with him) because I can't make it justice.

Still, the last topic he addressed was the font loading. The best way to load fonts being to define a series of fallbacks, from the best to the worst. The best being the font being already installed locally on the user computer, the worst being an old TTF/OTF format to be downloaded online.

Then there is the question of font swapping: if the fonts needs to be downloaded, you should at least present the text in a fallback font while the font is loading. If your default font and real font are really similar, the swap will be almost imperceptible. If they are very different, the swap could create a noticeable jump (if the real font has larger/smaller letters, it could make buttons appear on two lines after the swap for example). In that case, the suggested trick is to scale the default font up/down so it takes roughly the same size as the final font. That way the swap will seem less brutal.

All those examples were highly interesting, but they will also most probably be outdated in one year or two. The main important thing to remember here is that we need to invest into markup specialists, people that know the underlying HTML and CSS properties, keep up to date with the way they evolve and are integrated by browsers. Knowing all those properties and keeping up to date is a full time job, and you can't expect a frontend engineer to be able to juggle all that information while also keeping up to date with the JavaScript ecosystem (that is evolving at least as fast). It's time we better recognize markup specialists as experts, and what they bring to the webperf front.

Micro-frontends and their impact on webperf at Leroy-Merlin

This one was the most impressive talk of the day. How Leroy-Merlin (5th french e-commerce website) rewrote their whole front into a micro-frontend architecture and what the impact on webperf was.

For a bit of context, Leroy-Merlin has 150 physical stores, they do a mix of online and physical business while most of their competitors are pure players (like ManoMano, which was actually doing a talk in the same room right after this one). But back to Leroy-Merlin: their traffic is mostly (55%) coming from mobile, and the average user journey is 7 pages long. This is going to become important data for the rest of the talk.

The two speakers were tech leads of the front. They were upfront about the KPIs they wanted to optimize: great SEO, quick Time to Market (ability to release new features quickly), fast performances, data freshness and resiliency. The quality/price/availability of the products in store isn't part of their scope. They need to make sure the website loads fast and displays relevant information no matter the conditions.

Before their rewrite, they used to have one large monolith and a dev team of a bit more than 100 devs. This created a lot of friction in their deployments as everybody had to wait in a queue for releasing their part of the code. Their webperf was good, but they had to manually deploy their servers and had some issues with their load balancer (sticky sessions that dropped customers when a server was down).

Individually those problems weren't too bad. But all together, it meant it was time to restart from scratch and think of a solution that would fix all those problems at once: automated deployments, stateless machines and autonomous teams. For the infra part they embraced the Infrastructure as Code with Docker, and for the front went with a micro frontend architecture, where each page is split into "fragments". They have one fragment for the navigation bar, one for the "add to cart" button, one for the similar items, one for calculating the number of items in stock, etc. Each fragment is owned by a different team (made up of front/back engineers, product owner, manager and designer).

Each team can then pick the best stack for their specific job. The most complex components are made in React (about 5% of them), while the vast majority are made of Vanilla JavaScript. Because they split a large page into smaller, simpler, components they didn't need a heavy framework. Each fragment was doing one simple thing which allowed them to heavily simplify the complexity of their code, leading to a much better Time to Market. Each fragment being like a self-contained component, along with assets and specific logic, it's also easier to remove dead code than when it's sprawled over the whole codebase.

They have a backend UI tool that let them build custom pages by drag'n'dropping fragments (which is also securely saved as YAML configuration files, so they can redeploy with confidence). The final page is then assembled in the backend when requested. It picks the page template (homepage, listing, or product detail), and replaces the 30 or so fragment placeholders with the corresponding fragment code. This fully assembled page is then sent to the browser and kept in cache for future request. Thus, the backend job is also heavily simplified. It mostly does templating work, once again reducing the complexity.

One limitation of such an architecture is that any personalization data (current user, number of items in cart, availability of a product) cannot be served directly by the backend, and has to be fetched by the frontend. But because 99% of the page has already been pre-rendered on the server, fetching those data requires only a minimal amount of JS and is quickly executed in the front-end. Because their average user journey is 7 pages long, they decided that it wasn't worth downloading a full JavaScript framework for only 7 pages and so they try to really do most of the stuff in vanilla JavaScript.

But, these choices create another limitation. Because each fragment is isolated, it means that code is often duplicated. And because no framework is used, it means that all the fancy tooling and helpers that improves the Developer Experience are missing. Also, coding without a framework proved to make hiring harder. For all those reasons, they extracted some of the most common shared components into their own private modules (like the design system, the API connection layer, polyfills, etc) into their own private npm module that each fragment can import. For isolating CSS rules, they prefix each CSS selector with the unique ID of the matching fragment.

Having the full page being split into smaller chunks also allowed them to increase their resilience. They could define which fragments are considered primary or secondary. A primary fragment is needed for the page to work (like display the product, or the "add to cart" button). If this fragment fails to build, for whatever reason, then the page needs to fail loading. On the other hand, secondary fragments (like a "similar item" carousel, or the page footer) are considered secondary and if they fail loading, they are simply ignored and removed from the markup. This allowed them to be more resilient to errors, and better scale in case of high traffic spikes. They went even further and made the secondary fragment lazyload: their JavaScript is loaded only when the fragment is about to enter the viewport, making the first page load really fast.

But that's not all, and they went even further with their caching mechanism. As we've seen above, they cache the backend response of the build pages. But what if the page layout changes? What if a product is no longer in stock and the layout must be completely changed? They couldn't use revved urls because they wanted to keep a good SEO and unique URLs. They also didn't want to introduce a TTL because it would have drastically increased the complexity of handling the cache.

Instead, they opted for a reactive approach with a low TTL. Every page is cached in the browser for a short amount of time (I don't remember if they said the exact value, but I expect 1 or 2 seconds). This is low enough so a regular user won't notice, but high enough that thousands of users on Black Friday pressing F5 won't kill the server. But the same page is cached in the server forever.

The very clever and tricky part is that they update their server cache whenever their database is updated. They listen to any change in their config database, and if a change requires a cached page to be regenerated, they regenerate it asynchronously. That way users still have fresh data, but the server isn't under a lot of pressure.

In addition to all that, they even have different pages generated based on the User-Agent. A modern browser won't have all the polyfills added, while an old one might have. Some goes for mobiles that might not require some part of the markup/assets, so they are skipped during the page creation, once again for faster load.

See, I told you it was the most impressive talk of the day! They went very far into the micro-frontend direction, and even beyond, taking full advantage of what its modularization approach made possible. This full rewrite required synchronization of the data, front, back and infra teams and also a full reorganization of the feature teams. This went far beyond a tech project, and had impact on the whole company organization.

SpartacUX, ManoMano's rewrite to micro-frontend

The next conference was pretty similar to the one presented by Leroy-Merlin. This time it was ManoMano, actually one of their competitor, explaining a similar approach they had. Both talks being one after the other, we couldn't help but compare to what we just saw in the talk behind. ManoMano's infrastructure is pretty impressive as well, but Leroy-Merlin went so far ahead it was hard to be as excited about this second talk as I was for the first one. There was also a lot of overlap with what Le Bon Coin presented earlier in the morning about how they track webperf stats in their PR and dashboards.

ManoMano started as a Symfony backend with Vanilla JavaScript. They had trouble recruiting Vanilla JS developers, so they moved the front to React. This hurt their SEO as their SSR wasn't properly working with React. They also still had the previous monolith as the backend, and felt like they were duplicating code on both ends, that their performance was getting even worse, and people in the team were struggling with the new complexity to orchestrate.

So they started the really cleverly named SPArtacUX project. A way to bridge the Single Page Application with a better User Experience. The goal was to have a simple codebase for the dev team, while transferring as few bytes as possible, for faster rendering. They opted for micro-frontend architecture (I see a trend here), using Next.js (I see a trend here as well) because it offered nice SSR and they were already proficient with React. They moved to TypeScript for type robustness and used Sass for CSS. As a side note, I still don't really understand why so many companies keep using Sass for their CSS stack (it's slow, it leaks styles, it's non-standard; Tailwind would be a better choice IMO, especially when you already have a design system).

They also started measuring Web Vitals and bundle size in all their production releases and Pull Requests. They pluggued Lighthouse, WebPageTest, Webpack Bundle Analyzer and Chrome Dev Tools to their CI to feed their Datadog dashboard and static reports. When they had enough data to see a trend, they started to optimize. Their first target were the third part tracking scripts that were heavily slowing the page down. Those tags are very hard to remove because they can have a business impact; you cannot remove too much data otherwise you're blind to how your business is performing. They had to get an exhaustive list of everything that was loaded and remove the ones that were no longer used.

Then they had to rewrite a fair number of their components that they thought were responsive, but were actually downloading both a desktop and mobile version and hiding one of the two based on the current devices. This made a lot of HTML/CSS and even sometimes images to download for not even displaying it. They put a CDN in front of all their pages. Just like Leroy-Merlin, they build the pages based on a layout and placeholders to replace with fragments.

They pay special attention at optimizing the loading order of assets, only loading assets that are in the current viewport, lazy loading anything else. They invested a lot of time into code splitting and tree shaking to only load what they really needed in their final build. They also made sure any inline SVG icon asset was only included once, and the other icons were referencing the first one, avoiding downloading several times the same heavy SVG icon.

In conclusion, they did a really good job on their rewrite, a bit like a mix of Leroy-Merlin on their micro-frontend split and Le Bon Coin on their webperf automation monitoring; but it felt like I had already seen that today. I'm sure if I would have seen this talk first, I would have been more ecstatic about it.

What is faster than a SPA? No SPA.

The last talk of the day was by Anthony Ricaud, who made a clean and concise debunking of the myth that SPA are inherently faster because they need to only load the diff that changes between two pages. Because he was going against what is a commonly accepted idea, he had to put us in the right mindset first by reminding us of cognitive bias and rhetorical techniques we're all guilty of.

Then he showed, with many example recordings (of actual websites we had seen during the day), how a version without SPA (so, with simpler GET requests to a server) was actually faster. The reasoning is pretty simple, and went with what Jean-Pierre Vincent said earlier: JavaScript will never make your pages faster; at best it will offset its slowness.

The main reasons for that are that with a SPA, you need to download a lot of blocking JavaScript which you don't need with classical HTTP navigation. Also with a SPA, you need to get a JSON representation of your state, transform it into a VDOM, then update the existing DOM. With classical HTTP navigation you can start rendering the DOM on the fly, while you're actually still downloading it through HTTP.

In addition, when doing classical HTTP navigation, your browser UI will let you know if the page is loading, while with a SPA it's up to the SPA to have its own loading indicators (which they usually don't have, or trigger too late). This tied well with what Leroy-Merlin was saying earlier in that for 95% of their fragments, they use pure Vanilla JS, and with Jean-Pierre Vincent once again in that you can already do a lot with pure standard HTML/CSS and JavaScript will only be needed for progressive enhancement.

He then went on doing a demo of HOTWire (HTML Over The Wire), which is an hybrid way that should take the best of both worlds. It would use a limited amount of JavaScript, plugging itself on standard HTML markup, to only refresh part of a page in an obstrusive manner. The idea is to tag parts of our HTML pages with tags indicating that an area should be updated without the whole page being refreshed. The minimal JavaScript framework would then query asynchronously the new page; the server would return an HTML version of the new page, filter only the area it needs to update and swap the old area with the new one in the current page.

To be honest, the idea seems interesting, but the syntax seemed to be a bit too verbose and still a bit uncommon. Made me think of Alpine.js which follows a similar pattern of annotating HTML markup with custom attributes, to streamline JavaScript interaction with it. I'm still unsure if this is a good idea or not; it reminds me of Angular going fully in that direction and it didn't really went well for them, it created an intermediate layer of "almost HTML".

Conclusion

I'm really glad I could attend physically this event. It has been too long since I could go to conferences because of the COVID situation. Having a full day of webperf peeps sharing their discoveries, and seeing how far the webperf field went in the past years has been really exciting. It's no longer a field only for deep tech people passionate about shaving off a few ms here and there, it has now a proven direct impact on SEO, revenue, trust and team organization.

Thanks again to all the organizers, speakers and sponsors for making such an event possible!


Tags : #webperf

Want to add something ? Feel free to get in touch on Twitter : @pixelastic