Behind the scenes

Black Friday Week: our servers are ready to rumble

Noah Waldner
22.11.2022
Translation: Katherine Martin
Co-author: Norina Brun
Pictures: Noah Waldner

Black Friday doesn’t just get shopaholics’ pulses racing. It’s also quite the thrill ride for Digitec Galaxus’ software engineering crews. The goal is clear: to get our shop ready for a flood of orders, with prep work beginning as early as summer.

Before joining Digitec Galaxus this autumn as a Software Engineer, I’d always wondered what went on behind the scenes during big sales like Black Friday and Cyber Monday. This year, I get to satisfy my curiosity – and take you all on a guided tour through each stage of the preparations. After all, as a developer on the team responsible for our Community, I’m right in the middle of the action.

November in July

It all starts on a hot day in July. Members of the organising committee for this year’s discount days sit down together to make an initial forecast. A number of questions arise: how many customers are we expecting? What preparations do we need to make? What kind of special scenarios do the teams need to prepare for? What are the risks? The results of these analyses help Product Development (our in-house software developers) to make your Black Friday shopping experience as smooth as possible.

In the months that follow, Category Management give it their all to track down the best offers. The hotly sought-after products obviously have to be at the warehouse in time for the sales. At the same time, our logistics crew puts our special offers into the automated warehouse system as a precaution. This way, when you place an order, the product gets to the packing area more quickly and is soon on its way to you.

More servers, more performance

Our shop doesn’t run on some tin box in our office. We host virtually all of our shop systems in a Kubernetes cluster on Microsoft Azure. Simply put, a Kubernetes cluster consists of a number of virtual servers, known as nodes, and our systems run on these nodes. The standard configuration does the job for 358 days of the year. During Black Friday Week, however, we expect a massive amount of traffic to hit our site, so we add extra nodes to the cluster for the peak days, which lets us scale up our systems. This is efficient and environmentally friendly: after the sales, other Azure users can rent the extra servers instead of them lying around unused.
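To illustrate why peak days need so many more nodes, here is a back-of-the-envelope sketch that estimates a cluster size from an expected peak request rate. The function `nodesNeeded`, the per-node capacity, the 30% headroom and the traffic figures are all hypothetical assumptions, not our actual sizing data.

```typescript
// Hypothetical sizing helper: estimates how many nodes are needed to serve
// an expected peak request rate. All numbers are illustrative assumptions.
function nodesNeeded(
  peakRequestsPerSecond: number,
  requestsPerSecondPerNode: number,
  headroom = 0.3, // fraction of each node's capacity kept free for spikes
): number {
  if (requestsPerSecondPerNode <= 0 || headroom < 0 || headroom >= 1) {
    throw new Error("invalid capacity or headroom");
  }
  const usableCapacityPerNode = requestsPerSecondPerNode * (1 - headroom);
  return Math.ceil(peakRequestsPerSecond / usableCapacityPerNode);
}

// Usage: same cluster, two very different days (made-up traffic figures).
console.log(nodesNeeded(2_000, 500));  // 6
console.log(nodesNeeded(20_000, 500)); // 58
```

Ten times the traffic means roughly ten times the nodes, which is why renting them only for the peak days pays off.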

But it’s not just extra servers that are in demand. Our developer teams also need to get the features they’re responsible for ready for the heavy load. For me, that means loading and adding comments and ratings. To do this, we use a tool that simulates high load so we can see how our shop copes. Comparing the results with the expected user numbers shows us where the bottlenecks in the system are. Shortly before the week of special offers kicks off, we go on a shopping spree, renting the additional nodes so we’re prepared for the big rush.
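The load-testing tool isn’t named here, so the following is only a minimal sketch of the idea: fire a fixed number of requests at an endpoint with a given concurrency and report failures and latency percentiles. `runLoadTest` and the simulated endpoint are hypothetical; a real test would send HTTP requests to a staging environment with a dedicated tool.

```typescript
// Hypothetical load generator: calls `handler` `totalRequests` times with at
// most `concurrency` calls in flight and records the latency of each success.
type Handler = () => Promise<void>;

interface LoadTestResult {
  completed: number;
  failed: number;
  p50Ms: number;
  p95Ms: number;
}

// Nearest-rank percentile over an ascending array of latencies.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

async function runLoadTest(
  handler: Handler,
  totalRequests: number,
  concurrency: number,
): Promise<LoadTestResult> {
  const latencies: number[] = [];
  let failed = 0;
  let started = 0;

  // Each worker keeps taking the next request until all have been started.
  // JavaScript is single-threaded, so the check and increment can't race.
  async function worker(): Promise<void> {
    while (started < totalRequests) {
      started++;
      const begin = Date.now();
      try {
        await handler();
        latencies.push(Date.now() - begin);
      } catch {
        failed++;
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  latencies.sort((a, b) => a - b);
  return {
    completed: latencies.length,
    failed,
    p50Ms: percentile(latencies, 50),
    p95Ms: percentile(latencies, 95),
  };
}

// Usage with a simulated endpoint: ~5 ms per call, every 10th call fails.
let simulatedCalls = 0;
const simulatedEndpoint: Handler = () =>
  new Promise<void>((resolve, reject) => {
    simulatedCalls++;
    const fail = simulatedCalls % 10 === 0;
    setTimeout(() => (fail ? reject(new Error("timeout")) : resolve()), 5);
  });

runLoadTest(simulatedEndpoint, 200, 20).then((result) => console.log(result));
```

Raising `concurrency` step by step and watching where the failure count and p95 latency jump is one simple way to locate a bottleneck.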

What we’re doing differently this year

Last year, just after midnight on Black Friday, our shop crashed briefly. To understand what caused it, we first need to look at how our shop is built. The Digitec Galaxus platform is divided into several parts. What you see on your screen is the front end, which is also the part I work on every day. For this part, we use React, server-side rendering with Next.js, styled-components and Apollo for network requests. Then there’s the GraphQL middleware, which receives all the front-end requests, forwards them to the relevant back-end systems and returns the responses in the format the front end requested. The back-end systems work their magic behind the scenes, ensuring that every piece of data is saved in the right place and can be retrieved quickly.

The Community Team is all set for Black Friday Week.

Between the back ends and the GraphQL middleware, we use a Redis cache to store frequently requested data, such as product data. This reduces the load on our databases (MongoDB, SQL Server) and other systems.
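This read path is commonly called cache-aside: check the cache first and only query the database on a miss. A minimal sketch, assuming a generic key-value interface; `KeyValueStore`, `InMemoryStore`, `loadProductFromDatabase` and the 60-second TTL are hypothetical stand-ins, not our actual Redis client or data model.

```typescript
// Minimal interface standing in for a Redis client (hypothetical, not a real client's API).
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryStore implements KeyValueStore {
  private data = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.data.get(key);
    if (entry === undefined || entry.expiresAt <= Date.now()) return null;
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.data.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

interface Product {
  id: string;
  name: string;
}

// Hypothetical database lookup; counts how often the database is actually hit.
let databaseQueries = 0;
async function loadProductFromDatabase(id: string): Promise<Product> {
  databaseQueries++;
  return { id, name: `Product ${id}` };
}

// Cache-aside read: serve from the cache when possible, otherwise load from
// the database and store the result so the next request is a cache hit.
async function getProduct(cache: KeyValueStore, id: string): Promise<Product> {
  const key = `product:${id}`;
  const cached = await cache.get(key);
  if (cached !== null) return JSON.parse(cached) as Product;

  const product = await loadProductFromDatabase(id);
  await cache.set(key, JSON.stringify(product), 60); // 60 s TTL (assumption)
  return product;
}
```

Requesting the same product twice results in a single database query. The flip side is that when the cache stops answering, every request falls through to the database.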

Last year, our GraphQL system scaled up to the point where it was running on 2,000 virtual servers at once, sending tens of thousands of queries per second to the Redis cache. At a certain point, the cache could no longer keep up, and errors occurred. On top of that, an application bug retried these failed requests endlessly, multiplying the load and bringing the cache to its knees. As a result, numerous queries fell through to the databases, which made the shop extremely slow and caused it to go down briefly. Of course, we’re not going to put up with that. Since this incident, we’ve revised our caching strategy and now rely on a multi-level cache.

We now use an LRU (least recently used) in-memory cache that holds our most-visited products. It absorbs spikes in requests before they reach the Redis cache. We also fixed last year’s application bug: failed requests are now retried only once, which keeps them from overloading the Redis cache.
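A minimal sketch of both fixes, assuming the Redis lookup is an async function: a small LRU cache built on a JavaScript `Map` (which iterates keys in insertion order) sits in front of the shared cache, and a failed shared-cache call is retried at most once. `LruCache`, `withSingleRetry` and `getThroughLayers` are hypothetical names for illustration, not our production code.

```typescript
// LRU cache built on Map, whose keys iterate in insertion order, so the
// first key is always the least recently used entry.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // no-op if the key is new
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry: the oldest key in the Map.
      const oldestKey = this.entries.keys().next().value as string;
      this.entries.delete(oldestKey);
    }
  }
}

// Runs `operation`; if it fails, tries exactly once more and then gives up,
// instead of retrying endlessly.
async function withSingleRetry<T>(operation: () => Promise<T>): Promise<T> {
  try {
    return await operation();
  } catch {
    return await operation();
  }
}

// Two-level read: local in-memory LRU first, then the shared cache
// (Redis in the article), whose call is retried at most once.
async function getThroughLayers<V>(
  local: LruCache<V>,
  key: string,
  fetchFromSharedCache: (key: string) => Promise<V>,
): Promise<V> {
  const hit = local.get(key);
  if (hit !== undefined) return hit;
  const value = await withSingleRetry(() => fetchFromSharedCache(key));
  local.set(key, value);
  return value;
}
```

Because each middleware instance keeps its own small LRU, a spike on a popular product is served from memory and only misses reach Redis. The sketch has no expiry; a real cache would also need a TTL so prices and stock levels don’t go stale.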

Since Black Friday and Cyber Monday 2021, we’ve also revised and simplified the back-end system for «special deals». We won’t just use this functionality for Black Friday Week – we’ll also use it for the daily deals we offer you throughout the year. By simplifying the system, we hope to have fewer speed issues on the special offer pages.

What we’re doing on Black Friday

So that’s the theory. On Black Friday, it’ll be time to put it into practice. Shortly before the sales start, the teams manually scale up their systems, such as caches or databases, to previously defined sizes. For Black Friday, we want to play it safe and manually intervene right at these critical points so that enough servers are available to tackle the initial post-midnight wave. For the other parts, our system automatically creates more capacity if there are bottlenecks. With that, our servers are ready to rumble.
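Automatic scaling in Kubernetes is typically done by the Horizontal Pod Autoscaler, which computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skips changes within a tolerance (10% by default) and clamps the result between a minimum and a maximum. The article doesn’t say which autoscaler each team uses, so this is only an illustration; raising the minimum before midnight is one way to model the manual pre-scaling. The function name and policy values are hypothetical.

```typescript
// Sketch of the Kubernetes Horizontal Pod Autoscaler rule:
// desired = ceil(current * currentMetric / targetMetric), clamped to [min, max].
// If the ratio is within the tolerance (default 10%), nothing changes.
interface ScalingPolicy {
  minReplicas: number;
  maxReplicas: number;
  targetCpuPercent: number; // average CPU utilisation to aim for
}

function desiredReplicas(
  currentReplicas: number,
  currentCpuPercent: number,
  policy: ScalingPolicy,
  tolerance = 0.1,
): number {
  const ratio = currentCpuPercent / policy.targetCpuPercent;
  const proposed =
    Math.abs(ratio - 1) <= tolerance ? currentReplicas : Math.ceil(currentReplicas * ratio);
  return Math.min(policy.maxReplicas, Math.max(policy.minReplicas, proposed));
}

// Hypothetical policies: the Black Friday one raises the floor ahead of midnight,
// which plays the role of the manual pre-scaling.
const everydayPolicy: ScalingPolicy = { minReplicas: 4, maxReplicas: 50, targetCpuPercent: 60 };
const blackFridayPolicy: ScalingPolicy = { ...everydayPolicy, minReplicas: 30, maxReplicas: 200 };

console.log(desiredReplicas(10, 90, everydayPolicy));    // 15
console.log(desiredReplicas(10, 90, blackFridayPolicy)); // 30: the raised floor wins
```

The autoscaler only reacts after load has risen, so a raised floor means the capacity is already there when the midnight wave arrives.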

Shop functions that aren’t essential for Black Friday will be switched off using «feature flags». The live feed will temporarily stop updating you on who has ordered what from where. We’ll also stop displaying recommended products and magazine articles aimed at the Community.
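A minimal sketch of how a feature-flag check can switch off non-essential parts of a page, assuming flags are simple booleans read per request. The flag names, `FlagStore` and `sectionsToRender` are hypothetical; the article doesn’t describe the shop’s actual flag system.

```typescript
// Hypothetical flag names for the non-essential features mentioned above.
type FeatureFlag = "liveOrderFeed" | "productRecommendations" | "communityArticles";

// In-memory flag store. In a real system the values would come from central
// configuration so they can be flipped without a new deployment.
class FlagStore {
  private readonly flags = new Map<FeatureFlag, boolean>();

  set(flag: FeatureFlag, enabled: boolean): void {
    this.flags.set(flag, enabled);
  }

  // Flags that were never set default to enabled.
  isEnabled(flag: FeatureFlag): boolean {
    return this.flags.get(flag) ?? true;
  }
}

// Decides which page sections to render for the current request.
function sectionsToRender(flags: FlagStore): string[] {
  const sections = ["productList", "cart"]; // essential, never behind a flag
  if (flags.isEnabled("liveOrderFeed")) sections.push("liveOrderFeed");
  if (flags.isEnabled("productRecommendations")) sections.push("recommendations");
  if (flags.isEnabled("communityArticles")) sections.push("communityArticles");
  return sections;
}

// Usage: switch off the non-essential features for the Black Friday peak.
const blackFridayFlags = new FlagStore();
blackFridayFlags.set("liveOrderFeed", false);
blackFridayFlags.set("productRecommendations", false);
blackFridayFlags.set("communityArticles", false);
console.log(sectionsToRender(blackFridayFlags)); // only the essential sections remain
```

Every disabled section is one fewer set of requests hitting the middleware and back ends during the peak.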

We monitor the speed of the shop around the clock.

We enter the special offers into our internal system as promotional campaigns. In addition, we define how long the offers are valid for and determine the products’ availability. After that, we let the system do its thing. It displays the special offers on the shop automatically and greys them out when they’re no longer available.
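A minimal sketch of the display rule described here, assuming each campaign stores a start time, an end time and the remaining stock. `Campaign`, `offerState` and the three states are hypothetical names; the campaign system may well track more than this.

```typescript
interface Campaign {
  productId: string;
  startsAt: Date;
  endsAt: Date; // exclusive end of the validity period
  stockAvailable: number; // units still available at the deal price
}

type OfferState = "upcoming" | "active" | "greyedOut";

// Derives how an offer should be displayed at a given moment.
function offerState(campaign: Campaign, now: Date): OfferState {
  const t = now.getTime();
  if (t < campaign.startsAt.getTime()) return "upcoming";
  if (t >= campaign.endsAt.getTime() || campaign.stockAvailable <= 0) return "greyedOut";
  return "active";
}

// Usage with made-up values: a Black Friday deal that has already sold out.
const soldOutDeal: Campaign = {
  productId: "example-product",
  startsAt: new Date("2022-11-25T00:00:00Z"),
  endsAt: new Date("2022-11-26T00:00:00Z"),
  stockAvailable: 0,
};
console.log(offerState(soldOutDeal, new Date("2022-11-25T10:00:00Z"))); // greyedOut
```

Deriving the state from data rather than switching offers on and off by hand is what lets the system run on its own once the campaigns are entered.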

Another thing we can’t do without during the week of special offers is continuous deployment. There are no «code freezes» here. So if a developer like me commits a change to version control and a colleague approves it, you as customers will see the change in the affected part of the shop within a few minutes. To make this safe, we rely on automated tests that run before every release, as well as on our engineering teams taking responsibility for their changes.

On Tuesday night (our first day of deals this year) as well as the nights of Black Friday and Cyber Monday, there’ll be emergency MS Teams calls taking place at midnight. Developers across all areas of the shop will be there, poised to respond straight away if anything goes wrong.

My little investigation has answered a lot of my newbie questions. I’m now awaiting Black Friday week with feverish anticipation. Do you feel the same way? Or is there anything else you’d like to find out from me? I won’t reveal the details of any special offers, but I’m happy to answer technical questions in the comments.

When I'm not pushing pixels around or organizing bytes, I can often be found with my self-built FPV drones. From disassembled action cams, to big cinema cameras, pretty much everything that takes videos or photos flies through the air with me.