How Do Large-Scale Systems Deal With Millions of Users & Terabytes of Data?
In today’s world, modern Software-as-a-Service (SaaS) solutions are often designed to serve massive global audiences with tens of millions of daily users. But how do these systems handle huge numbers of transactions in near real-time? How do they process such massive volumes of data, where do they store it, and how do they retrieve it efficiently?
In this post, we’ll explore some of the key technologies and architectures that, behind the scenes, power these systems to seamlessly provide this amazing level of performance.
Before starting, it’s vital to cover some fundamental concepts used throughout the architecture of such systems.
Clustering and Load Balancing
Clusters composed of tens, hundreds, or thousands of nodes aim to provide high availability for (sub)systems and microservices, both in terms of fault tolerance (when a server becomes unresponsive or suffers a hardware failure) and load balancing (incoming traffic is directed to separate nodes based on the load balancer’s distribution algorithm).
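To make the load-balancing side concrete, here’s a minimal sketch of a round-robin balancer that skips nodes that failed a health check. The node names and the `RoundRobinBalancer` class are hypothetical, purely for illustration; production balancers (HAProxy, NGINX, cloud load balancers) implement far richer distribution algorithms.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes incoming requests across healthy nodes in turn."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(nodes)
        self._cycle = cycle(self.nodes)

    def mark_down(self, node):
        # A failed health check removes the node from rotation (fault tolerance).
        self.healthy.discard(node)

    def next_node(self):
        # Walk the rotation, skipping unhealthy nodes; fail if the cluster is down.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")

# Hypothetical three-node pool: traffic alternates a, b, c, a, b, c, ...
balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
first, second = balancer.next_node(), balancer.next_node()
balancer.mark_down("node-c")  # node-c is now skipped by the rotation
```

The same skip-on-unhealthy idea is what lets a cluster keep serving requests while a failed node is replaced behind the scenes.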
Containerization and Orchestration
In modern deployments, it’s common to bundle application code together with all its dependencies, libraries, configurations, environment variables, etc. into single packages called containers. Containers share the host’s OS kernel while running as isolated processes, each with its own filesystem and resources. Sharing the host’s OS makes them more performant and lightweight than virtual machines (VMs). Another great benefit is that they enable seamless deployments, which we’ll see later on.
Orchestration, on the other hand, means managing the execution and lifecycle of the containerized workloads and services, monitoring their load and availability, and taking automatic action when needed to preserve the system’s availability and performance at all times.
Whether through horizontal scaling (spawning/terminating server instances or containers) or vertical scaling (adding/removing resources on servers), the underlying components of such systems are orchestrated to scale automatically based on demand and to maintain their availability and health under traffic spikes. Horizontal scaling is usually preferred, as it makes auto-scaling seamless and doesn’t require restarting any service. Nodes are automatically added to the cluster as long as certain criteria are met (e.g. the servers’ load percentage over the last few minutes).
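As a rough illustration of such scaling criteria, the function below derives a horizontal scaling action from averaged load samples. The thresholds and node limits are made-up values; real orchestrators (e.g. the Kubernetes Horizontal Pod Autoscaler) apply comparable averaged-metric rules.

```python
def scaling_decision(load_samples, current_nodes,
                     scale_out_at=0.75, scale_in_at=0.25,
                     min_nodes=2, max_nodes=100):
    """Decide a horizontal scaling action from recent load samples (0.0-1.0).

    Thresholds and limits here are illustrative defaults, not taken from
    any specific orchestration platform.
    """
    avg_load = sum(load_samples) / len(load_samples)
    if avg_load > scale_out_at and current_nodes < max_nodes:
        return "scale_out"   # sustained high load: add a node to the cluster
    if avg_load < scale_in_at and current_nodes > min_nodes:
        return "scale_in"    # sustained low load: terminate a node to cut costs
    return "hold"            # load is within the target band
```

Averaging over a window of samples (rather than reacting to a single spike) is what keeps the cluster from oscillating between scale-out and scale-in.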
Of course, all the orchestration of auto-scaling clusters can hardly exist outside of capable computing infrastructures. Cloud computing platforms, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, offer scalable computing capacities that are programmable via APIs, command-line tools, and SDKs.
Now that we’re aware of the fundamental technologies and concepts used to auto-scale system components, we can dig a little deeper into designing global-scale SaaS platforms.
First of all, writing efficient APIs is critical. APIs are the public-facing gateways through which users (via web and mobile apps) interact with the system. As API call volumes can reach billions per day, or even within a few hours, the need for APIs that respond extremely fast becomes obvious. In each API call, every single byte, whether in network transmission, server memory, or storage, accumulates into gigabytes per hour and can impact the overall performance of the platform, not to mention the cloud infrastructure costs. So, what are the key points for writing super-fast APIs?
Fast, fine-tuned algorithms: It’s pointless to provision everything else carefully and then leave algorithms performing computations in costly ways, costly both in CPU utilization and in memory. Each API call should execute the required business logic with the fewest possible computations and commands, and reserve the least possible memory.
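A tiny illustration of how much such algorithmic choices matter: the same membership check against a Python list (O(n) scan) versus a set (O(1) hash lookup on average). The dataset size is arbitrary.

```python
import timeit

# 100,000 user ids in both a list and a set (size chosen only for the demo).
ids_list = list(range(100_000))
ids_set = set(ids_list)

# Worst case for the list: the element we look for is at the very end,
# so every check scans the whole list.
slow = timeit.timeit(lambda: 99_999 in ids_list, number=100)
fast = timeit.timeit(lambda: 99_999 in ids_set, number=100)
```

Under load, this kind of difference is repeated millions of times per hour, which is why data-structure choices show up directly in API latency and CPU bills.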
Avoiding redundant calls: Design the API architecture with a focus on efficiency. Depending on the nature and requirements of the SaaS, this may also involve webhooks or WebSockets (among other communication technologies), e.g. initiating client/server communication only when something has actually happened (webhooks), or keeping a persistent connection for real-time or very frequent exchanges (WebSockets), to avoid repeated connection initializations, authentications, etc.
Lightweight communications: As mentioned earlier, every byte in such systems counts. Each API call should therefore be designed to transmit the least possible data between clients and API services. That data can additionally be compressed, at the cost of a (usually bearable) extra processing overhead on the server for compression and decompression. gRPC is an excellent communication framework that combines the sophistication of Remote Procedure Calls with high-speed binary serialization, compression, bi-directional streaming, and other capabilities, all over HTTP/2.
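As a small sketch of the compression trade-off (independent of gRPC), here’s a gzip round trip over a hypothetical, repetitive JSON payload; the payload shape is invented for the demo:

```python
import gzip
import json

# A made-up API response: 1,000 similar records. Repetitive JSON like this
# compresses extremely well.
payload = json.dumps(
    [{"user_id": i, "status": "active"} for i in range(1000)]
).encode("utf-8")

compressed = gzip.compress(payload)      # what actually goes over the wire
restored = gzip.decompress(compressed)   # what the client reconstructs

ratio = len(compressed) / len(payload)   # fraction of the original size
```

The CPU cost of `compress`/`decompress` is the "bearable extra processing time" mentioned above; for large, repetitive payloads the bandwidth savings usually dominate.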
Using fast languages and frameworks: There are many reasons to prefer compiled over interpreted languages for building massive SaaS platforms. The performance differences between the two, visible in most language/framework benchmarks, leave little room for debate. For the sake of building super-performant APIs, software engineers also have to make some sacrifices. Developer-first toolkits such as ORMs (which ease the mapping between table records and objects) are often a no-go. In such systems you need to optimize down to the smallest detail and take advantage of every feature the underlying services (like the RDBMS) can offer, rather than abstracting things away from your control to ease development.
Caching: Just as it’s good to avoid redundant calls, it’s also good to avoid repeated computations. If a computation (or data fetch) takes non-negligible time to complete, it’s better to cache the result and serve the cached output directly on subsequent API calls, which can be tremendously faster. Caching in such systems is of course scalable too: a cluster takes care of horizontally scaling the caching service nodes.
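Here’s a minimal, in-process sketch of result caching with a time-to-live; the `TTLCache` class is illustrative only. In a real platform the cache would be a distributed service shared by all API nodes (e.g. a Redis or Memcached cluster), but the hit/miss logic is the same:

```python
import time

class TTLCache:
    """Cache computed results for `ttl` seconds to skip repeated work."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]                    # cache hit: skip the work
        value = compute()                      # cache miss: do the expensive work
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
def expensive_report():
    calls.append(1)            # track how often the real work actually runs
    return sum(range(1000))

cache = TTLCache(ttl=60)
a = cache.get_or_compute("report", expensive_report)   # computes
b = cache.get_or_compute("report", expensive_report)   # served from cache
```

The TTL bounds staleness: once the entry expires, the next call recomputes and refreshes it, so clients never see data older than the TTL window.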
Asynchronicity: One more major design decision in highly performant APIs is executing business logic in as non-blocking a manner as possible, to avoid reserving system resources for more milliseconds than actually needed and, even more importantly, to make scaling easier. The best approach to designing asynchronous architectures is via message queuing or event streaming, discussed later on.
Databases & Big Data
Two main questions arise for anyone interested in the internals of such systems: how are these enormous volumes of data stored, and how can they be retrieved so fast, especially when dealing with millions of concurrent client connections from all around the globe?
The answer to both questions is database sharding. Sharding is a method in which a database is horizontally partitioned, i.e. divided into several subsets (called shards) that are distributed over multiple instances or cluster nodes. The key point here is the logic used to decide which of the available shards each piece of data goes to. That logic is driven by the “shard key”. The shard key of a table (actually a column or a combination of columns inside the table, just like the primary key) defines where data is stored and where it is fetched from. Including geographical information (e.g. a continent or country code) in the shard key makes it easy to distribute data across servers residing close to that geographical region, greatly improving latency.
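A minimal sketch of such a geography-aware shard key, assuming a made-up naming scheme of four shards per country; the hashing approach is illustrative and not tied to any particular database’s sharding implementation:

```python
import hashlib

SHARDS_PER_COUNTRY = 4  # invented number, just for the example

def shard_for(country_code, user_id, shards=SHARDS_PER_COUNTRY):
    """Route a record to a shard by combining a geographic prefix with a
    stable hash of the user id.

    The country code keeps a user's data in their region (low latency);
    the hash spreads users evenly across that region's shards.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % shards
    return f"{country_code.lower()}-shard-{bucket}"
```

Because the hash is stable, every read and write for the same (country, user) pair always lands on the same shard, which is what makes routing deterministic without a central lookup.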
But the benefits of database sharding go well beyond network latency. Different users fetch data that resides in different shards (servers or cluster nodes); sharding therefore leverages parallel processing and achieves:
- Fast response times
- High I/O throughputs
- High storage capacities
- High availability
Still, for many scenarios database sharding alone is not enough, especially when it comes to OLAP, real-time analytics, data science, and machine learning. For these cases, big data solutions (which leverage distributed processing and storage by design) are better suited for building data pipelines that transform, aggregate, and analyze the input data, even in near real-time and even from live data streaming sources.
Message Queues & Event Streaming
We previously mentioned that APIs should execute things asynchronously, to the extent possible (and applicable). For example, in a social media messaging app, when a user chats with another user, the sender doesn’t care whether their message was saved or processed successfully on the server side; that’s only a technical detail. They do care about when their message was received and when it was read by the recipient. So why block a server thread for extra milliseconds during an API call, processing the call’s business logic synchronously, when a synchronous result isn’t critical to the client? In such cases, it’s fine to respond rapidly with just an acknowledgment that the call was successful, implying that the message was successfully queued for processing.
In all these cases where interaction with the server APIs is asynchronous by nature, asynchronous execution helps avoid bottlenecks on server resources (the OS imposes soft and hard limits, such as the maximum number of open file handles per process, which also covers pipes, streams, and open sockets, at least on Linux systems), and allows the client app and the server to resume communication only when a server-side action or update requires it. More importantly, it makes it possible to leverage parallel execution by other hosts dedicated to executing specific jobs.
A common way to design this kind of asynchronous architecture is through message queues or event streaming. In these subsystems, one or more producers (such as the various endpoints of the server APIs) enqueue messages or stream new events, such as “send a welcome email to that user” when a new user signs up to the platform. Consumer job workers run indefinitely, waiting for those events (tasks) to arrive; once they receive an incoming event, they process it and may (or may not, if they don’t need to) acknowledge its completion in a defined way.
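The producer/consumer pattern described above can be sketched in-process with a thread and a queue; the event shape and the “welcome email” side effect are stand-ins. In production, the queue would be an external broker (e.g. RabbitMQ or Kafka) and the workers would be separate processes or containers:

```python
import queue
import threading

tasks = queue.Queue()
sent_emails = []

def worker():
    """Consumer: runs until it sees the shutdown sentinel, processing events."""
    while True:
        event = tasks.get()
        if event is None:          # sentinel value: stop this worker
            tasks.task_done()
            break
        if event["type"] == "user_signed_up":
            # Hypothetical side effect standing in for "send a welcome email".
            sent_emails.append(event["email"])
        tasks.task_done()          # acknowledge completion of this task

consumer = threading.Thread(target=worker)
consumer.start()

# Producer side: the API endpoint enqueues the event and returns immediately,
# instead of blocking while the email is sent.
tasks.put({"type": "user_signed_up", "email": "new.user@example.com"})

tasks.put(None)    # shut the worker down once the queue drains
tasks.join()
consumer.join()
```

Because the producer only enqueues, the API call returns in microseconds; scaling throughput is then a matter of running more consumer workers against the same queue.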
Event streaming is similar to message queuing, the main difference being that a stream is used as the communication medium, both for receiving events from the producers and for delivering events to the consumers.
In our case, the use of event streaming and message queues is essential for horizontal scaling (tens or even thousands of nodes can process specific types of events in parallel). Moreover, building consumer workers brings the opportunity to decouple and isolate independent components of the platform. The benefit is avoiding a monolithic architecture and creating independently scalable (and deployable) microservices.
DevOps & Automations
The increased complexity of these systems doesn’t stop at the implementation phase. Everything related to them, even after launch, remains just as complex. From monitoring the usage and availability of the dozens of APIs and microservices that comprise the platform, spread over hundreds or thousands of nodes, to deploying updates to production, nothing here is a trivial task.
To begin with, for logging and monitoring, these systems use log aggregation systems, which are also horizontally scalable and highly available. The aggregators are designed for petabyte scale and high throughput, while providing the means to process aggregated logs from multiple sources simultaneously, create alerts, and build metrics in real-time. The powerful querying engines they bundle help dive into petabytes of log records to seek out specific entries or visualize log data over specific time ranges.
Data analytics is another point that needs special attention, as at this scale, SaaS platforms leverage no less than real-time data analytics systems that are optimized for high concurrency and low latency. These solutions outperform the traditional reporting and BI systems and can generate real-time dashboards and metrics at scale, analyze data in batch or real-time, trigger alerts, and perform other user-defined actions.
But one of the most challenging parts of DevOps is the implementation of a continuous integration, delivery, and deployment mechanism (CI/CD) to automate the reliable deployment of updates and fixes to the production environment.
CI/CD systems usually start by monitoring engineers’ commits to specific branches in the source code repository, triggering the automated execution of the series of unit, integration, and UX tests defined in the CI/CD pipeline (the more tests, often several thousand, the less likely a bug slips into production). If all tests pass, the solution moves from the CI stage to the CD stage. CD in fact has a dual meaning. In the CI/CD pipeline, it first denotes the Delivery step: building a validated production release ready for launch and adding it to the repository. Then comes the Deployment step, which automates rolling out the just-generated production-ready build to the live environment.
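The gating logic of such a pipeline can be sketched as a sequence of stages that halts on the first failure; the stage names and checks below are illustrative and not tied to any particular CI product:

```python
def run_pipeline(commit, stages):
    """Run CI/CD stages in order; stop at the first failing stage.

    `commit` is a plain dict standing in for commit metadata and check
    results; `stages` is an ordered list of (name, check) pairs.
    """
    completed = []
    for name, check in stages:
        if not check(commit):
            return completed, f"failed at: {name}"
        completed.append(name)
    return completed, "deployed"

# Illustrative pipeline: tests gate Delivery, which gates Deployment.
stages = [
    ("unit_tests",        lambda c: c["tests_pass"]),
    ("integration_tests", lambda c: c["tests_pass"]),
    ("build_release",     lambda c: True),            # Delivery: build the artifact
    ("roll_out",          lambda c: c["approved"]),   # Deployment: ship to live
]
```

The point of the ordering is that the expensive and risky steps (building and rolling out a release) only ever run on commits that already passed the full test battery.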
Of course, automating the deployment is not trivial. However, thanks to the containerized architecture of these systems, it is at least easier to automate the packaging of the updated application components into new containers and roll them out through the orchestration platform.
In this post, we only uncovered the tip of the iceberg, giving a rough overview of some of the most critical concerns these systems address: scaling, data storage & processing, API performance, event streaming/queuing, orchestration, monitoring, and automated deployment.
It’s true that building systems to serve massive audiences requires a team of skilled software engineers with many years of experience, a generous development timeframe, and a budget that can easily climb to six or seven figures in dollars. However, if a SaaS solution needs to reach that scale, it is likely worth the investment.
Nowadays, there are many solutions for every piece of a SaaS system, each designed to solve a specific range of problems. But every SaaS has its own requirements, and it takes in-depth knowledge of these technologies and their differences to see which are the truly right ones for a given project. Selecting a technology that later turns out not to be the best fit can cost a lot in budget and time-to-market.
Our team has the expertise to advise on such decisions and to help keep a project’s development on the right track.
We also develop global-scale SaaS solutions on behalf of start-ups and enterprises, reliably delivering turn-key systems that are ready for launch. By handing over a well-designed, well-tested, and well-documented system that’s ready to scale to your own engineering team, and then nurturing the project with continuous advice and support, we not only help reduce development budgets but also remove much of the risk you would otherwise carry.
Tags: SaaS, APIs, Auto-Scaling, Big Data, Data Analytics, Containerization, Orchestration, Event Streaming, DevOps, CI/CD