---
title: "System Design Basics Scalability and Load Balancing"
author: "Rantideb Howlader"
date: "2026-05-27T00:00:00.000Z"
canonical_url: "https://www.ranti.dev/blog/scalability-load-balancing-system-design"
license: "CC-BY-4.0"
---


Your app is live. It works. Your friend tests it, your mom tests it, your cat walks on the keyboard and somehow tests it. Everything is fine.

Then one day a post goes viral. Or a news site picks you up. Or your marketing team does their job.

Suddenly 100,000 people are trying to use your app at the same time. Your single server starts sweating. Response times go from 200ms to 15 seconds. Then the server crashes. Your app is dead. Your users are gone.

This happens every single day to startups and established companies.

The fix involves two concepts: Scalability and Load Balancing.

This guide covers both from scratch. No jargon. No assumptions. If you know what a server is, you can follow this.

## Scalability Explained

Scalability is the ability of a system to handle more work by adding more resources.

Today your system handles 1 lakh (100,000) users. It works fine. Tomorrow 10 lakh (1 million) users show up. You need more resources: more CPU, more RAM, more servers, or more network bandwidth.

If your system handles that growth smoothly without breaking, it is scalable.

If it crashes, slows down, or needs you to rewrite everything from scratch, it is not scalable.

Think of it like a restaurant. A small restaurant with 10 tables can serve 40 people at lunch. But what happens when 400 people show up? You either need a bigger restaurant with bigger tables and a bigger kitchen, or you open more branches across the city.

That is exactly what scalability is in system design.

## Two Ways to Scale

There are two primary ways to scale a system.

### Vertical Scaling

Vertical scaling means making your existing server more powerful.

You take the same machine and give it more RAM, more CPU cores, or faster storage.

It is like upgrading your laptop. Same laptop but with better specs.

```mermaid
graph LR
    A[Users] --> B[Server 8GB RAM 4 CPU]
    B -->|Scale Up| C[Server 16GB RAM 8 CPU]
    class B mm-yellow
    class C mm-green
```

Advantages

- Easy to implement. Just upgrade the hardware.
- Simple to manage. One server and one place to look.
- No need to change your application code.
- Good for small to medium scale.

Limitations

- There is a hardware limit. You cannot add infinite RAM to one machine.
- Gets expensive fast. A single server with 1TB of RAM can cost more than 10 servers with 100GB each.
- Single point of failure. If that one powerful server goes down, everything goes down.

### Horizontal Scaling

Horizontal scaling means adding more servers and splitting the work between them.

Instead of one powerful machine, you use many regular machines working together. A load balancer sits in front and distributes the traffic.

```mermaid
graph LR
    A[Users] --> LB[Load Balancer]
    LB --> S1[Server 1]
    LB --> S2[Server 2]
    LB --> S3[Server 3]
    class LB mm-green
    class S1 mm-blue
    class S2 mm-blue
    class S3 mm-blue
```

Advantages

- Can handle very large scale. Need more capacity? Add another server.
- Fault tolerant. If Server 1 goes down, Servers 2 and 3 keep working.
- Cost effective. You can use regular cheap servers.
- Easy to add or remove servers based on traffic.

Limitations

- More complex to set up and manage.
- You need a load balancer to distribute traffic.
- Data consistency and session management become harder.
- Network issues between servers can cause problems.

The real world uses both. You scale up to a reasonable level, then scale out when you hit the ceiling.
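As a rough worked example of scaling out, the number of servers can be estimated from peak traffic and the capacity of one server. This is a minimal sketch in Python. The function name `servers_needed`, the 30% headroom, and the traffic figures are assumptions chosen for illustration, not measurements.

```python
import math


def servers_needed(peak_rps: float, capacity_per_server: float, headroom: float = 0.3) -> int:
    """Estimate how many servers are needed to carry the peak load.

    headroom is the fraction of each server kept free (assumed 30% here)
    so that a traffic spike or a lost server does not overload the rest.
    """
    usable_capacity = capacity_per_server * (1 - headroom)
    return math.ceil(peak_rps / usable_capacity)


# Hypothetical numbers: 5000 requests per second at peak,
# each server comfortably handles 1000 requests per second.
estimate = servers_needed(peak_rps=5000, capacity_per_server=1000)
print(estimate)  # 8
```

A number like this is only a starting point. Measure real per-server capacity with a load test before relying on it.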

## Monolithic vs Distributed Systems

How you scale depends on how your application is built.

### Monolithic Systems

A monolithic application is one big block. Everything, from the frontend to the backend and database logic, runs as one single program on one server.

- Vertical Scaling is possible but limited. You can upgrade the server, but eventually you hit the hardware ceiling.
- Horizontal Scaling is difficult. You can run copies of the whole application behind a load balancer, but you cannot scale one part on its own. Session data, shared state, and tight coupling make it painful.

### Distributed Systems

A distributed system breaks the application into small independent services. Each service does one thing and runs on its own.

- Vertical Scaling is possible for individual components but usually not the preferred approach.
- Horizontal Scaling is easy and recommended. Each service scales independently. The payment service can have 2 servers while the search service has 20 servers. A load balancer distributes requests to each.

If you are building something new and you know it needs to scale, design it as a distributed system from the start. It is harder initially but saves you from rewriting everything later.

## Load Balancing Explained

Load balancing is the process of distributing incoming network traffic across multiple servers so no single server gets overloaded.

Imagine a mall with 10 billing counters. If all customers go to Counter 1, the line gets insanely long while Counters 2 to 10 sit empty. A traffic manager directs customers to different counters so the load is spread evenly.

That is exactly what a load balancer does with network requests.

```mermaid
graph TD
    Users[Users] --> LB[Load Balancer]
    LB --> S1[Server 1]
    LB --> S2[Server 2]
    LB --> S3[Server 3]

    class LB mm-red
    style S1 fill:#4ecdc4,stroke:#333,color:#fff
    style S2 fill:#4ecdc4,stroke:#333,color:#fff
    style S3 fill:#4ecdc4,stroke:#333,color:#fff
```

## What Does a Load Balancer Actually Do

A load balancer has six core responsibilities.

### 1 Traffic Distribution

It splits incoming requests across multiple servers so no single server is overwhelmed.

### 2 Health Checks

It regularly pings each server to check whether it is alive. If a server is down, the load balancer stops sending traffic to it.

```mermaid
graph LR
    LB[Load Balancer] -->|Health Check OK| S1[Server 1 Up]
    LB -->|Health Check OK| S2[Server 2 Up]
    LB -->|Health Check Fail| S3[Server 3 Down]

    class S3 mm-red
    class S1 mm-green
    class S2 mm-green
```
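The health check idea can be sketched in a few lines of Python. The `Server` class, its `is_alive` flag, and the `healthy_servers` function are hypothetical names for illustration. A real load balancer would probe each server over the network on a timer instead of reading a flag.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    is_alive: bool = True  # stand-in for the result of a real network probe


def healthy_servers(servers: list[Server]) -> list[Server]:
    """Keep only the servers that passed their last health check."""
    return [server for server in servers if server.is_alive]


pool = [
    Server("server-1"),
    Server("server-2"),
    Server("server-3", is_alive=False),  # failed its health check
]
in_rotation = [server.name for server in healthy_servers(pool)]
print(in_rotation)  # ['server-1', 'server-2']
```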

### 3 High Availability

If one server dies, the load balancer reroutes traffic to the healthy servers. Users rarely notice.

### 4 Scalability

When traffic increases, you add new servers. The load balancer automatically includes them in the rotation.
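Here is a minimal sketch of a pool that grows while traffic is flowing. The `LoadBalancer` class and its `add_server` and `route` methods are made up for this example and use a simple rotation. Real load balancers usually learn about new servers through configuration or service discovery.

```python
class LoadBalancer:
    """Toy balancer that rotates through a pool which can change at runtime."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._counter = 0  # how many requests have been routed so far

    def add_server(self, name):
        """Register a new server so it joins the rotation."""
        self.servers.append(name)

    def route(self):
        """Return the server that should handle the next request."""
        server = self.servers[self._counter % len(self.servers)]
        self._counter += 1
        return server


lb = LoadBalancer(["server-1", "server-2"])
before = [lb.route() for _ in range(2)]
lb.add_server("server-3")  # traffic grew, so a server is added
after = [lb.route() for _ in range(3)]
print(before)  # ['server-1', 'server-2']
print(after)   # ['server-3', 'server-1', 'server-2']
```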

### 5 Security

The load balancer can filter out malicious traffic and protect against attacks like DDoS. Your servers never talk to the public internet directly.

### 6 SSL Termination

The load balancer handles SSL and TLS encryption so the backend servers do not have to waste CPU power decrypting traffic.

## Advantages of Using a Load Balancer

1. Optimisation: Better resource utilization and lower response times.
2. Better User Experience: Less latency and smooth, error-free requests.
3. Prevents Downtime: Detects failed servers and reroutes traffic automatically.
4. Flexibility: Allows maintenance or upgrades without disrupting the user experience.
5. Scalability: Easily handle traffic increases by adding physical or virtual servers.
6. Redundancy: Provides built-in redundancy to the system architecture.

## The Single Point of Failure Problem

If the load balancer distributes all traffic, what happens if the load balancer itself goes down?

Everything breaks. The load balancer becomes a single point of failure.

The solution is to give the load balancer its own backup.

```mermaid
graph TD
    Users[Users] --> Primary[Primary LB Active]
    Users -.-> Secondary[Secondary LB Standby]
    Primary --> S1[Server 1]
    Primary --> S2[Server 2]
    Primary --> S3[Server 3]

    Primary -->|Fails| Secondary
    Secondary -.-> S1
    Secondary -.-> S2
    Secondary -.-> S3

    class Primary mm-green
    class Secondary mm-yellow
```

If the primary load balancer fails, the secondary one automatically takes over. Every serious production system uses load balancer redundancy.
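One common way to decide when the standby should take over is a heartbeat: the primary reports in regularly, and silence past a timeout triggers the switch. The sketch below assumes that approach. The function name, the 3 second timeout, and the timestamps are illustrative assumptions.

```python
HEARTBEAT_TIMEOUT_SECONDS = 3.0  # assumed value for illustration


def active_load_balancer(last_heartbeat: float, now: float) -> str:
    """Pick which load balancer should receive traffic.

    The standby takes over when the primary has been silent
    for longer than the heartbeat timeout.
    """
    primary_is_silent = (now - last_heartbeat) > HEARTBEAT_TIMEOUT_SECONDS
    return "secondary-lb" if primary_is_silent else "primary-lb"


print(active_load_balancer(last_heartbeat=100.0, now=101.0))  # primary-lb
print(active_load_balancer(last_heartbeat=100.0, now=110.0))  # secondary-lb
```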

## Load Balancing Algorithms

This is how the load balancer decides which server gets the next request.

There are six main algorithms. The first four follow predefined rules. The last two make decisions based on real time data.

### Static Algorithms

#### 1 Round Robin

The simplest algorithm. Requests go to each server in rotation, one after another.

- Request 1 goes to Server 1
- Request 2 goes to Server 2
- Request 3 goes to Server 3
- Request 4 goes back to Server 1

```mermaid
graph LR
    R1[Req 1] --> S1[S1]
    R2[Req 2] --> S2[S2]
    R3[Req 3] --> S3[S3]
    R4[Req 4] --> S1
    R5[Req 5] --> S2
    R6[Req 6] --> S3

    class S1 mm-blue
    class S2 mm-blue
    class S3 mm-blue
```

Use this when all servers have the same capacity and power.

The problem is it assumes all servers are equally powerful. If Server 1 has 16GB RAM and Server 3 has 4GB RAM, Round Robin still sends the same number of requests to both.
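Round Robin fits in a few lines of Python. This is a minimal sketch using `itertools.cycle`, and the server names are placeholders.

```python
from itertools import cycle

servers = ["server-1", "server-2", "server-3"]
rotation = cycle(servers)  # endless iterator: 1, 2, 3, 1, 2, 3, ...

# Assign six incoming requests in order.
assignments = [next(rotation) for _ in range(6)]
print(assignments)
# ['server-1', 'server-2', 'server-3', 'server-1', 'server-2', 'server-3']
```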

#### 2 Weighted Round Robin

Same as Round Robin, but each server gets a weight based on its capacity. A more powerful server gets more requests.

- Server 1 has weight 5
- Server 2 has weight 2
- Server 3 has weight 1

For every 8 requests, Server 1 gets 5, Server 2 gets 2, and Server 3 gets 1.

Use this when your servers have different capacities.
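The weights above can be sketched by repeating each server in the rotation as many times as its weight. This is the simplest possible version and it sends requests to Server 1 in a burst of five. Production load balancers typically interleave the servers more smoothly, which this sketch does not attempt.

```python
from collections import Counter
from itertools import cycle

weights = {"server-1": 5, "server-2": 2, "server-3": 1}

# Repeat each server name as many times as its weight, then rotate.
schedule = [name for name, weight in weights.items() for _ in range(weight)]
rotation = cycle(schedule)

# Count where 8 consecutive requests land.
counts = Counter(next(rotation) for _ in range(8))
print(counts["server-1"], counts["server-2"], counts["server-3"])  # 5 2 1
```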

#### 3 IP Hash

The load balancer takes the client IP address, runs it through a hash function, and maps it to a specific server. The same IP always goes to the same server as long as the set of servers does not change.

- Client 192.168.1.1 goes to Server 1
- Client 192.168.1.2 goes to Server 2
- Client 192.168.1.3 goes to Server 1

Use this when you need session persistence, where the user must always land on the same server.
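A minimal sketch of IP Hash follows. It uses `hashlib` instead of Python's built-in `hash()`, because `hash()` on strings changes between processes and would break persistence after a restart. The function name and the choice of SHA-256 are assumptions for illustration, and which server a given IP lands on depends on the hash.

```python
import hashlib

servers = ["server-1", "server-2", "server-3"]


def server_for_ip(client_ip: str) -> str:
    """Map a client IP to a server using a stable hash."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


first_visit = server_for_ip("192.168.1.1")
second_visit = server_for_ip("192.168.1.1")
print(first_visit == second_visit)  # True
```

Note that the modulo step depends on the number of servers. Adding or removing a server changes the mapping for most clients.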

#### 4 Source IP Hash

Similar to IP Hash, but it uses both the source IP and the destination IP to calculate the hash. This gives a more uniform distribution when you have multiple destination services. Naming varies between products, and some call this variant a source and destination IP hash.

Use this when you have multiple backend services and want better balance across all of them.
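The only change from plain IP Hash is the hash key, which now combines both addresses. The sketch below assumes that reading of the algorithm. The function name, the key format, and the IP addresses are hypothetical.

```python
import hashlib

servers = ["server-1", "server-2", "server-3"]


def server_for_flow(source_ip: str, destination_ip: str) -> str:
    """Map a (source, destination) pair to a server using a stable hash."""
    key = f"{source_ip}->{destination_ip}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# The same pair always maps to the same server.
choice_a = server_for_flow("192.168.1.1", "10.0.0.5")
choice_b = server_for_flow("192.168.1.1", "10.0.0.5")
print(choice_a == choice_b)  # True
```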

### Dynamic Algorithms

These algorithms make decisions based on real time server metrics.

#### 5 Least Connection

The load balancer sends the next request to the server that currently has the fewest active connections.

- Server 1 has 120 connections
- Server 2 has 20 connections
- Server 3 has 80 connections

The next request goes to Server 2.

Use this for long-lived connections, like database connections or file uploads, where some requests take much longer than others.
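Using the connection counts from the example above, Least Connection is a `min` over the current counts. The dictionary and function names here are placeholders for illustration.

```python
active_connections = {"server-1": 120, "server-2": 20, "server-3": 80}


def pick_least_connections(connections: dict[str, int]) -> str:
    """Return the server with the fewest active connections."""
    return min(connections, key=connections.get)


chosen = pick_least_connections(active_connections)
active_connections[chosen] += 1  # the new request opens one more connection
print(chosen)  # server-2
```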

#### 6 Least Response Time

The load balancer sends the next request to the server with the fastest average response time.

- Server 1 response is 300ms
- Server 2 response is 80ms
- Server 3 response is 120ms

The next request goes to Server 2.

Use this for performance-critical applications, like real-time APIs or gaming servers, where every millisecond matters.
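A minimal sketch of Least Response Time, assuming the load balancer keeps a few recent response time samples per server. The sample values are invented so that the averages match the example above (300ms, 80ms, 120ms).

```python
from statistics import mean

# Hypothetical recent response times in milliseconds.
recent_response_ms = {
    "server-1": [310, 290, 300],
    "server-2": [85, 75, 80],
    "server-3": [110, 130, 120],
}


def pick_fastest(samples: dict[str, list[float]]) -> str:
    """Return the server with the lowest average response time."""
    return min(samples, key=lambda name: mean(samples[name]))


fastest = pick_fastest(recent_response_ms)
print(fastest)  # server-2
```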

### Algorithm Rules to Remember

Static algorithms use predefined rules. Dynamic algorithms use real-time metrics. The ultimate goal is efficient distribution, high availability, and the best performance.

## Redundancy and Reliability

Redundancy is duplicating nodes or components so that when one fails, the duplicate takes over. The system keeps running and users notice nothing.

Keep a backup of everything. If the main one breaks, the backup starts working.

Redundancy increases reliability, but it costs extra money. It is worth it for critical systems.

There are two types of redundancy.

### Active Redundancy

All servers are on and handling requests at the same time. If one dies, the others keep going. There is no downtime or switchover delay.

```mermaid
graph LR
    Users[Users] --> LB[Load Balancer]
    LB --> S1[Server 1 Active]
    LB --> S2[Server 2 Active]
    LB --> S3[Server 3 Active]

    class S1 mm-green
    class S2 mm-green
    class S3 mm-green
```

Large services such as Google and Facebook are the usual examples of this approach, with many servers active at the same time. It provides high availability but costs more because all servers consume resources.

### Passive Redundancy

One server is active and handles all requests. The other server is on standby. It does nothing until the active server fails. Then it takes over.

```mermaid
graph LR
    Users[Users] --> AS[Active Server]
    AS -->|Fails| PS[Passive Server Standby]

    class AS mm-green
    class PS mm-yellow
```

For example, a hot standby database server handles no traffic while it replicates data from the primary. The moment the primary dies, it becomes the new primary. It is cheaper but has a brief switchover delay.

## Replication: Redundancy plus Synchronization

Replication is redundancy plus keeping the copies in sync.

With plain redundancy, you have a backup, but it might be outdated.

With replication, the backup is constantly updated with the latest data from the primary. So when you switch over, you lose little or no data.

```mermaid
graph LR
    Client[Client Write] --> Primary[Primary DB]
    Primary -->|replicate| R1[Replica DB 1]
    Primary -->|replicate| R2[Replica DB 2]
    R1 -->|Read| Reader[Read Queries]
    R2 -->|Read| Reader

    class Primary mm-red
    style R1 fill:#4ecdc4,stroke:#333,color:#fff
    style R2 fill:#4ecdc4,stroke:#333,color:#fff
```

### Synchronous Replication

The primary waits until all replicas confirm they received the data before telling the client the write was successful.

It gives strong consistency since every replica has the latest data. But it is slower because the client waits for all replicas.

Use this when data accuracy is critical, like in banking and medical records.

### Asynchronous Replication

The primary writes locally and immediately tells the client the write was successful. The replicas get updated in the background slightly later.

It is fast because the client does not wait. But replicas might have slightly outdated data for a brief moment.

Use this when speed matters more than perfect consistency, like in social media feeds and analytics.
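The difference between the two modes comes down to when the client gets its confirmation. The toy model below keeps everything in memory to show that. The `Primary` class and its methods are hypothetical, and a real database replicates over the network.

```python
from collections import deque


class Primary:
    """Toy primary that copies writes to in-memory replicas."""

    def __init__(self, replica_count):
        self.data = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # writes not yet applied to replicas

    def write_sync(self, key, value):
        """Confirm only after every replica has the write."""
        self.data[key] = value
        for replica in self.replicas:
            replica[key] = value
        return "ok"

    def write_async(self, key, value):
        """Confirm right away and let the replicas catch up later."""
        self.data[key] = value
        self.pending.append((key, value))
        return "ok"

    def flush(self):
        """Background step that applies queued writes to the replicas."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


db = Primary(replica_count=2)

db.write_sync("balance", "100")
sync_visible = all(r.get("balance") == "100" for r in db.replicas)

db.write_async("likes", "42")
stale_before_flush = all("likes" not in r for r in db.replicas)
db.flush()
fresh_after_flush = all(r.get("likes") == "42" for r in db.replicas)

print(sync_visible, stale_before_flush, fresh_after_flush)  # True True True
```

The window between `write_async` and `flush` is where a replica can serve outdated data.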

### Benefits of Replication

- High Availability: If the primary goes down, a replica takes over.
- Fault Tolerance: Data survives even if a server dies.
- Load Distribution: Read queries can go to replicas, reducing load on the primary.
- Disaster Recovery: Replicas in different regions protect against regional outages.
- More Safety: The more replicas you have, the safer your data is.
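The load distribution benefit can be sketched as a small router that sends reads to replicas and everything else to the primary. Treating any query that starts with `SELECT` as a read is a simplifying assumption, and the server names are placeholders.

```python
from itertools import cycle

PRIMARY = "primary-db"
replicas = cycle(["replica-1", "replica-2"])  # reads rotate across replicas


def route(query: str) -> str:
    """Send reads to a replica and all other queries to the primary."""
    is_read = query.lstrip().upper().startswith("SELECT")
    return next(replicas) if is_read else PRIMARY


targets = [
    route("SELECT * FROM users"),
    route("INSERT INTO users VALUES (1)"),
    route("select 1"),
]
print(targets)  # ['replica-1', 'primary-db', 'replica-2']
```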

## Putting It All Together

Let us see how scalability, load balancing, redundancy, and replication work together in a real system.

```mermaid
graph TD
    Users[Users] --> DNS[DNS]
    DNS --> LB1[Primary Load Balancer]
    DNS -.-> LB2[Secondary Load Balancer Standby]

    LB1 --> App1[App Server 1]
    LB1 --> App2[App Server 2]
    LB1 --> App3[App Server 3]

    App1 --> PDB[Primary Database]
    App2 --> PDB
    App3 --> RDB[Read Replica]

    PDB -->|Replication| RDB
    PDB -->|Replication| RDB2[Read Replica 2 Different Region]

    class LB1 mm-red
    class LB2 mm-yellow
    class PDB mm-purple
    class RDB mm-teal
    class RDB2 mm-teal
```

- DNS resolves the domain to the load balancer IP.
- The load balancer distributes traffic across the app servers, which is horizontal scaling.
- The primary database handles all writes.
- Read replicas handle read queries for load distribution.
- A cross-region replica provides disaster recovery.
- A secondary load balancer is on standby in case the primary fails.

This is the foundation of most scalable systems on the internet.

## Core Concepts

Here is everything in one place:

| Concept                  | What It Means                              | Key Point                          |
| ------------------------ | ------------------------------------------ | ---------------------------------- |
| **Scalability**          | System can handle future load growth       | Plan for 10x your current traffic  |
| **Vertical Scaling**     | Upgrade one server like more RAM or CPU    | Easy but limited                   |
| **Horizontal Scaling**   | Add more servers                           | Best for distributed systems       |
| **Load Balancer**        | Distributes traffic across servers         | The traffic manager of your system |
| **Round Robin**          | Requests rotate through servers equally    | Simple and works for equal servers |
| **Weighted Round Robin** | More powerful servers get more traffic     | For mixed capacity setups          |
| **IP Hash**              | Same user always hits the same server      | Session persistence                |
| **Least Connection**     | Send to server with fewest connections     | Best for long lived connections    |
| **Least Response Time**  | Send to fastest server                     | Best for latency sensitive apps    |
| **Active Redundancy**    | All servers active at once                 | Zero downtime but higher cost      |
| **Passive Redundancy**   | Backup server waits on standby             | Lower cost but brief switchover    |
| **Sync Replication**     | Wait for all copies before confirming      | Strong consistency but slower      |
| **Async Replication**    | Confirm immediately and copy in background | Fast with eventual consistency     |

System design is just common sense applied at scale.


---

<!-- METADATA_START -->
## Metadata & Citations

### BibTeX
```bibtex
@article{scalability-load-balancing-system-design_2026,
  author = {Rantideb Howlader},
  title = {System Design Basics Scalability and Load Balancing},
  journal = {Rantideb Howlader Portfolio},
  year = {2026},
  url = {https://www.ranti.dev/blog/scalability-load-balancing-system-design},
  note = {Accessed: 2026-05-31}
}
```

### IEEE
Rantideb Howlader, "System Design Basics Scalability and Load Balancing," Rantideb Howlader Portfolio, 2026. [Online]. Available: https://www.ranti.dev/blog/scalability-load-balancing-system-design. [Accessed: 2026-05-31].

### APA
Rantideb Howlader. (2026). System Design Basics Scalability and Load Balancing. Rantideb Howlader. Retrieved from https://www.ranti.dev/blog/scalability-load-balancing-system-design

--- 
*This content is provided in research-grade Markdown format. Required Attribution: Cite as Rantideb Howlader (2026).*
<!-- METADATA_END -->