---
title: "System Design Basics Caching How to Make Your App Fast"
author: "Rantideb Howlader"
date: "2026-05-28T00:00:00.000Z"
canonical_url: "https://www.ranti.dev/blog/caching-system-design-guide"
license: "CC-BY-4.0"
---


Your database is slow. Not because it is broken. Not because you wrote bad queries. It is slow because you are asking it the same questions a million times a day.

Every time a user opens your app, your server goes to the database. It fetches the same profile data. The same product list. The same feed. Over and over and over.

What if you could store those answers somewhere faster? Somewhere that does not need to search through millions of rows every single time?

That is caching.

If you read our guide on [Scalability and Load Balancing](/blog/scalability-load-balancing-system-design), you learned how to handle more users by adding more servers. Caching is the next step. It makes each server faster, so you need fewer servers in the first place.

## What is Caching

Caching is storing frequently used data in fast, temporary storage so future requests can get it quickly without hitting the database.

Think of it like this. You are studying for an exam. Your textbook is on the bookshelf across the room. Every time you need a formula, you walk to the bookshelf, find the page, read the formula and walk back.

That is slow.

So you write the most used formulas on a sticky note and put it on your desk. Now when you need a formula you just look at the sticky note. No walking. No searching. Instant.

The sticky note is your cache. The bookshelf is your database.

```mermaid
graph LR
    A[Your Desk] -->|Fast| B[Sticky Note Cache]
    A -->|Slow| C[Bookshelf Database]
    class B mm-green
    class C mm-yellow
```

The cache sits between your application and the database. It stores copies of data that your app needs often. When a request comes in, the app checks the cache first. If the data is there, it returns it immediately. If not, it goes to the database.

## Why Do We Need Caching

Let us take a real example. Think about Instagram.

When you open someone's profile the app needs to show their name, profile picture, bio, post count, follower count and following count.

Without caching, every time someone views that profile, the app goes to the database. If that person has 10 million followers and 1 million people check their profile in one hour, that is 1 million database queries for the exact same data.

The profile did not change. The data is the same. But the database is doing the same work 1 million times.

With caching, the app stores the profile data in a fast cache after the first database query. The next 999,999 requests get the data from the cache, as long as the cached entry has not expired. The database is free to handle other work.

```mermaid
graph TD
    subgraph Without Caching
        U1[User 1] --> DB1[Database]
        U2[User 2] --> DB1
        U3[User 3] --> DB1
        U4[1M Users] --> DB1
    end

    class DB1 mm-red
```

```mermaid
graph TD
    subgraph With Caching
        U5[User 1] --> C1[Cache]
        U6[User 2] --> C1
        U7[User 3] --> C1
        U8[1M Users] --> C1
        C1 -.->|Only 1st time| DB2[Database]
    end

    class C1 mm-green
    class DB2 mm-blue
```

This is why every large scale system uses caching. [Netflix](https://netflixtechblog.com/caching-for-a-global-netflix-7bcc457012f1), Google, Amazon, Facebook. They all rely on caching to handle billions of requests.

## How Caching Works

The caching process has a simple flow.

1. A user makes a request.
2. The app checks the cache.
3. If the data is in the cache, that is a **Cache Hit**. Return the data immediately.
4. If the data is not in the cache, that is a **Cache Miss**. Go to the database, get the data, store it in the cache for next time, then return it.

```mermaid
graph TD
    A[User Request] --> B{Check Cache}
    B -->|Hit| C[Return Data from Cache]
    B -->|Miss| D[Fetch from Database]
    D --> E[Store in Cache]
    E --> F[Return Data to User]

    class B mm-yellow
    class C mm-green
    class D mm-blue
    class E mm-blue
```

**Cache Hit** means the data was found in the cache. This is what we want. It is fast.

**Cache Miss** means the data was not in the cache. The app has to go to the database. This is slow but the data gets cached for next time.

**Cache Hit Ratio** is the percentage of requests served from the cache. What counts as good depends on the workload, but read heavy systems often aim for above 90%. If your hit ratio is 95%, then 95 out of every 100 requests never touch the database.
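Here is a minimal Python sketch of the hit and miss flow, with counters so you can see the hit ratio. It assumes a plain `dict` stands in for the cache and a hypothetical `load_from_db` function stands in for the database query. A real cache would also need expiry and eviction.

```python
from typing import Callable, Dict, Hashable


class CountingCache:
    """A tiny cache wrapper that tracks hits and misses."""

    def __init__(self, load_from_db: Callable[[Hashable], object]):
        self._store: Dict[Hashable, object] = {}   # stands in for the cache
        self._load_from_db = load_from_db          # hypothetical database query
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self._store:                     # cache hit: return immediately
            self.hits += 1
            return self._store[key]
        self.misses += 1                           # cache miss: go to the database
        value = self._load_from_db(key)
        self._store[key] = value                   # store it for next time
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


db_calls = []


def load_from_db(user_id):
    db_calls.append(user_id)                       # record every database trip
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CountingCache(load_from_db)
for _ in range(100):
    cache.get(42)                                  # same profile requested 100 times

print(cache.hits, cache.misses, cache.hit_ratio())  # 99 1 0.99
print(len(db_calls))                                # 1
```

One database call served 100 requests, which is a 99% hit ratio.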

## What is TTL

TTL stands for Time To Live. It tells the cache how long to keep the data before throwing it away.

You set a TTL when you store data in the cache. After that time passes, the data expires and is removed. The next request goes to the database and gets fresh data.

Why do we need TTL? Because data changes. If someone updates their Instagram bio but the cache keeps the old bio forever, users will see outdated information.

Some example TTLs (starting points, not rules):

- User profile data: TTL of 5 minutes
- Product prices: TTL of 1 minute
- Static page content: TTL of 1 hour
- Stock prices: TTL of 5 seconds

The right TTL depends on how often the data changes and how critical it is to show the latest version.
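To make TTL concrete, here is a minimal sketch of a cache that stores an expiry time next to each value. It assumes an in-process `dict`, and the `clock` parameter is a hypothetical hook added so the example can fake the passage of time instead of sleeping. Real caches like Redis handle expiry for you. This only shows the idea.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Stores each value with an expiry time and treats expired entries as misses."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[Hashable, Tuple[object, float]] = {}

    def set(self, key: Hashable, value: object, ttl_seconds: float) -> None:
        expires_at = self._clock() + ttl_seconds
        self._store[key] = (value, expires_at)

    def get(self, key: Hashable) -> Optional[object]:
        # Returns None for "not cached" (a cached None value would look the same).
        entry = self._store.get(key)
        if entry is None:
            return None                      # never cached
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]             # expired: drop it so the next read refetches
            return None
        return value


# A fake clock so the example runs instantly.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("profile:42", {"bio": "old bio"}, ttl_seconds=300)  # 5 minute TTL

now[0] = 299
print(cache.get("profile:42"))   # {'bio': 'old bio'}
now[0] = 300
print(cache.get("profile:42"))   # None, so the app goes back to the database
```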

## Types of Caching

There are two main types of cache based on where the cache lives.

### In Memory Cache (Local Cache)

The cache lives on the same server as your application, in that server's RAM.

This is the fastest type of cache because there is no network call. The data is right there in the same process memory.

But it has a big limitation. The cache is not shared. If you have 3 app servers, each server has its own separate cache. Server 1 might have user data cached but Server 2 does not, so Server 2 still hits the database.

```mermaid
graph TD
    subgraph Server 1
        A1[App] --> LC1[Local Cache]
    end
    subgraph Server 2
        A2[App] --> LC2[Local Cache]
    end
    subgraph Server 3
        A3[App] --> LC3[Local Cache]
    end
    LC1 -.-> DB[Database]
    LC2 -.-> DB
    LC3 -.-> DB

    class LC1 mm-green
    class LC2 mm-green
    class LC3 mm-green
    class DB mm-blue
```

Examples of in memory cache:

- `HashMap` in Java
- A hand-written LRU cache
- `LruCache` in Android
- Caffeine cache in Java
- Python `dict` used as a cache

Use this when your app runs on a single server or when the cached data is specific to each server.
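Python ships a simple in memory cache in its standard library, `functools.lru_cache`. The sketch below assumes a hypothetical `fetch_profile` function standing in for a database query. The cache lives inside this one process, which is exactly the limitation described above: a second server process would start with an empty cache of its own.

```python
from functools import lru_cache

db_queries = 0  # counts simulated database trips


@lru_cache(maxsize=1024)            # keep up to 1024 results in this process's RAM
def fetch_profile(user_id: int) -> tuple:
    global db_queries
    db_queries += 1                 # only runs on a cache miss
    return (user_id, f"user-{user_id}")  # a tuple, so callers cannot mutate the cached value


fetch_profile(1)
fetch_profile(1)
fetch_profile(2)
print(db_queries)                   # 2
print(fetch_profile.cache_info())   # CacheInfo(hits=1, misses=2, maxsize=1024, currsize=2)
```

Calling `fetch_profile.cache_clear()` empties it, which is the simplest form of invalidation for a local cache.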

### Distributed Cache

The cache lives on a separate dedicated server. All your app servers connect to this shared cache over the network.

This means Server 1, Server 2 and Server 3 all share the same cache. When Server 1 caches user data, Server 2 can read it too.

```mermaid
graph TD
    A1[App Server 1] --> RC[Distributed Cache Server]
    A2[App Server 2] --> RC
    A3[App Server 3] --> RC
    RC -.-> DB[Database]

    class RC mm-red
    class A1 mm-blue
    class A2 mm-blue
    class A3 mm-blue
    class DB mm-yellow
```

Examples of distributed caches:

- [Redis](https://redis.io/), the most popular choice
- [Memcached](https://memcached.org/), simple and fast
- [Hazelcast](https://hazelcast.com/), for Java applications
- Amazon ElastiCache (managed Redis or Memcached on AWS)

Advantages of a distributed cache:

- Shared across all servers. One cache for everyone.
- Scalable. You can add more cache servers as traffic grows.
- Survives app server restarts. If App Server 1 crashes, the cached data is still there.
- Reliable. Many distributed caches, such as Redis, support replication, so even the cache has backups.

Use this when your app has multiple servers and you need a shared cache. This is what most production systems use.

If you are running your app on AWS, you can use [Amazon ElastiCache](https://aws.amazon.com/elasticache/), which gives you managed Redis or Memcached. If you are interested in AWS fundamentals, check out our [AWS Zero to Hero guide](/blog/aws-zero-to-hero-journey).

## Redis vs Memcached

These are the two most popular distributed cache solutions. Here is when to use each.

| Feature           | Redis                                                               | Memcached                   |
| ----------------- | ------------------------------------------------------------------- | --------------------------- |
| Data types        | Strings, lists, sets, sorted sets, hashes and more                  | Strings only                |
| Persistence       | Optional (RDB snapshots or AOF log). Data can survive restarts      | No. Data is lost on restart |
| Replication       | Yes. Primary and replica support                                    | No built in replication     |
| Threading         | Single threaded command execution (optional I/O threads since 6.0)  | Multi threaded              |
| Memory management | Advanced with eviction policies                                     | Simple slab allocation      |
| Use case          | Complex caching, sessions, leaderboards, queues                     | Simple key value caching    |

Use Redis when you need more than simple key value storage. Use Memcached when you need raw speed for simple data and nothing else.

## Cache Invalidation

This is the hardest part of caching. There is a famous saying in computer science, usually attributed to Phil Karlton:

"There are only two hard things in computer science: cache invalidation and naming things."

Cache invalidation means removing or updating old data from the cache when the original data changes.

If a user updates their profile picture but the cache still has the old picture, every visitor sees the old one. That is stale data. Cache invalidation prevents this.

### When to Invalidate

- When data is updated like a new post or changed bio
- When follower or following count changes
- When user information is changed
- When TTL expires
- When you manually trigger an invalidation

```mermaid
graph LR
    A[Data Changes in DB] --> B[Invalidate Cache]
    B --> C[Next Request Fetches Fresh Data]
    C --> D[Store New Data in Cache]

    class A mm-red
    class B mm-yellow
    class C mm-blue
    class D mm-green
```

## Caching Strategies

There are four main strategies for how your app reads and writes data with a cache.

### 1 Cache Aside (Lazy Loading)

This is the most common pattern. The app manages the cache itself.

**Read flow.** The app checks the cache first. If the data is there, return it. If not, go to the database, get the data, store it in the cache and then return it.

**Write flow.** The app writes to the database directly. Then it deletes the cached version so the next read gets fresh data.

```mermaid
graph TD
    A[App] -->|1 Check Cache| B{Cache Hit?}
    B -->|Yes| C[Return Cached Data]
    B -->|No| D[2 Query Database]
    D --> E[3 Store in Cache]
    E --> F[4 Return Data]

    class B mm-yellow
    class C mm-green
    class D mm-blue
```

This is simple and works well. The downside is the first request for any data will always be slow because it is a cache miss.

Most web apps use this pattern. If you have built a [URL shortener on AWS](/blog/how-to-build-a-url-shortener-on-aws-in-2026) you would use cache aside for the URL lookups.
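Here is a minimal cache aside sketch. It assumes plain dicts stand in for the cache and the database (in production the cache would usually be Redis and the database a real SQL store), and `get_user` and `update_user` are hypothetical names. Notice that the write path deletes the cache entry instead of updating it.

```python
from typing import Dict, Optional

cache: Dict[str, dict] = {}                           # stands in for Redis
database: Dict[str, dict] = {"u1": {"bio": "hello"}}  # stands in for the real database


def get_user(user_id: str) -> Optional[dict]:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                 # 1. check the cache
        return cached
    row = database.get(user_id)            # 2. miss: query the database
    if row is None:
        return None
    value = dict(row)
    cache[key] = value                     # 3. store a copy in the cache
    return value                           # 4. return the data


def update_user(user_id: str, fields: dict) -> None:
    database.setdefault(user_id, {}).update(fields)  # write to the database first
    cache.pop(f"user:{user_id}", None)               # then invalidate the cached copy


get_user("u1")                             # miss, fills the cache
update_user("u1", {"bio": "new bio"})
print(get_user("u1"))                      # {'bio': 'new bio'}, fetched fresh after invalidation
```

Deleting instead of overwriting keeps the write path simple: the next read repopulates the entry from the database.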

### 2 Write Through Cache

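Every write goes to both the cache and the database as part of the same operation, so the cache stays up to date.

**Read flow.** Read from the cache. Because every write updates it, the cached data is fresh. If an entry was evicted, read it from the database and cache it again.

**Write flow.** Write to the cache first. The cache then writes to the database synchronously, and the write only counts as done when both succeed.

Advantage: The cache and database stay in sync, so reads do not see stale data.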

Disadvantage: Every write is slower because it has to write to two places. Also you might cache data that nobody ever reads.

Use this when you cannot afford stale data, for example account balances in a banking app.
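Here is a minimal write through sketch, assuming a hypothetical `WriteThroughCache` class and a dict standing in for the database. This version persists to the database before updating the cache, so a failed database write never leaves an unsaved value in the cache. Implementations differ on the order of the two steps. The important part is that `put` only returns after both are done.

```python
from typing import Dict


class WriteThroughCache:
    def __init__(self, database: Dict[str, object]):
        self._cache: Dict[str, object] = {}
        self._db = database               # stands in for the real database

    def put(self, key: str, value: object) -> None:
        self._db[key] = value             # persist first; if this raises, the cache is untouched
        self._cache[key] = value          # then update the cache so reads see the new value

    def get(self, key: str) -> object:
        if key in self._cache:
            return self._cache[key]
        value = self._db[key]             # fall back to the database if the entry was evicted
        self._cache[key] = value
        return value


db = {}
store = WriteThroughCache(db)
store.put("balance:7", 120)
print(db["balance:7"], store.get("balance:7"))   # 120 120
```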

### 3 Write Back (Write Behind) Cache

The app writes to the cache only. The cache then writes to the database in the background later.

**Read flow.** Read from the cache.

**Write flow.** Write to the cache. The cache batches multiple writes and sends them to the database later.

Advantage: Very fast writes because the app only talks to the cache.

Disadvantage: If the cache crashes before syncing to the database you lose data.

Use this for high write volume systems where losing a few recent writes is acceptable, such as analytics counters and logging.
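Here is a minimal write back sketch, assuming a hypothetical `WriteBackCache` class that remembers which keys are "dirty" and writes them to the database in one batch. Here the flush happens when the batch fills up or when `flush` is called. A real system would usually also flush on a timer. Anything not yet flushed is lost if the process dies, which is the risk described above.

```python
from typing import Dict, Set


class WriteBackCache:
    def __init__(self, database: Dict[str, object], batch_size: int = 100):
        self._cache: Dict[str, object] = {}
        self._dirty: Set[str] = set()     # keys written to the cache but not yet to the database
        self._db = database               # stands in for the real database
        self._batch_size = batch_size

    def put(self, key: str, value: object) -> None:
        self._cache[key] = value          # fast path: only touch the cache
        self._dirty.add(key)
        if len(self._dirty) >= self._batch_size:
            self.flush()                  # write the batch once it is big enough

    def get(self, key: str) -> object:
        return self._cache.get(key, self._db.get(key))

    def flush(self) -> int:
        """Write every dirty key to the database in one batch; returns how many were written."""
        for key in self._dirty:
            self._db[key] = self._cache[key]
        written = len(self._dirty)
        self._dirty.clear()
        return written


db = {}
counter = WriteBackCache(db, batch_size=3)
counter.put("views:home", 1)
counter.put("views:home", 2)       # same key rewritten, still one dirty entry
print(db)                          # {} because nothing is persisted yet
counter.put("views:about", 1)
print(counter.flush())             # 2
print(db["views:home"], db["views:about"])   # 2 1
```

The two updates to `views:home` turn into a single database write. Collapsing repeated writes like this is where much of the speedup comes from.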

### 4 Read Through Cache

Similar to cache aside but the cache itself manages the database reads. The app only talks to the cache and never directly to the database.

**Read flow.** The app asks the cache for data. If the cache does not have it, the cache goes to the database, stores the data and returns it.

**Write flow.** Read through only covers reads. You decide separately how writes work, and it is often paired with write through.

Advantage: The app code is simpler because it only talks to the cache.

Disadvantage: The cache needs to know how to query the database which adds complexity to the cache layer.
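Here is a minimal read through sketch, assuming a hypothetical `ReadThroughCache` class that is built with a loader function. The application only ever calls `get`, and the cache decides when to call the loader. The `load_product` function is a placeholder for a real database query. Libraries such as Caffeine (mentioned above) offer this idea as a `LoadingCache`.

```python
from typing import Callable, Dict, Hashable


class ReadThroughCache:
    def __init__(self, loader: Callable[[Hashable], object]):
        self._loader = loader              # the cache, not the app, knows how to load data
        self._store: Dict[Hashable, object] = {}

    def get(self, key: Hashable) -> object:
        if key not in self._store:
            self._store[key] = self._loader(key)   # miss: the cache loads from the database itself
        return self._store[key]


loads = []


def load_product(sku):
    loads.append(sku)                      # hypothetical database query
    return {"sku": sku, "price": 10}


products = ReadThroughCache(loader=load_product)
products.get("A1")
products.get("A1")
print(loads)    # ['A1'], the loader ran once
```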

### Strategy Comparison

| Strategy      | Read Speed  | Write Speed | Data Freshness        | Complexity | Risk               |
| ------------- | ----------- | ----------- | --------------------- | ---------- | ------------------ |
| Cache Aside   | Fast on hit | Normal      | Can be stale          | Low        | Stale reads        |
| Write Through | Always fast | Slower      | Always fresh          | Medium     | Wasted cache space |
| Write Back    | Always fast | Very fast   | Always fresh in cache | High       | Data loss on crash |
| Read Through  | Fast on hit | Normal      | Can be stale          | Medium     | Cache complexity   |

## Where Caching Happens

Caching is not just one thing in one place. It happens at multiple layers of your system.

### Browser Cache

Your browser caches CSS, JavaScript, images and fonts locally. When you visit a site again the browser loads those files from disk instead of downloading them again.

You control this with HTTP headers like `Cache-Control` and `ETag`.
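Here is a sketch of how a server might set these headers. It assumes a hypothetical `respond` helper rather than any particular web framework. It hashes the body into an `ETag`, and if the browser sends that value back in `If-None-Match`, the server answers `304 Not Modified` with no body. Real `If-None-Match` headers can list several ETags or use weak validators; this sketch only compares one value.

```python
import hashlib
from typing import Dict, Optional, Tuple


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'   # ETag values are quoted strings
    headers = {
        "Cache-Control": "public, max-age=3600",  # browser may reuse it for 1 hour without asking
        "ETag": etag,
    }
    if if_none_match == etag:
        return 304, headers, b""                  # the browser's copy is still valid
    return 200, headers, body


status, headers, _ = respond(b"body { color: red; }", None)
status2, _, body2 = respond(b"body { color: red; }", headers["ETag"])
print(status, status2, body2)   # 200 304 b''
```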

### CDN Cache

A CDN (Content Delivery Network) caches your static content on servers around the world. For example, when a user in India requests your website, they get the files from a server in Mumbai instead of your origin server in the US.

Popular CDNs include [Cloudflare](https://www.cloudflare.com/), [AWS CloudFront](https://aws.amazon.com/cloudfront/) and [Fastly](https://www.fastly.com/).

If you have set up a portfolio on AWS you have probably used CloudFront already. Check out our [S3 portfolio hosting guide](/blog/aws-s3-portfolio-hosting-guide) for an example.

### Application Cache

This is where most of the caching logic lives. Your application decides what to cache, how long to cache it and when to invalidate it.

Examples include caching user sessions, caching API responses, caching computed results and caching database queries.

### Database Cache

Most databases have their own internal cache. MySQL's InnoDB engine has the buffer pool (the older query cache was removed in MySQL 8.0). PostgreSQL has shared buffers. MongoDB has the WiredTiger cache.

These are automatic. The database caches frequently accessed data in memory so repeated queries are faster.

### Full Caching Layer Diagram

```mermaid
graph TD
    U[User] --> B[Browser Cache]
    B --> CDN[CDN Cache]
    CDN --> LB[Load Balancer]
    LB --> App[Application Cache]
    App --> DC[Distributed Cache Redis]
    DC --> DB[Database Cache]
    DB --> Disk[Database Disk]

    class B mm-blue
    class CDN mm-green
    class App mm-yellow
    class DC mm-red
    class DB mm-purple
```

Each layer can answer a request before it reaches the next one. A request that reaches the actual database disk has missed 5 layers of caching. For static assets, most requests are answered by the browser or the CDN and never reach your servers.

For a deeper look at how load balancers work in this stack check out our [Scalability and Load Balancing guide](/blog/scalability-load-balancing-system-design).

## Real World Example: Music Streaming App

Let us put this together with a real example. Imagine you are building a music streaming app like Spotify.

When a premium user opens the app, it needs to load:

- User profile and subscription details
- Recently played songs
- Personalized playlists
- Saved library
- Queued songs

Without caching, each of these is a separate database query. That is 5 database calls just to open the app. If 10 million active users each open the app once, your database handles 50 million queries just for the home screen.

With caching the flow changes.

**First visit.** All 5 queries go to the database. The results are stored in cache with a TTL of 5 minutes.

**Next visit within 5 minutes.** All 5 queries hit the cache. Zero database calls. The home screen loads much faster because nothing waits on the database.

**After 5 minutes.** The cache expires. The app goes to the database again, gets fresh data and caches it.

## Benefits of Caching

Here is why every production system uses caching.

- **Faster response time.** Cached data lives in RAM, and reading from RAM is orders of magnitude faster than reading from disk.
- **Lower database load.** Fewer queries hit the database. The database can focus on writes and complex queries.
- **Saves API calls and network trips.** Fewer calls to external services and databases.
- **Handles more traffic.** When most reads skip the database, the same infrastructure can serve many more users.
- **Better user experience.** Faster page loads and smoother interactions.
- **Cost effective.** Fewer database queries means you can often run a smaller, cheaper database.

This ties directly into [cost optimization strategies for cloud infrastructure](/blog/finops-101-cost-optimization). Caching reduces the compute and database resources you need to pay for.

## Common Caching Pitfalls

Caching sounds simple but it has traps. Here are the ones that catch people.

### Cache Stampede

When a popular cache key expires, all the requests for it hit the database at the same time. If 10,000 users are requesting the same data when the cache entry expires, all 10,000 requests flood the database at once.

The fix is to use **cache locking**. Only the first request goes to the database. The rest wait until the cache is filled again.
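Here is a single-process sketch of cache locking. It assumes threads stand in for concurrent requests and uses a hypothetical `slow_db_query`. Each key gets its own lock. The first thread to miss loads the value while the others wait, then they check the cache again and find it filled. Across multiple servers you would need a distributed lock instead (for example a Redis key set with the `NX` option), which this sketch does not cover.

```python
import threading
import time
from collections import defaultdict
from typing import Callable, Dict, Hashable


class StampedeSafeCache:
    def __init__(self, loader: Callable[[Hashable], object]):
        self._loader = loader
        self._store: Dict[Hashable, object] = {}
        self._locks = defaultdict(threading.Lock)  # one lock per key, created on demand
        self._locks_guard = threading.Lock()       # protects creation of per-key locks

    def get(self, key: Hashable) -> object:
        if key in self._store:                     # fast path: no locking on a hit
            return self._store[key]
        with self._locks_guard:
            key_lock = self._locks[key]
        with key_lock:                             # only one loader per key at a time
            if key in self._store:                 # another thread filled it while we waited
                return self._store[key]
            value = self._loader(key)
            self._store[key] = value
            return value


db_calls = 0
count_lock = threading.Lock()


def slow_db_query(key):
    global db_calls
    with count_lock:
        db_calls += 1
    time.sleep(0.05)                               # simulate a slow query
    return f"value-for-{key}"


cache = StampedeSafeCache(slow_db_query)
threads = [threading.Thread(target=cache.get, args=("hot-key",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_calls)    # 1
```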

### Cache Penetration

When requests keep asking for data that does not exist in the database. The cache will never have it so every request goes to the database.

The fix is to cache the "not found" result too. Store a null or empty value with a short TTL.
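Here is a sketch of caching "not found" results. It assumes a hypothetical `_MISSING` sentinel so that a cached absence can be told apart from "not cached yet", and a shorter TTL for these negative entries. The TTL values (300 and 30 seconds) and the injected `clock` are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple

_MISSING = object()   # sentinel meaning "the database has no such row"


class NegativeCachingCache:
    def __init__(self, loader: Callable[[Hashable], Optional[object]],
                 ttl: float = 300.0, negative_ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._store: Dict[Hashable, Tuple[object, float]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._store.get(key)
        if entry is not None and self._clock() < entry[1]:
            value = entry[0]
            return None if value is _MISSING else value  # hit, possibly a cached "not found"
        value = self._loader(key)              # loader returns None when the row does not exist
        if value is None:
            self._store[key] = (_MISSING, self._clock() + self._negative_ttl)
        else:
            self._store[key] = (value, self._clock() + self._ttl)
        return value


lookups = []


def find_user(user_id):
    lookups.append(user_id)                    # hypothetical database query
    return None                                # this user does not exist


now = [0.0]
users = NegativeCachingCache(find_user, clock=lambda: now[0])
for _ in range(1000):
    users.get("no-such-user")
print(len(lookups))                            # 1, not 1000
```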

### Cache Avalanche

When a large number of cache keys expire at the same time. This causes a sudden spike of database queries.

The fix is to add random jitter to your TTL values. Instead of all keys expiring at exactly 5 minutes make them expire between 4 and 6 minutes.
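Jitter is a one-line change. This sketch assumes a base TTL of 300 seconds and plus or minus 20% jitter, which matches the 4 to 6 minute example. `jittered_ttl` is a hypothetical helper name.

```python
import random


def jittered_ttl(base_seconds: float = 300.0, spread: float = 0.2) -> float:
    """Return a TTL spread around the base, e.g. 240 to 360 seconds for 300 +/- 20%."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)


ttls = [jittered_ttl() for _ in range(5)]
print([round(t) for t in ttls])   # five values between 240 and 360
```

When you store the key, round the value if your cache expects whole seconds (Redis `EX` does).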

### Stale Data

When the cache has old data and users see outdated information. This happens when cache invalidation is not done properly.

The fix is to use proper invalidation strategies and set reasonable TTL values. If your system handles critical data like financial transactions consider using write through caching.

## Cache Eviction Policies

Cache memory is limited. When the cache is full and new data needs to be stored, something has to be removed. The eviction policy decides what gets removed.

| Policy                          | What It Does                                   | When To Use                      |
| ------------------------------- | ---------------------------------------------- | -------------------------------- |
| **LRU** (Least Recently Used)   | Removes the data that was accessed longest ago | Most common. Good default choice |
| **LFU** (Least Frequently Used) | Removes the data that is accessed least often  | When some data is always popular |
| **FIFO** (First In First Out)   | Removes the oldest data first                  | Simple and predictable           |
| **TTL Based**                   | Removes data when its timer expires            | When data has a natural expiry   |
| **Random**                      | Removes random data                            | When no pattern exists           |

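LRU is the most widely used policy and a good default. Redis supports it (for example `allkeys-lru`), but its default `maxmemory-policy` is `noeviction`, which rejects new writes when memory is full. You have to choose an eviction policy explicitly.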
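LRU is short to implement with Python's `collections.OrderedDict`, which can move a key to the end on access and pop the oldest key from the front. Here is a minimal sketch with a hypothetical `LRUCache` class:

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    def __init__(self, capacity: int):
        self._capacity = capacity
        self._data: OrderedDict = OrderedDict()   # oldest entry first, newest last

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: object) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)   # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now the most recently used
cache.put("c", 3)       # evicts "b"
print(cache.get("b"), cache.get("a"), cache.get("c"))   # None 1 3
```

Redis does not keep an exact ordering like this. It approximates LRU by sampling a few keys and evicting the oldest among them, which saves memory.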

## Putting It All Together

Here is a complete system architecture with caching at every layer.

```mermaid
graph TD
    Users[Users] --> CDN[CDN Cloudflare]
    CDN --> LB[Load Balancer]
    LB --> App1[App Server 1]
    LB --> App2[App Server 2]
    LB --> App3[App Server 3]

    App1 --> Redis[Redis Cache Cluster]
    App2 --> Redis
    App3 --> Redis

    App1 -.->|Cache miss| PDB[Primary Database]
    App2 -.-> PDB
    App3 -.-> PDB
    PDB -->|Replication| RDB[Read Replica]

    class CDN mm-green
    class LB mm-red
    class Redis mm-red
    class PDB mm-purple
    class RDB mm-teal
```

The flow works like this.

1. Users hit the CDN. Static files like images, CSS and JS are served from the edge. For static content, most requests stop here.
2. Dynamic requests go through the [load balancer](/blog/scalability-load-balancing-system-design) which distributes traffic across app servers.
3. App servers check Redis first. If the data is cached it returns immediately.
4. On a cache miss the app queries the primary database or read replica.
5. The result is cached in Redis for future requests.
6. [Database replication](/blog/scalability-load-balancing-system-design) keeps the read replica up to date so it can serve reads and support [disaster recovery](/blog/disaster-recovery-rto-rpo).

This is a common setup for production web apps today.

## Common Use Cases for Caching

| Use Case                 | What Gets Cached             | Example                |
| ------------------------ | ---------------------------- | ---------------------- |
| User profiles            | Name, bio, settings          | Social media apps      |
| Feed and timeline        | Posts, tweets, stories       | Instagram, Twitter     |
| Session data             | Login tokens, preferences    | Any web app            |
| Product catalog          | Prices, descriptions, images | E-commerce sites       |
| Leaderboards             | Scores, rankings             | Gaming apps            |
| Config and feature flags | App settings, A/B tests      | SaaS platforms         |
| DNS lookups              | Domain to IP mapping         | Every internet request |
| API responses            | External API results         | Payment gateways, maps |

## Core Concepts

Here is everything in one place.

| Concept                | What It Means                                 | Key Point                                     |
| ---------------------- | --------------------------------------------- | --------------------------------------------- |
| **Caching**            | Storing data in fast temporary storage        | Reduces database load and speeds up responses |
| **Cache Hit**          | Data found in cache                           | The goal. Fast response                       |
| **Cache Miss**         | Data not in cache                             | Slower. Goes to database                      |
| **TTL**                | Time To Live for cached data                  | Controls how long data stays in cache         |
| **In Memory Cache**    | Cache on the same server                      | Fastest but not shared                        |
| **Distributed Cache**  | Cache on separate servers like Redis          | Shared across all app servers                 |
| **Cache Aside**        | App manages cache reads and writes            | Most common pattern                           |
| **Write Through**      | Write to cache and database together          | Always fresh but slower writes                |
| **Write Back**         | Write to cache first and database later       | Fast writes but risk data loss                |
| **Cache Invalidation** | Removing or updating stale data               | The hardest problem in caching                |
| **Cache Stampede**     | Many requests hit database when cache expires | Fix with locking or pre warming               |
| **LRU Eviction**       | Remove least recently used data first         | A good default eviction policy                |
| **CDN**                | Cache static content at the network edge      | Serves users from nearest server              |

Caching is not something you add at the end. It is a core part of system design that you plan from the start. Get it right and your app can handle far more traffic on the same hardware. Get it wrong and your users see stale data while your database melts.

If you are learning system design start with [Scalability and Load Balancing](/blog/scalability-load-balancing-system-design) then come here for caching. Next up will be databases and message queues. Stay tuned.


---

<!-- METADATA_START -->
## Metadata & Citations

### Further Reading
- [System Design Basics Scalability and Load Balancing](https://www.ranti.dev/blog/scalability-load-balancing-system-design.md)
- [The Perfect Pipeline: How to Ship Code Without Crashing Production](https://www.ranti.dev/blog/perfect-pipeline-blue-green.md)
- [Git Under the Hood: The DevOps Engineer's 'Undo' Button](https://www.ranti.dev/blog/git-internals-undo-button.md)


```json
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "System Design Basics Caching How to Make Your App Fast",
  "author": {
    "@type": "Person",
    "name": "Rantideb Howlader"
  },
  "datePublished": "2026-05-28T00:00:00.000Z",
  "url": "https://www.ranti.dev/blog/caching-system-design-guide",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isAccessibleForFree": true
}
```

### BibTeX
```bibtex
@article{caching-system-design-guide_2026,
  author = {Rantideb Howlader},
  title = {System Design Basics Caching How to Make Your App Fast},
  journal = {Rantideb Howlader Portfolio},
  year = {2026},
  url = {https://www.ranti.dev/blog/caching-system-design-guide},
  note = {Accessed: 2026-05-31}
}
```

### IEEE
Rantideb Howlader, "System Design Basics Caching How to Make Your App Fast," Rantideb Howlader Portfolio, 2026. [Online]. Available: https://www.ranti.dev/blog/caching-system-design-guide. [Accessed: 2026-05-31].

### APA
Rantideb Howlader. (2026). System Design Basics Caching How to Make Your App Fast. Rantideb Howlader. Retrieved from https://www.ranti.dev/blog/caching-system-design-guide

--- 
*This content is provided in research-grade Markdown format. Required Attribution: Cite as Rantideb Howlader (2026).*
<!-- METADATA_END -->