What is caching in system design?

Caching is storing frequently used data in a fast temporary storage like RAM so that future requests for the same data are served quickly without hitting the slow database every time.

What is the difference between in memory cache and distributed cache?

An in memory cache stores data in the RAM of the same server that runs your app. It is fast but not shared across servers. A distributed cache runs on a separate cache server like Redis that multiple app servers can access. It is shared and survives app server restarts.

What is cache invalidation?

Cache invalidation is the process of removing or updating stale data from the cache. When the original data changes in the database the cache must be updated or deleted so users do not see old information.

What is TTL in caching?

TTL stands for Time To Live. It is a timer you set on cached data. When the timer expires the cached data is automatically deleted and the next request fetches fresh data from the database.

What is the difference between write through and write back cache?

Write through writes data to both the cache and the database at the same time. It is safe but slower. Write back writes to the cache first and updates the database later in the background. It is faster but risks data loss if the cache crashes before syncing.

What is a CDN and how is it related to caching?

A CDN or Content Delivery Network is a global network of servers that caches static content like images, CSS and JavaScript close to users. It is caching at the network edge so users get content from the nearest server instead of your origin server.

When should I use Redis vs Memcached?

Use Redis when you need data structures like lists, sets and sorted sets, plus persistence and replication. Use Memcached when you need simple key value caching with multi threaded performance and nothing else.

System Design Basics: Caching (How to Make Your App Fast)

Rantideb Howlader • May 28, 2026 • 19 min read

Your database is slow. Not because it is broken. Not because you wrote bad queries. It is slow because you are asking it the same questions a million times a day.

Every time a user opens your app your server goes to the database. It fetches the same profile data. The same product list. The same feed. Over and over and over.

What if you could store those answers somewhere faster? Somewhere that does not need to search through millions of rows every single time?

That is caching.

If you read our guide on Scalability and Load Balancing you learned how to handle more users by adding more servers. Caching is the next step. It makes each server faster so you need fewer servers in the first place.

What is Caching

Caching is storing frequently used data in a fast temporary storage so future requests get the data quickly without hitting the database.

Think of it like this. You are studying for an exam. Your textbook is in the bookshelf across the room. Every time you need a formula you walk to the bookshelf find the page read the formula and walk back.

That is slow.

So you write the most used formulas on a sticky note and put it on your desk. Now when you need a formula you just look at the sticky note. No walking. No searching. Instant.

The sticky note is your cache. The bookshelf is your database.

graph LR
    A[Your Desk] -->|Fast| B[Sticky Note Cache]
    A -->|Slow| C[Bookshelf Database]
    class B mm-green
    class C mm-yellow

The cache sits between your application and the database. It stores copies of data that your app needs often. When a request comes in the app checks the cache first. If the data is there it returns it immediately. If not it goes to the database.

Why Do We Need Caching

Let us take a real example. Think about Instagram.

When you open someone's profile the app needs to show their name, profile picture, bio, post count, follower count and following count.

Without caching every single time someone views that profile the app goes to the database. If that person has 10 million followers and 1 million people check their profile in one hour that is 1 million database queries for the exact same data.

The profile did not change. The data is the same. But the database is doing the same work 1 million times.

With caching the app stores the profile data in a fast cache after the first database query. The next 999,999 requests get the data from the cache. The database is free to handle other work.

graph TD
    subgraph Without Caching
        U1[User 1] --> DB1[Database]
        U2[User 2] --> DB1
        U3[User 3] --> DB1
        U4[1M Users] --> DB1
    end
 
    class DB1 mm-red

graph TD
    subgraph With Caching
        U5[User 1] --> C1[Cache]
        U6[User 2] --> C1
        U7[User 3] --> C1
        U8[1M Users] --> C1
        C1 -.->|Only 1st time| DB2[Database]
    end
 
    class C1 mm-green
    class DB2 mm-blue

This is why every large scale system uses caching. Netflix, Google, Amazon, Facebook. They all rely on caching to handle billions of requests.

How Caching Works

The caching process has a simple flow.

A user makes a request.
The app checks the cache.
If the data is in the cache that is called a Cache Hit. Return the data immediately.
If the data is not in the cache that is called a Cache Miss. Go to the database, get the data, store it in the cache for next time, then return it.

graph TD
    A[User Request] --> B{Check Cache}
    B -->|Hit| C[Return Data from Cache]
    B -->|Miss| D[Fetch from Database]
    D --> E[Store in Cache]
    E --> F[Return Data to User]
 
    class B mm-yellow
    class C mm-green
    class D mm-blue
    class E mm-blue

Cache Hit means the data was found in the cache. This is what we want. It is fast.

Cache Miss means the data was not in the cache. The app has to go to the database. This is slow but the data gets cached for next time.

Cache Hit Ratio is the percentage of requests served from the cache. As a rule of thumb a good cache hit ratio is above 90%, though the right target depends on your workload. If your hit ratio is 95% that means 95 out of every 100 requests never touch the database.
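To see why a few points of hit ratio matter, here is a small Python sketch of the arithmetic. The function name database_queries and the request counts are made up for illustration. They are not measurements from a real system.

```python
def database_queries(total_requests: int, hit_ratio: float) -> int:
    """Return how many requests still reach the database.

    hit_ratio is a fraction between 0 and 1 (0.95 means 95%).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Every request that is not a cache hit is a miss and goes to the database.
    return round(total_requests * (1.0 - hit_ratio))


# Illustrative numbers only: 1 million requests at two different hit ratios.
at_90 = database_queries(1_000_000, 0.90)
at_95 = database_queries(1_000_000, 0.95)
print(at_90, at_95)  # prints: 100000 50000
```

Going from 90% to 95% looks like a small change but it halves the number of queries the database has to answer.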

What is TTL

TTL stands for Time To Live. It tells the cache how long to keep the data before throwing it away.

You set a TTL when you store data in the cache. After that time passes the data is automatically deleted. The next request will go to the database and get fresh data.

Why do we need TTL? Because data changes. If someone updates their Instagram bio but the cache keeps the old bio forever users will see outdated information.

TTL examples

User profile data: TTL of 5 minutes
Product prices: TTL of 1 minute
Static page content: TTL of 1 hour
Stock prices: TTL of 5 seconds

The right TTL depends on how often the data changes and how critical it is to show the latest version.
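Here is a minimal sketch of how a TTL can work inside a cache, written in Python. The class name TTLCache and the injectable clock are assumptions made so the example is easy to test. Real caches like Redis handle expiry for you, and this sketch only removes expired entries lazily when they are read.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache where every entry carries its own expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Remember the value together with the moment it stops being valid.
        self._store[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None  # never cached: a plain miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # timer ran out: drop the stale entry
            return None  # caller should fetch fresh data from the database
        return value
```

A caller would do something like cache.set("profile:42", profile, ttl_seconds=300) for the 5 minute user profile case and treat a None from get as a cache miss.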

Types of Caching

There are two main types of cache based on where the cache lives.

In Memory Cache (Local Cache)

The cache lives in the same server as your application. It is stored in the server RAM.

This is the fastest type of cache because there is no network call. The data is right there in the same process memory.

But it has a big limitation. The cache is not shared. If you have 3 app servers each server has its own separate cache. Server 1 might have user data cached but Server 2 does not. So Server 2 still hits the database.

graph TD
    subgraph Server 1
        A1[App] --> LC1[Local Cache]
    end
    subgraph Server 2
        A2[App] --> LC2[Local Cache]
    end
    subgraph Server 3
        A3[App] --> LC3[Local Cache]
    end
    LC1 -.-> DB[Database]
    LC2 -.-> DB
    LC3 -.-> DB
 
    class LC1 mm-green
    class LC2 mm-green
    class LC3 mm-green
    class DB mm-blue

Examples of in memory cache

HashMap in Java
LRU Cache
MemoryCache in Android
Caffeine Cache in Java
Python dict used as cache

Use this when your app runs on a single server or when the cached data is specific to each server.
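In Python the quickest way to try an in memory cache is functools.lru_cache from the standard library. The sketch below is only an illustration. The function get_country_name and its tiny lookup table are hypothetical stand-ins for a slow database query.

```python
from functools import lru_cache

db_calls = {"count": 0}  # counts how often the "database" is really queried


@lru_cache(maxsize=1024)  # keep up to 1024 results in this process's memory
def get_country_name(code: str) -> str:
    db_calls["count"] += 1  # only runs on a cache miss
    # Hypothetical stand-in for a slow database lookup.
    return {"IN": "India", "US": "United States"}.get(code, "Unknown")


get_country_name("IN")  # miss: runs the function body
get_country_name("IN")  # hit: answered from memory, body is skipped
print(get_country_name.cache_info())
```

The cache lives inside one process. A second app server running the same code starts with an empty cache of its own, which is exactly the limitation described above.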

Distributed Cache

The cache lives on a separate dedicated server. All your app servers connect to this shared cache over the network.

This means Server 1, Server 2 and Server 3 all share the same cache. When Server 1 caches user data Server 2 can read it too.

graph TD
    A1[App Server 1] --> RC[Distributed Cache Server]
    A2[App Server 2] --> RC
    A3[App Server 3] --> RC
    RC -.-> DB[Database]
 
    class RC mm-red
    class A1 mm-blue
    class A2 mm-blue
    class A3 mm-blue
    class DB mm-yellow

Examples of distributed cache

Redis the most popular choice
Memcached simple and fast
Hazelcast for Java applications
Amazon ElastiCache (managed Redis or Memcached on AWS)

Advantages of distributed cache

Shared across all servers. One cache for everyone.
Scalable. You can add more cache servers as traffic grows.
Survives app server restarts. If App Server 1 crashes the cache is still there.
Reliable. Most distributed caches support replication so even the cache has backups.

Use this when your app has multiple servers and you need a shared cache. This is what most production systems use.

If you are running your app on AWS you can use Amazon ElastiCache which gives you managed Redis or Memcached. If you are interested in AWS fundamentals check out our AWS Zero to Hero guide.

Redis vs Memcached

These are the two most popular distributed cache solutions. Here is when to use each.

Feature	Redis	Memcached
Data types	Strings, lists, sets, sorted sets, hashes	Strings only
Persistence	Optional. RDB snapshots or an AOF log let data survive restarts	No. Data is lost on restart
Replication	Yes. Primary and replica support	No built in replication
Threading	Commands run on a single thread (optional I/O threads since Redis 6)	Multi threaded
Memory management	Advanced with eviction policies	Simple slab allocation
Use case	Complex caching, sessions, leaderboards, queues	Simple key value caching

Use Redis when you need more than simple key value storage. Use Memcached when you need raw speed for simple data and nothing else.

Cache Invalidation

This is the hardest part of caching. There is a famous saying in computer science.

There are only two hard things in computer science. Cache invalidation and naming things.

Cache invalidation means removing or updating old data from the cache when the original data changes.

If a user updates their profile picture but the cache still has the old picture every visitor sees the old one. That is stale data. Cache invalidation prevents this.

When to Invalidate

When data is updated like a new post or changed bio
When follower or following count changes
When user information is changed
When TTL expires
When you manually trigger an invalidation

graph LR
    A[Data Changes in DB] --> B[Invalidate Cache]
    B --> C[Next Request Fetches Fresh Data]
    C --> D[Store New Data in Cache]
 
    class A mm-red
    class B mm-yellow
    class C mm-blue
    class D mm-green

Caching Strategies

There are four main strategies for how your app reads and writes data with a cache.

1 Cache Aside (Lazy Loading)

This is the most common pattern. The app manages the cache itself.

Read flow. The app checks the cache first. If the data is there return it. If not go to the database, get the data, store it in the cache and then return it.

Write flow. The app writes to the database directly. Then it deletes the cached version so the next read gets fresh data.

graph TD
    A[App] -->|1 Check Cache| B{Cache Hit?}
    B -->|Yes| C[Return Cached Data]
    B -->|No| D[2 Query Database]
    D --> E[3 Store in Cache]
    E --> F[4 Return Data]
 
    class B mm-yellow
    class C mm-green
    class D mm-blue

This is simple and works well. The downside is the first request for any data will always be slow because it is a cache miss.

Most web apps use this pattern. If you have built a URL shortener on AWS you would use cache aside for the URL lookups.
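The read and write flows of cache aside can be sketched in a few lines of Python. Everything here is a stand-in. The two dicts play the role of the database and the cache, and the names read_url and write_url are made up for a URL shortener style lookup.

```python
from typing import Dict, Optional

# Stand-ins: in a real system these would be a database and a cache like Redis.
database: Dict[str, str] = {"abc123": "https://example.com/long-url"}
cache: Dict[str, str] = {}


def read_url(short_code: str) -> Optional[str]:
    cached = cache.get(short_code)
    if cached is not None:
        return cached  # cache hit
    value = database.get(short_code)  # cache miss: query the database
    if value is not None:
        cache[short_code] = value  # store it for the next reader
    return value


def write_url(short_code: str, long_url: str) -> None:
    database[short_code] = long_url  # write goes straight to the database
    cache.pop(short_code, None)  # delete the cached copy so it cannot go stale
```

Deleting the cached entry on write instead of updating it keeps the write path simple. The next read pays for one cache miss and repopulates the cache with fresh data.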

2 Write Through Cache

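Every write goes to both the cache and the database as part of the same operation. The write only counts as done once both are updated, so the cache is always up to date.

Read flow. Always read from the cache. As long as every write goes through the cache the data will be fresh.

Write flow. Write to the cache first. The cache then writes to the database before the write is confirmed.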

Advantage: The cache and database are always in sync. No stale data.

Disadvantage: Every write is slower because it has to write to two places. Also you might cache data that nobody ever reads.

Use this when you cannot afford stale data. Banking apps use this.
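A minimal Python sketch of write through is below. The class name WriteThroughCache is hypothetical and a plain dict stands in for the database. Inside write this sketch persists to the database before updating its in-memory copy, which is one possible ordering chosen so that a failed database write never leaves an unsaved value in the cache.

```python
from typing import Any, Dict, Optional


class WriteThroughCache:
    """The app talks to this object. Each write is saved to the database too."""

    def __init__(self, database: Dict[str, Any]) -> None:
        self._database = database  # stand-in for a real database client
        self._cache: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        # Both steps happen before write() returns, so the two stay in sync.
        self._database[key] = value
        self._cache[key] = value

    def read(self, key: str) -> Optional[Any]:
        if key in self._cache:
            return self._cache[key]
        # Data written before the cache existed is loaded on first read.
        value = self._database.get(key)
        if value is not None:
            self._cache[key] = value
        return value
```

The cost is visible in write: every call does two writes before it returns, which is why this strategy trades write speed for freshness.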

3 Write Back (Write Behind) Cache

The app writes to the cache only. The cache then writes to the database in the background later.

Read flow. Read from the cache.

Write flow. Write to the cache. The cache batches multiple writes and sends them to the database later.

Advantage: Very fast writes because the app only talks to the cache.

Disadvantage: If the cache crashes before syncing to the database you lose data.

Use this for high write volume systems where some data loss is acceptable. Analytics and logging systems use this.
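Write back can be sketched the same way in Python. The class name WriteBackCache and the explicit flush method are assumptions for illustration. A real system would call flush from a background job or timer, and anything still unflushed when the cache dies is lost.

```python
from typing import Any, Dict, Optional, Set


class WriteBackCache:
    """Writes land in memory first and reach the database only on flush()."""

    def __init__(self, database: Dict[str, Any]) -> None:
        self._database = database  # stand-in for a real database client
        self._cache: Dict[str, Any] = {}
        self._dirty: Set[str] = set()  # keys changed since the last flush

    def write(self, key: str, value: Any) -> None:
        self._cache[key] = value  # fast: no database call here
        self._dirty.add(key)

    def read(self, key: str) -> Optional[Any]:
        if key in self._cache:
            return self._cache[key]
        value = self._database.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def flush(self) -> int:
        """Send all pending writes to the database. Returns how many keys."""
        for key in self._dirty:
            self._database[key] = self._cache[key]
        flushed = len(self._dirty)
        self._dirty.clear()
        return flushed
```

Notice that two writes to the same key before a flush become a single database write. That batching is where the speed comes from.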

4 Read Through Cache

Similar to cache aside but the cache itself manages the database reads. The app only talks to the cache and never directly to the database.

Read flow. The app asks the cache for data. If the cache does not have it the cache goes to the database, stores the data and returns it.

Write flow. Separate from the read path. You decide how writes work.

Advantage: The app code is simpler because it only talks to the cache.

Disadvantage: The cache needs to know how to query the database which adds complexity to the cache layer.
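Here is a small Python sketch of read through. The class name ReadThroughCache and the loader argument are hypothetical. The point is that the cache owns the database lookup, so application code only ever calls get.

```python
from typing import Callable, Dict, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """A cache that knows how to load missing values by itself."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader  # e.g. a function that queries the database
        self._store: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key not in self._store:
            # Miss: the cache, not the app, goes to the database.
            self._store[key] = self._loader(key)
        return self._store[key]
```

Compared with cache aside the miss handling has moved out of the app and into the cache layer. That is the extra complexity mentioned above, paid once instead of at every call site.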

Strategy Comparison

Strategy	Read Speed	Write Speed	Data Freshness	Complexity	Risk
Cache Aside	Fast on hit	Normal	Can be stale	Low	Stale reads
Write Through	Always fast	Slower	Always fresh	Medium	Wasted cache space
Write Back	Always fast	Very fast	Always fresh in cache	High	Data loss on crash
Read Through	Fast on hit	Normal	Can be stale	Medium	Cache complexity

Where Caching Happens

Caching is not just one thing in one place. It happens at multiple layers of your system.

Browser Cache

Your browser caches CSS, JavaScript, images and fonts locally. When you visit a site again the browser loads those files from disk instead of downloading them again.

You control this with HTTP headers like Cache-Control and ETag.
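To make those two headers concrete, here is a simplified Python sketch of what a server might do with them. The function build_response is hypothetical and not tied to any web framework. It also ignores details of real HTTP such as weak ETags and multiple values in If-None-Match.

```python
import hashlib
from typing import Dict, Optional, Tuple


def build_response(
    body: bytes, if_none_match: Optional[str], max_age: int = 3600
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a static file request."""
    # The ETag is a fingerprint of the content. It changes when the file changes.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        # Tells browsers and CDNs they may reuse the file for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }
    if if_none_match == etag:
        # The browser already has this exact version: send no body.
        return 304, headers, b""
    return 200, headers, body
```

While max-age has not run out the browser does not ask the server at all. After that it sends the ETag back in If-None-Match and a 304 answer lets it keep using its local copy.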

CDN Cache

A CDN (Content Delivery Network) caches your static content on servers around the world. When a user in India requests your website they get the files from a server in Mumbai instead of your server in the US.

Popular CDNs include Cloudflare, AWS CloudFront and Fastly.

If you have set up a portfolio on AWS you have probably used CloudFront already. Check out our S3 portfolio hosting guide for an example.

Application Cache

This is where most of the caching logic lives. Your application decides what to cache, how long to cache it and when to invalidate it.

Examples include caching user sessions, caching API responses, caching computed results and caching database queries.

Database Cache

Most databases have their own internal cache. MySQL has the InnoDB buffer pool (the older query cache was removed in MySQL 8.0). PostgreSQL has shared buffers. MongoDB has the WiredTiger cache.

These are automatic. The database caches frequently accessed data in memory so repeated queries are faster.

Full Caching Layer Diagram

graph TD
    U[User] --> B[Browser Cache]
    B --> CDN[CDN Cache]
    CDN --> LB[Load Balancer]
    LB --> App[Application Cache]
    App --> DC[Distributed Cache Redis]
    DC --> DB[Database Cache]
    DB --> Disk[Database Disk]
 
    class B mm-blue
    class CDN mm-green
    class App mm-yellow
    class DC mm-red
    class DB mm-purple

Each layer catches requests before they reach the next layer. By the time a request reaches the actual database disk it has passed through 5 layers of caching. For static content most requests never make it past the first two layers.

For a deeper look at how load balancers work in this stack check out our Scalability and Load Balancing guide.

Real World Example: Music Streaming App

Let us put this together with a real example. Imagine you are building a music streaming app like Spotify.

When a premium user opens the app it needs to load

User profile and subscription details
Recently played songs
Personalized playlists
Saved library
Queued songs

Without caching each of these is a separate database query. That is 5 database calls just to open the app. Multiply that by 10 million active users and your database handles 50 million queries just for the home screen.

With caching the flow changes.

First visit. All 5 queries go to the database. The results are stored in cache with a TTL of 5 minutes.

Next visit within 5 minutes. All 5 queries hit the cache. Zero database calls. The app loads in under 100ms.

After 5 minutes. The cache expires. The app goes to the database again, gets fresh data and caches it.

Benefits of Caching

Here is why every production system uses caching.

Faster response time. Cache is in RAM. Reading from RAM is orders of magnitude faster than reading from disk.
Lower database load. Fewer queries hit the database. The database can focus on writes and complex queries.
Saves API calls and network trips. Fewer calls to external services and databases.
Handles more traffic. With a high hit ratio the same infrastructure can serve many times more users.
Better user experience. Faster page loads and smoother interactions.
Cost effective. Fewer database queries means you can use a smaller cheaper database.

This ties directly into cost optimization strategies for cloud infrastructure. Caching reduces the compute and database resources you need to pay for.

Common Caching Pitfalls

Caching sounds simple but it has traps. Here are the ones that catch people.

Cache Stampede

When a popular cache key expires all the requests hit the database at the same time. If 10000 users are requesting the same data and the cache expires all 10000 requests flood the database at once.

The fix is to use cache locking. Only the first request goes to the database. The rest wait until the cache is filled again.
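A minimal Python sketch of that locking idea is below, using threads inside one process. The class name SingleFlightCache is made up for this example. With a distributed cache you would need a lock that all app servers can see, which this sketch does not cover.

```python
import threading
from typing import Callable, Dict


class SingleFlightCache:
    """On a miss only one thread per key loads the data. The others wait."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader  # stand-in for the slow database query
        self._store: Dict[str, str] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._store.get(key)  # assumes cached values are never None
        if value is not None:
            return value  # fast path: cache hit, no locking needed
        with self._lock_for(key):
            # Check again: another thread may have filled the cache
            # while this one was waiting for the lock.
            value = self._store.get(key)
            if value is None:
                value = self._loader(key)  # only the first thread gets here
                self._store[key] = value
            return value
```

The second check inside the lock is the important part. Without it every waiting thread would still query the database one after another.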

Cache Penetration

When requests keep asking for data that does not exist in the database. The cache will never have it so every request goes to the database.

The fix is to cache the "not found" result too. Store a null or empty value with a short TTL.
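Caching the "not found" result could look like the Python sketch below. The class name NegativeCachingLookup, the sentinel and the two TTL defaults are assumptions picked for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

_NOT_FOUND = object()  # sentinel: "we asked the database and there is no such row"


class NegativeCachingLookup:
    def __init__(
        self,
        query_db: Callable[[str], Optional[str]],
        found_ttl: float = 300.0,
        not_found_ttl: float = 30.0,  # short, so newly created rows show up soon
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query_db = query_db
        self._found_ttl = found_ttl
        self._not_found_ttl = not_found_ttl
        self._clock = clock
        self._store: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[1]:
            cached = entry[0]
            # A cached "not found" is answered without touching the database.
            return None if cached is _NOT_FOUND else cached  # type: ignore[return-value]
        row = self._query_db(key)
        if row is None:
            self._store[key] = (_NOT_FOUND, now + self._not_found_ttl)
        else:
            self._store[key] = (row, now + self._found_ttl)
        return row
```

The sentinel is what lets the cache tell "nothing cached yet" apart from "cached, and the answer is that it does not exist".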

Cache Avalanche

When a large number of cache keys expire at the same time. This causes a sudden spike of database queries.

The fix is to add random jitter to your TTL values. Instead of all keys expiring at exactly 5 minutes make them expire between 4 and 6 minutes.
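Adding jitter takes only a few lines of Python. The helper name ttl_with_jitter and the 20% default are assumptions chosen to match the 4 to 6 minute example.

```python
import random
from typing import Optional


def ttl_with_jitter(
    base_seconds: float,
    jitter_fraction: float = 0.2,
    rng: Optional[random.Random] = None,
) -> float:
    """Return base_seconds moved up or down by at most jitter_fraction."""
    if not 0.0 <= jitter_fraction <= 1.0:
        raise ValueError("jitter_fraction must be between 0 and 1")
    source = rng if rng is not None else random
    spread = base_seconds * jitter_fraction
    # Each key gets its own random offset, so keys cached at the same moment
    # no longer expire at the same moment.
    return base_seconds + source.uniform(-spread, spread)


# A 5 minute base TTL with 20% jitter lands between 240 and 360 seconds.
ttl = ttl_with_jitter(300)
```

You would pass the result as the TTL whenever you store a key, instead of a fixed number.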

Stale Data

When the cache has old data and users see outdated information. This happens when cache invalidation is not done properly.

The fix is to use proper invalidation strategies and set reasonable TTL values. If your system handles critical data like financial transactions consider using write through caching.

Cache Eviction Policies

Cache memory is limited. When the cache is full and new data needs to be stored something has to be removed. The eviction policy decides what gets removed.

Policy	What It Does	When To Use
LRU (Least Recently Used)	Removes the data that was accessed longest ago	Most common. Good default choice
LFU (Least Frequently Used)	Removes the data that is accessed least often	When some data is always popular
FIFO (First In First Out)	Removes the oldest data first	Simple and predictable
TTL Based	Removes data when its timer expires	When data has a natural expiry
Random	Removes random data	When no pattern exists

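LRU is the most widely used policy. Note that Redis does not evict anything by default. Its default maxmemory-policy is noeviction, which rejects new writes once memory is full. When you use Redis as a cache you typically set the policy to allkeys-lru, which uses an approximation of LRU.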
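To show what LRU eviction does, here is a small Python sketch built on collections.OrderedDict. The class name LRUCache is our own. Production caches use more memory efficient or approximate versions of the same idea.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        # Order of keys = order of use. Oldest at the front, newest at the back.
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # reading counts as "recently used"
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

With a capacity of 2, putting a and b, reading a and then putting c evicts b, because a was touched more recently.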

Putting It All Together

Here is a complete system architecture with caching at every layer.

graph TD
    Users[Users] --> CDN[CDN Cloudflare]
    CDN --> LB[Load Balancer]
    LB --> App1[App Server 1]
    LB --> App2[App Server 2]
    LB --> App3[App Server 3]
 
    App1 --> Redis[Redis Cache Cluster]
    App2 --> Redis
    App3 --> Redis
 
    Redis --> PDB[Primary Database]
    PDB -->|Replication| RDB[Read Replica]
 
    class CDN mm-green
    class LB mm-red
    class Redis mm-red
    class PDB mm-purple
    class RDB mm-teal

The flow works like this.

Users hit the CDN. Static files like images, CSS and JS are served from the edge. Most requests for static files stop here.
Dynamic requests go through the load balancer which distributes traffic across app servers.
App servers check Redis first. If the data is cached it returns immediately.
On a cache miss the app queries the primary database or read replica.
The result is cached in Redis for future requests.
Database replication copies changes to the read replica, which can serve reads and act as a standby for disaster recovery.

This is the setup most production web apps use on the internet today.

Common Use Cases for Caching

Use Case	What Gets Cached	Example
User profiles	Name, bio, settings	Social media apps
Feed and timeline	Posts, tweets, stories	Instagram, Twitter
Session data	Login tokens, preferences	Any web app
Product catalog	Prices, descriptions, images	E-commerce sites
Leaderboards	Scores, rankings	Gaming apps
Config and feature flags	App settings, A/B tests	SaaS platforms
DNS lookups	Domain to IP mapping	Every internet request
API responses	External API results	Payment gateways, maps

Core Concepts

Here is everything in one place.

Concept	What It Means	Key Point
Caching	Storing data in fast temporary storage	Reduces database load and speeds up responses
Cache Hit	Data found in cache	The goal. Fast response
Cache Miss	Data not in cache	Slower. Goes to database
TTL	Time To Live for cached data	Controls how long data stays in cache
In Memory Cache	Cache on the same server	Fastest but not shared
Distributed Cache	Cache on separate servers like Redis	Shared across all app servers
Cache Aside	App manages cache reads and writes	Most common pattern
Write Through	Write to cache and database together	Always fresh but slower writes
Write Back	Write to cache first and database later	Fast writes but risk data loss
Cache Invalidation	Removing or updating stale data	The hardest problem in caching
Cache Stampede	Many requests hit database when cache expires	Fix with locking or pre warming
LRU Eviction	Remove least recently used data first	Best default eviction policy
CDN	Cache static content at the network edge	Serves users from nearest server

Caching is not something you add at the end. It is a core part of system design that you plan from the start. Get it right and your app handles 10x the traffic on the same hardware. Get it wrong and your users see stale data while your database melts.

If you are learning system design start with Scalability and Load Balancing then come here for caching. Next up read our complete guide on Databases, Message Queues and Authentication.
