Your database is slow. Not because it is broken. Not because you wrote bad queries. It is slow because you are asking it the same questions a million times a day.
Every time a user opens your app your server goes to the database. It fetches the same profile data. The same product list. The same feed. Over and over and over.
What if you could store those answers somewhere faster? Somewhere that does not need to search through millions of rows every single time?
That is caching.
If you read our guide on Scalability and Load Balancing you learned how to handle more users by adding more servers. Caching is the next step. It makes each server faster so you need fewer servers in the first place.
What is Caching
Caching is storing frequently used data in fast temporary storage so future requests get the data quickly without hitting the database.
Think of it like this. You are studying for an exam. Your textbook is in the bookshelf across the room. Every time you need a formula you walk to the bookshelf, find the page, read the formula and walk back.
That is slow.
So you write the most used formulas on a sticky note and put it on your desk. Now when you need a formula you just look at the sticky note. No walking. No searching. Instant.
The sticky note is your cache. The bookshelf is your database.
graph LR
A[Your Desk] -->|Fast| B[Sticky Note Cache]
A -->|Slow| C[Bookshelf Database]
class B mm-green
class C mm-yellow
The cache sits between your application and the database. It stores copies of data that your app needs often. When a request comes in the app checks the cache first. If the data is there it returns it immediately. If not it goes to the database.
Why Do We Need Caching
Let us take a real example. Think about Instagram.
When you open someone's profile the app needs to show their name, profile picture, bio, post count, follower count and following count.
Without caching every single time someone views that profile the app goes to the database. If that person has 10 million followers and 1 million people check their profile in one hour that is 1 million database queries for the exact same data.
The profile did not change. The data is the same. But the database is doing the same work 1 million times.
With caching the app stores the profile data in a fast cache after the first database query. As long as the cached copy has not expired, the next 999,999 requests get the data from the cache. The database is free to handle other work.
graph TD
subgraph Without Caching
U1[User 1] --> DB1[Database]
U2[User 2] --> DB1
U3[User 3] --> DB1
U4[1M Users] --> DB1
end
class DB1 mm-red
graph TD
subgraph With Caching
U5[User 1] --> C1[Cache]
U6[User 2] --> C1
U7[User 3] --> C1
U8[1M Users] --> C1
C1 -.->|Only 1st time| DB2[Database]
end
class C1 mm-green
class DB2 mm-blue
This is why every large scale system uses caching. Netflix, Google, Amazon, Facebook. They all rely on caching to handle billions of requests.
How Caching Works
The caching process has a simple flow.
- A user makes a request.
- The app checks the cache.
- If the data is in the cache that is called a Cache Hit. Return the data immediately.
- If the data is not in the cache that is called a Cache Miss. Go to the database, get the data, store it in the cache for next time, then return it.
graph TD
A[User Request] --> B{Check Cache}
B -->|Hit| C[Return Data from Cache]
B -->|Miss| D[Fetch from Database]
D --> E[Store in Cache]
E --> F[Return Data to User]
class B mm-yellow
class C mm-green
class D mm-blue
class E mm-blue
Cache Hit means the data was found in the cache. This is what we want. It is fast.
Cache Miss means the data was not in the cache. The app has to go to the database. This is slow but the data gets cached for next time.
Cache Hit Ratio is the percentage of requests served from the cache. For read heavy data, a hit ratio above 90% is a common target. If your hit ratio is 95% that means 95 out of every 100 requests never touch the database.
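The hit and miss flow above can be sketched in a few lines of Python. This is a minimal sketch, not a production cache. A plain dict plays the role of the cache, and `fetch_from_database` is a hypothetical stand-in for a real query.

```python
# Minimal sketch of the hit/miss flow. A dict stands in for the cache and
# fetch_from_database is a hypothetical placeholder for a real query.
cache = {}
stats = {"hits": 0, "misses": 0}


def fetch_from_database(key):
    # Pretend this is a slow database lookup.
    return f"value-for-{key}"


def get(key):
    if key in cache:                    # cache hit: return immediately
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1                # cache miss: go to the database
    value = fetch_from_database(key)
    cache[key] = value                  # store it for next time
    return value


def hit_ratio():
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


if __name__ == "__main__":
    for _ in range(20):
        get("profile:42")
    print(f"hit ratio: {hit_ratio():.0%}")  # 1 miss then 19 hits -> 95%
```

Twenty requests for the same key cost one database call. The other nineteen are hits.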
What is TTL
TTL stands for Time To Live. It tells the cache how long to keep the data before throwing it away.
You set a TTL when you store data in the cache. After that time passes the data is automatically deleted. The next request will go to the database and get fresh data.
Why do we need TTL? Because data changes. If someone updates their Instagram bio but the cache keeps the old bio forever users will see outdated information.
TTL examples
- User profile data: TTL of 5 minutes
- Product prices: TTL of 1 minute
- Static page content: TTL of 1 hour
- Stock prices: TTL of 5 seconds
The right TTL depends on how often the data changes and how critical it is to show the latest version.
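Here is a small sketch of how a TTL can work under the hood. It assumes a hypothetical `TTLCache` class that stores an expiry time next to each value. The clock is injectable only so the example can fake the passage of time. This sketch expires entries lazily when they are read, which is a simplification.

```python
import time


class TTLCache:
    """Tiny TTL cache sketch: each entry remembers when it expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock         # injectable so the example can fake time
        self._entries = {}          # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value


if __name__ == "__main__":
    now = [0.0]
    profiles = TTLCache(clock=lambda: now[0])
    profiles.set("profile:42", {"bio": "hello"}, ttl_seconds=300)
    now[0] = 299
    print(profiles.get("profile:42"))   # still cached
    now[0] = 300
    print(profiles.get("profile:42"))   # expired, prints None
```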
Types of Caching
There are two main types of cache based on where the cache lives.
In Memory Cache (Local Cache)
The cache lives in the same server as your application. It is stored in the server RAM.
This is the fastest type of cache because there is no network call. The data is right there in the same process memory.
But it has a big limitation. The cache is not shared. If you have 3 app servers each server has its own separate cache. Server 1 might have user data cached but Server 2 does not. So Server 2 still hits the database.
graph TD
subgraph Server 1
A1[App] --> LC1[Local Cache]
end
subgraph Server 2
A2[App] --> LC2[Local Cache]
end
subgraph Server 3
A3[App] --> LC3[Local Cache]
end
LC1 -.-> DB[Database]
LC2 -.-> DB
LC3 -.-> DB
class LC1 mm-green
class LC2 mm-green
class LC3 mm-green
class DB mm-blue
Examples of in memory cache
- HashMap in Java
- LRU Cache
- MemoryCache in Android
- Caffeine Cache in Java
- Python dict used as cache
Use this when your app runs on a single server or when the cached data is specific to each server.
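In Python the quickest local cache is the standard library decorator `functools.lru_cache`. The sketch below assumes a hypothetical `load_user` function standing in for a database query. The `calls` counter exists only to show how many times the "database" was really touched.

```python
from functools import lru_cache

calls = {"db": 0}


@lru_cache(maxsize=1024)
def load_user(user_id):
    # Hypothetical stand-in for a database query.
    calls["db"] += 1
    return (user_id, f"user-{user_id}")


if __name__ == "__main__":
    load_user(1)
    load_user(1)                    # served from process memory
    load_user(2)
    print(calls["db"])              # 2 database calls for 3 requests
    print(load_user.cache_info())   # reports hits=1, misses=2
```

This cache lives inside one process. A second server running the same code starts with an empty cache of its own.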
Distributed Cache
The cache lives on a separate dedicated server. All your app servers connect to this shared cache over the network.
This means Server 1, Server 2 and Server 3 all share the same cache. When Server 1 caches user data Server 2 can read it too.
graph TD
A1[App Server 1] --> RC[Distributed Cache Server]
A2[App Server 2] --> RC
A3[App Server 3] --> RC
RC -.-> DB[Database]
class RC mm-red
class A1 mm-blue
class A2 mm-blue
class A3 mm-blue
class DB mm-yellow
Examples of distributed cache
- Redis, the most popular choice
- Memcached, simple and fast
- Hazelcast, for Java applications
- Amazon ElastiCache (managed Redis or Memcached on AWS)
Advantages of distributed cache
- Shared across all servers. One cache for everyone.
- Scalable. You can add more cache servers as traffic grows.
- Survives app server restarts. If App Server 1 crashes the cache is still there.
- Reliable. Most distributed caches support replication so even the cache has backups.
Use this when your app has multiple servers and you need a shared cache. This is what most production systems use.
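The difference between local and shared caches is easy to see in a toy simulation. This is only a sketch. `AppServer` is a hypothetical class, and a shared Python dict stands in for a real cache server such as Redis, which your app would normally reach over the network through a client library.

```python
class AppServer:
    """Hypothetical app server that reads through whatever cache it is given."""

    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def get(self, key):
        if key not in self.cache:
            self.database["queries"] += 1           # count database hits
            self.cache[key] = self.database["rows"][key]
        return self.cache[key]


def count_database_queries(servers, database, key):
    database["queries"] = 0
    for server in servers:
        server.get(key)
    return database["queries"]


if __name__ == "__main__":
    db = {"rows": {"user:1": "Ada"}, "queries": 0}

    local = [AppServer({}, db) for _ in range(3)]         # one cache per server
    print(count_database_queries(local, db, "user:1"))    # 3

    shared_cache = {}                                     # stands in for Redis
    shared = [AppServer(shared_cache, db) for _ in range(3)]
    print(count_database_queries(shared, db, "user:1"))   # 1
```

With local caches every server pays its own cache miss. With a shared cache only the first server does.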
If you are running your app on AWS you can use Amazon ElastiCache which gives you managed Redis or Memcached. If you are interested in AWS fundamentals check out our AWS Zero to Hero guide.
Redis vs Memcached
These are the two most popular distributed cache solutions. Here is when to use each.
| Feature | Redis | Memcached |
|---|---|---|
| Data types | Strings, lists, sets, sorted sets, hashes | Strings only |
| Persistence | Optional. RDB snapshots and AOF logs let data survive restarts | No. Data is lost on restart |
| Replication | Yes. Primary and replica support | No built in replication |
| Threading | Mostly single threaded (commands run on one thread) | Multi threaded |
| Memory management | Advanced with eviction policies | Simple slab allocation |
| Use case | Complex caching, sessions, leaderboards, queues | Simple key value caching |
Use Redis when you need more than simple key value storage. Use Memcached when you need raw speed for simple data and nothing else.
Cache Invalidation
This is the hardest part of caching. There is a famous saying in computer science.
"There are only two hard things in computer science: cache invalidation and naming things."
Cache invalidation means removing or updating old data from the cache when the original data changes.
If a user updates their profile picture but the cache still has the old picture every visitor sees the old one. That is stale data. Cache invalidation prevents this.
When to Invalidate
- When data is updated like a new post or changed bio
- When follower or following count changes
- When user information is changed
- When TTL expires
- When you manually trigger an invalidation
graph LR
A[Data Changes in DB] --> B[Invalidate Cache]
B --> C[Next Request Fetches Fresh Data]
C --> D[Store New Data in Cache]
class A mm-red
class B mm-yellow
class C mm-blue
class D mm-green
Caching Strategies
There are four main strategies for how your app reads and writes data with a cache.
1 Cache Aside (Lazy Loading)
This is the most common pattern. The app manages the cache itself.
Read flow. The app checks the cache first. If the data is there return it. If not go to the database, get the data, store it in the cache and then return it.
Write flow. The app writes to the database directly. Then it deletes the cached version so the next read gets fresh data.
graph TD
A[App] -->|1 Check Cache| B{Cache Hit?}
B -->|Yes| C[Return Cached Data]
B -->|No| D[2 Query Database]
D --> E[3 Store in Cache]
E --> F[4 Return Data]
class B mm-yellow
class C mm-green
class D mm-blue
This is simple and works well. The downside is the first request for any data will always be slow because it is a cache miss.
Most web apps use this pattern. If you have built a URL shortener on AWS you would use cache aside for the URL lookups.
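Here is a minimal cache aside sketch covering both the read flow and the write flow. `CacheAside` is a hypothetical class, and two dicts stand in for the cache and the database. In a real app those would be a cache client and a database client.

```python
class CacheAside:
    """Cache-aside sketch: the app talks to both the cache and the database."""

    def __init__(self, database):
        self.database = database    # dict standing in for a real database
        self.cache = {}             # dict standing in for Redis or similar

    def read(self, key):
        if key in self.cache:
            return self.cache[key]          # hit
        value = self.database.get(key)      # miss: query the database
        if value is not None:
            self.cache[key] = value         # populate for the next read
        return value

    def write(self, key, value):
        self.database[key] = value          # 1. write to the database
        self.cache.pop(key, None)           # 2. delete the cached copy


if __name__ == "__main__":
    urls = CacheAside({"abc": "https://example.com"})
    print(urls.read("abc"))                 # miss, then cached
    urls.write("abc", "https://example.org")
    print(urls.read("abc"))                 # fresh value after invalidation
```

Deleting the cached copy on write, instead of updating it, keeps the write path simple. The next read repopulates the cache from the database.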
2 Write Through Cache
Every write goes to both the cache and the database as part of the same operation. So the cache is always up to date.
Read flow. Read from the cache. Anything that was written through the cache will be fresh.
Write flow. The app writes to the cache. The cache then writes to the database before the write counts as done.
Advantage: The cache and database are always in sync. No stale data.
Disadvantage: Every write is slower because it has to write to two places. Also you might cache data that nobody ever reads.
Use this when you cannot afford stale data. Banking apps are a typical fit.
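A write through cache can be sketched like this. `WriteThroughCache` is a hypothetical class and a dict stands in for the database. One choice in this sketch is an assumption: inside the cache layer the database write happens first, so a failed database write never leaves an unsaved value in the cache.

```python
class WriteThroughCache:
    """Write-through sketch: one write call updates both stores."""

    def __init__(self, database):
        self.database = database    # dict standing in for a real database
        self.cache = {}

    def write(self, key, value):
        # Persist first so a failed database write never leaves an
        # unsaved value in the cache. The write is done only after both.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Fallback for data that was written before the cache existed.
        return self.database.get(key)


if __name__ == "__main__":
    db = {}
    accounts = WriteThroughCache(db)
    accounts.write("balance:1", 100)
    print(accounts.read("balance:1"), db["balance:1"])   # both stores agree
```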
3 Write Back (Write Behind) Cache
The app writes to the cache only. The cache then writes to the database in the background later.
Read flow. Read from the cache.
Write flow. Write to the cache. The cache batches multiple writes and sends them to the database later.
Advantage: Very fast writes because the app only talks to the cache.
Disadvantage: If the cache crashes before syncing to the database you lose data.
Use this for high write volume systems where some data loss is acceptable. Analytics and logging systems use this.
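The sketch below shows the write back idea with a hypothetical `WriteBackCache` class. It keeps a set of "dirty" keys and flushes them to the database in one batch. To stay small it flushes synchronously once a batch size is reached. Real systems usually flush in the background on a timer, and that is an assumption this sketch does not model.

```python
class WriteBackCache:
    """Write-back sketch: writes land in the cache and are flushed in batches."""

    def __init__(self, database, batch_size=3):
        self.database = database        # dict standing in for a real database
        self.cache = {}
        self.dirty = set()              # keys written but not yet persisted
        self.batch_size = batch_size

    def write(self, key, value):
        self.cache[key] = value         # fast path: cache only
        self.dirty.add(key)
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.database.get(key)

    def flush(self):
        # One batched write instead of one database call per update.
        self.database.update({k: self.cache[k] for k in self.dirty})
        self.dirty.clear()


if __name__ == "__main__":
    db = {}
    events = WriteBackCache(db, batch_size=3)
    events.write("a", 1)
    events.write("b", 2)
    print(db)                           # still empty: nothing flushed yet
    events.write("c", 3)
    print(db)                           # all three keys persisted in one batch
```

Anything still in `dirty` when the cache process dies is lost. That is the data loss risk described above.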
4 Read Through Cache
Similar to cache aside but the cache itself manages the database reads. The app only talks to the cache and never directly to the database.
Read flow. The app asks the cache for data. If the cache does not have it the cache goes to the database, stores the data and returns it.
Write flow. Separate from the read path. You decide how writes work.
Advantage: The app code is simpler because it only talks to the cache.
Disadvantage: The cache needs to know how to query the database which adds complexity to the cache layer.
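In code the difference from cache aside is who calls the database. In this sketch a hypothetical `ReadThroughCache` is given a `loader` function up front, and the app only ever calls `get`.

```python
class ReadThroughCache:
    """Read-through sketch: the cache, not the app, knows how to load data."""

    def __init__(self, loader):
        self._loader = loader       # function that fetches a key from the database
        self._entries = {}

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._loader(key)   # the cache loads on a miss
        return self._entries[key]


if __name__ == "__main__":
    database_calls = []

    def load_from_database(key):
        # Hypothetical loader standing in for a real query.
        database_calls.append(key)
        return key.upper()

    cache = ReadThroughCache(load_from_database)
    cache.get("song:1")
    cache.get("song:1")
    print(len(database_calls))      # 1
```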
Strategy Comparison
| Strategy | Read Speed | Write Speed | Data Freshness | Complexity | Risk |
|---|---|---|---|---|---|
| Cache Aside | Fast on hit | Normal | Can be stale | Low | Stale reads |
| Write Through | Always fast | Slower | Always fresh | Medium | Wasted cache space |
| Write Back | Always fast | Very fast | Always fresh in cache | High | Data loss on crash |
| Read Through | Fast on hit | Normal | Can be stale | Medium | Cache complexity |
Where Caching Happens
Caching is not just one thing in one place. It happens at multiple layers of your system.
Browser Cache
Your browser caches CSS, JavaScript, images and fonts locally. When you visit a site again the browser loads those files from disk instead of downloading them again.
You control this with HTTP headers like Cache-Control and ETag.
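To make those two headers concrete, here is a framework free sketch of a server deciding between a full response and a 304 Not Modified. `make_etag` and `respond` are hypothetical helper names, and the one hour `max-age` is an arbitrary choice for the example. The browser sends back the ETag it has in the `If-None-Match` request header.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # An ETag derived from the content. Any stable hash would do.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None, max_age=3600):
    """Return (status, headers, body) for a cacheable static resource."""
    etag = make_etag(body)
    headers = {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    if if_none_match == etag:
        return 304, headers, b""    # the browser's copy is still valid
    return 200, headers, body


if __name__ == "__main__":
    css = b"body { color: red }"
    status, headers, _ = respond(css)
    print(status, headers["Cache-Control"])
    status, _, _ = respond(css, if_none_match=headers["ETag"])
    print(status)                   # 304
```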
CDN Cache
A CDN (Content Delivery Network) caches your static content on servers around the world. When a user in India requests your website they get the files from a server in Mumbai instead of your server in the US.
Popular CDNs include Cloudflare, AWS CloudFront and Fastly.
If you have set up a portfolio on AWS you have probably used CloudFront already. Check out our S3 portfolio hosting guide for an example.
Application Cache
This is where most of the caching logic lives. Your application decides what to cache, how long to cache it and when to invalidate it.
Examples include caching user sessions, caching API responses, caching computed results and caching database queries.
Database Cache
Most databases have their own internal cache. MySQL's InnoDB engine has the buffer pool (the older MySQL query cache was removed in MySQL 8.0). PostgreSQL has shared buffers. MongoDB has the WiredTiger cache.
These are automatic. The database caches frequently accessed data in memory so repeated queries are faster.
Full Caching Layer Diagram
graph TD
U[User] --> B[Browser Cache]
B --> CDN[CDN Cache]
CDN --> LB[Load Balancer]
LB --> App[Application Cache]
App --> DC[Distributed Cache Redis]
DC --> DB[Database Cache]
DB --> Disk[Database Disk]
class B mm-blue
class CDN mm-green
class App mm-yellow
class DC mm-red
class DB mm-purple
Each layer catches requests before they reach the next layer. By the time a request reaches the actual database disk it has passed through 5 layers of caching. For static content, most requests never make it past the first two layers.
For a deeper look at how load balancers work in this stack check out our Scalability and Load Balancing guide.
Real World Example: Music Streaming App
Let us put this together with a real example. Imagine you are building a music streaming app like Spotify.
When a premium user opens the app it needs to load
- User profile and subscription details
- Recently played songs
- Personalized playlists
- Saved library
- Queued songs
Without caching each of these is a separate database query. That is 5 database calls just to open the app. Multiply that by 10 million active users and your database handles 50 million queries just for the home screen.
With caching the flow changes.
First visit. All 5 queries go to the database. The results are stored in cache with a TTL of 5 minutes.
Next visit within 5 minutes. All 5 queries hit the cache. Zero database calls. The app loads in under 100ms.
After 5 minutes. The cache expires. The app goes to the database again, gets fresh data and caches it.
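The numbers are easy to check. The helper below is a hypothetical back of the envelope calculation. The 95% hit ratio is an assumption borrowed from the hit ratio example earlier, not a measured figure for any real app.

```python
def queries_reaching_database(users, queries_per_open, hit_ratio):
    """Queries that still reach the database if every user opens the app once."""
    total = users * queries_per_open
    return round(total * (1 - hit_ratio))


if __name__ == "__main__":
    print(queries_reaching_database(10_000_000, 5, 0.0))    # 50000000 without a cache
    print(queries_reaching_database(10_000_000, 5, 0.95))   # 2500000 at a 95% hit ratio
```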
Benefits of Caching
Here is why every production system uses caching.
- Faster response time. A cache usually lives in RAM, and reading from RAM is orders of magnitude faster than reading from disk.
- Lower database load. Fewer queries hit the database. The database can focus on writes and complex queries.
- Saves API calls and network trips. Fewer calls to external services and databases.
- Handles more traffic. With a cache the same infrastructure can often serve many times more users.
- Better user experience. Faster page loads and smoother interactions.
- Cost effective. Fewer database connections means you can use a smaller cheaper database.
This ties directly into cost optimization strategies for cloud infrastructure. Caching reduces the compute and database resources you need to pay for.
Common Caching Pitfalls
Caching sounds simple but it has traps. Here are the ones that catch people.
Cache Stampede
When a popular cache key expires all the requests hit the database at the same time. If 10,000 users are requesting the same data and the cache expires, all 10,000 requests flood the database at once.
The fix is to use cache locking. Only the first request goes to the database. The rest wait until the cache is filled again.
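Within a single process, cache locking can be sketched with one lock per key. `SingleFlightCache` is a hypothetical name. This sketch only coordinates threads inside one server, and it never cleans up old locks. Across several servers you would need a lock in the shared cache instead, which is not shown here.

```python
import threading
import time


class SingleFlightCache:
    """Only one thread per key loads from the database; the rest wait."""

    def __init__(self, loader):
        self._loader = loader
        self._entries = {}
        self._locks = {}
        self._guard = threading.Lock()      # protects the _locks dict

    def get(self, key):
        if key in self._entries:
            return self._entries[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the cache while we waited.
            if key not in self._entries:
                self._entries[key] = self._loader(key)
            return self._entries[key]


if __name__ == "__main__":
    database_calls = []

    def slow_loader(key):
        database_calls.append(key)
        time.sleep(0.05)                    # simulate a slow query
        return key.upper()

    cache = SingleFlightCache(slow_loader)
    threads = [threading.Thread(target=cache.get, args=("feed",)) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(database_calls))              # 1
```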
Cache Penetration
When requests keep asking for data that does not exist in the database. The cache will never have it so every request goes to the database.
The fix is to cache the "not found" result too. Store a null or empty value with a short TTL.
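Caching a "not found" result needs a marker that is different from a normal value. The sketch below uses a sentinel object for that. `NegativeCache` and its 30 second negative TTL are assumptions for illustration. The loader is expected to return None when the row does not exist.

```python
import time

_MISSING = object()     # sentinel meaning "we looked and it does not exist"


class NegativeCache:
    """Caches real values and, for a short time, 'not found' results too."""

    def __init__(self, loader, negative_ttl=30, clock=time.monotonic):
        self._loader = loader               # returns None when the row is absent
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._entries = {}                  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                return None if value is _MISSING else value
        value = self._loader(key)
        if value is None:
            # Remember the miss, but only briefly, in case the row appears later.
            self._entries[key] = (_MISSING, self._clock() + self._negative_ttl)
        else:
            self._entries[key] = (value, None)
        return value
```

Repeated lookups for a missing key now reach the database at most once per negative TTL window instead of on every request.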
Cache Avalanche
When a large number of cache keys expire at the same time. This causes a sudden spike of database queries.
The fix is to add random jitter to your TTL values. Instead of all keys expiring at exactly 5 minutes make them expire between 4 and 6 minutes.
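Jitter is a one line change. The helper below is a hypothetical `ttl_with_jitter` function. With a 300 second base and a 20% spread it returns a TTL between 240 and 360 seconds, which matches the 4 to 6 minute range above.

```python
import random


def ttl_with_jitter(base_seconds=300, spread=0.2, rng=random):
    """Return a TTL within +/- spread of the base (240 to 360 s by default)."""
    return base_seconds * (1 + rng.uniform(-spread, spread))


if __name__ == "__main__":
    # Each key gets a slightly different lifetime, so they do not expire together.
    for key in ("feed:1", "feed:2", "feed:3"):
        print(key, round(ttl_with_jitter()))
```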
Stale Data
When the cache has old data and users see outdated information. This happens when cache invalidation is not done properly.
The fix is to use proper invalidation strategies and set reasonable TTL values. If your system handles critical data like financial transactions consider using write through caching.
Cache Eviction Policies
Cache memory is limited. When the cache is full and new data needs to be stored something has to be removed. The eviction policy decides what gets removed.
| Policy | What It Does | When To Use |
|---|---|---|
| LRU (Least Recently Used) | Removes the data that was accessed longest ago | Most common. Good default choice |
| LFU (Least Frequently Used) | Removes the data that is accessed least often | When some data is always popular |
| FIFO (First In First Out) | Removes the oldest data first | Simple and predictable |
| TTL Based | Removes data when its timer expires | When data has a natural expiry |
| Random | Removes random data | When no pattern exists |
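LRU is the most widely used policy. Redis supports it through maxmemory-policy settings such as allkeys-lru, using an approximation of LRU rather than an exact one. In open source Redis the default policy is noeviction, so you have to choose an eviction policy yourself.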
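LRU is simple enough to sketch with the standard library. The hypothetical `LRUCache` below uses `collections.OrderedDict` to keep keys in access order. It is an exact LRU for illustration, not how any particular cache server implements it.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU sketch: the least recently used key is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()       # least recently used first

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)      # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")              # "a" is now the most recently used
    cache.put("c", 3)           # cache is full, so "b" is evicted
    print(cache.get("b"))       # None
```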
Putting It All Together
Here is a complete system architecture with caching at every layer.
graph TD
Users[Users] --> CDN[CDN Cloudflare]
CDN --> LB[Load Balancer]
LB --> App1[App Server 1]
LB --> App2[App Server 2]
LB --> App3[App Server 3]
App1 --> Redis[Redis Cache Cluster]
App2 --> Redis
App3 --> Redis
Redis --> PDB[Primary Database]
PDB -->|Replication| RDB[Read Replica]
class CDN mm-green
class LB mm-red
class Redis mm-red
class PDB mm-purple
class RDB mm-teal
The flow works like this.
- Users hit the CDN. Static files like images, CSS and JS are served from the edge. Most requests for static files stop here.
- Dynamic requests go through the load balancer which distributes traffic across app servers.
- App servers check Redis first. If the data is cached it returns immediately.
- On a cache miss the app queries the primary database or read replica.
- The result is cached in Redis for future requests.
- Database replication keeps the read replica in sync so it can serve reads and stand in if the primary fails.
This is a common setup for production web apps today.
Common Use Cases for Caching
| Use Case | What Gets Cached | Example |
|---|---|---|
| User profiles | Name, bio, settings | Social media apps |
| Feed and timeline | Posts, tweets, stories | Instagram, Twitter |
| Session data | Login tokens, preferences | Any web app |
| Product catalog | Prices, descriptions, images | E-commerce sites |
| Leaderboards | Scores, rankings | Gaming apps |
| Config and feature flags | App settings, A/B tests | SaaS platforms |
| DNS lookups | Domain to IP mapping | Every internet request |
| API responses | External API results | Payment gateways, maps |
Core Concepts
Here is everything in one place.
| Concept | What It Means | Key Point |
|---|---|---|
| Caching | Storing data in fast temporary storage | Reduces database load and speeds up responses |
| Cache Hit | Data found in cache | The goal. Fast response |
| Cache Miss | Data not in cache | Slower. Goes to database |
| TTL | Time To Live for cached data | Controls how long data stays in cache |
| In Memory Cache | Cache on the same server | Fastest but not shared |
| Distributed Cache | Cache on separate servers like Redis | Shared across all app servers |
| Cache Aside | App manages cache reads and writes | Most common pattern |
| Write Through | Write to cache and database together | Always fresh but slower writes |
| Write Back | Write to cache first and database later | Fast writes but risk data loss |
| Cache Invalidation | Removing or updating stale data | The hardest problem in caching |
| Cache Stampede | Many requests hit database when cache expires | Fix with locking or pre warming |
| LRU Eviction | Remove least recently used data first | Best default eviction policy |
| CDN | Cache static content at the network edge | Serves users from nearest server |
Caching is not something you add at the end. It is a core part of system design that you plan from the start. Get it right and your app handles 10x the traffic on the same hardware. Get it wrong and your users see stale data while your database melts.
If you are learning system design start with Scalability and Load Balancing then come here for caching. Next up read our complete guide on Databases, Message Queues and Authentication.