The Dead Upstream Trick: Forcing Nginx to Prefer Stale Cache
By default, Nginx’s proxy cache is “freshness-first”: when a request finds an expired entry, Nginx forwards it to the upstream for a fresh copy, and the client waits until that copy arrives.
But in high-scale systems, we often face a different requirement: Speed over Freshness.
If generating a dashboard takes 5 seconds, I don’t want the user waiting just because the cache expired 1 second ago. I want to serve the 10-minute-old data now and let someone else pay for the refresh: a background job, or the next poor soul whose request doesn’t opt into stale data.
Nginx doesn’t have a native prefer_stale flag for this specific flow. To achieve it, we use the “Dead Upstream Trick.”
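For comparison, the closest native behavior is stale-while-revalidate. The sketch below is not part of the trick; it reuses the backend_live and my_cache names from the configuration in this post. It serves an expired entry immediately while a background subrequest refreshes it. proxy_cache_background_update requires nginx 1.11.10 or later and only takes effect together with proxy_cache_use_stale updating. Unlike the dead-upstream route, it applies to every request in the location and always sends the refresh to the upstream; there is no per-request opt-in.

```nginx
location /api/dashboard {
    proxy_pass http://backend_live;
    proxy_cache my_cache;
    proxy_cache_valid 200 10m;
    # Allow serving an expired entry while it is being refreshed...
    proxy_cache_use_stale updating error timeout;
    # ...and do the refresh in a background subrequest, so even the first
    # request after expiry receives the stale copy immediately.
    proxy_cache_background_update on;
}
```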
The Mechanics
The logic is counter-intuitive:
- We route the request to an upstream where every server is marked down.
- Nginx immediately finds “no live upstreams” and fails the request.
- That failure satisfies the error condition of proxy_cache_use_stale, so Nginx serves whatever is in the cache, no matter how old (as long as the entry has not been evicted).
- If the cache is empty, we catch the resulting 502 and fall back to the live backend.
The Configuration
Here is the lean implementation. We skip the boilerplate and focus on the routing engine.
1. The Upstreams
We need our real backend and a “Black Hole.”
upstream backend_live {
    server 10.0.0.1:8080;
}
upstream backend_dead {
    # Mark the only server as permanently down.
    # Nginx never attempts a connection: it immediately fails with "no live upstreams".
    # That failure counts as an "error" for proxy_cache_use_stale, so a cached copy
    # (if one exists) is served instantly.
    server 127.0.0.1:80 down;
}
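One side effect worth knowing: nginx logs “no live upstreams” at the error level every time the dead upstream is selected, so each stale-path request adds a line to the error log. A sketch for silencing it, assuming those lines carry no value for you, is to raise the log threshold only inside the @stale_attempt location from the routing section. The log path is a placeholder for whatever your error_log already points to, and the trade-off is that other error-level messages for requests handled there are hidden too.

```nginx
location @stale_attempt {
    # Placeholder path. "crit" suppresses error-level messages, including
    # "no live upstreams", while a request is handled in this location.
    error_log /var/log/nginx/error.log crit;

    # ...the rest of the @stale_attempt directives stay as they are...
}
```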
2. The Routing Logic
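# Global cache zone definition (belongs in the http context)
proxy_cache_path /var/cache/nginx keys_zone=my_cache:10m inactive=24h;
server {
    # CRITICAL: allow a second error_page redirect while the first is being handled
    recursive_error_pages on;
    # The stale and live locations MUST share one cache key. The default key
    # ($scheme$proxy_host$request_uri) differs between them, because
    # $proxy_host is backend_dead in one and backend_live in the other.
    # $cleaned_args is not a built-in variable, so $args is used here; swap in
    # your own normalized variable (e.g. one built with map) if you have one.
    proxy_cache_key $uri$is_args$args;
    location /api/dashboard {
        # The 418 is raised in this location, so its error_page belongs here
        error_page 418 = @stale_attempt;
        # Entry logic: if the client prefers stale data, trigger the trap (418).
        # You can toggle this via header, cookie, or make it the default.
        if ($http_x_prefer_stale) {
            return 418;
        }
        # Default: go straight to the live source. The empty path never
        # resolves to a regular file, so try_files always falls through to @live.
        try_files "" @live;
    }
    # The Trap: attempt to serve stale, fail over to live
    location @stale_attempt {
        proxy_pass http://backend_dead;
        # Cache config (proxy_cache_key is inherited from the server level)
        proxy_cache my_cache;
        # 1. Trigger: "error" also covers "no live upstreams", so an expired
        #    entry is served as-is instead of failing.
        proxy_cache_use_stale error timeout updating;
        # 2. Fallback: if the cache is EMPTY, nginx itself answers 502.
        #    error_page handles nginx-generated errors on its own;
        #    proxy_intercept_errors only applies to responses from an upstream.
        #    A bare "=" keeps @live's real status instead of forcing 200.
        error_page 502 = @live;
        # Debugging header to confirm source (HIT, STALE, ...)
        add_header X-Cache-Status $upstream_cache_status;
    }
    # The Source of Truth
    location @live {
        proxy_pass http://backend_live;
        # We must cache here so the next request has data to be stale with.
        proxy_cache my_cache;
        proxy_cache_valid 200 10m;
        add_header X-Cache-Status $upstream_cache_status;
    }
}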
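The entry comment in the routing logic mentions toggling by header, by cookie, or making stale the default. A sketch of both variants follows, assuming a hypothetical cookie named prefer_stale and a hypothetical second endpoint /api/summary; @stale_attempt and @live stay exactly as defined above. The map merges header and cookie into a single flag, so the if condition stays a one-liner.

```nginx
# http context. Opt in with the X-Prefer-Stale header OR a hypothetical
# prefer_stale cookie; any non-empty value of either one counts.
map "$http_x_prefer_stale$cookie_prefer_stale" $prefer_stale {
    default 1;
    ""      0;   # neither the header nor the cookie is present
}

# Inside the server block: the opt-in location now tests the merged flag.
location /api/dashboard {
    error_page 418 = @stale_attempt;
    if ($prefer_stale) {
        return 418;
    }
    try_files "" @live;
}

# Stale-first by default: every request takes the trap, no opt-in needed.
location /api/summary {
    error_page 418 = @stale_attempt;
    return 418;
}
```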
Why This Works (And Why It Often Breaks)
The magic component is recursive_error_pages on at the server level.
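Without it, the first error_page redirect (the 418) marks the request as already handled by an error page, and Nginx ignores any further error_page directives for that request. This is a safeguard against infinite redirect loops.
Our “Dead Upstream” strategy relies on a second error (the 502 from the dead upstream) to reach the live backend on a cache miss, so recursive error pages must be enabled explicitly. Without them, cache misses simply return 502 Bad Gateway to the client.
The other common failure is silent: both locations must compute the same cache key. If either one falls back to the default key, $scheme$proxy_host$request_uri, the keys diverge, because $proxy_host is backend_dead in one location and backend_live in the other. Every stale attempt then misses and quietly falls through to the live backend. Also note that the stale path never refreshes anything; only requests that reach @live (no opt-in, or a cache miss) repopulate the cache.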
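To watch the fallback chain and both failure modes without a real dashboard, one option is a stand-in backend served by nginx itself. This is a hypothetical test setup, not part of the trick: it assumes you temporarily point upstream backend_live at 127.0.0.1:8081 and shorten proxy_cache_valid in @live to a few seconds so entries expire quickly.

```nginx
# Hypothetical stand-in for backend_live while testing
# (use "server 127.0.0.1:8081;" inside upstream backend_live).
server {
    listen 127.0.0.1:8081;
    location / {
        default_type text/plain;
        # Every response carries the time it was generated, so a cached
        # copy is easy to tell apart from a fresh one.
        return 200 "generated at $time_iso8601\n";
    }
}
```

With X-Prefer-Stale: 1 set on every request, a cold cache should return a fresh timestamp (via @live), a repeat within the validity window should come back with X-Cache-Status: HIT, and a request after expiry should return the old timestamp with X-Cache-Status: STALE. A 502 on the cold request points to recursive_error_pages not being in effect; getting a new timestamp after expiry instead of STALE points to the two locations using different cache keys.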
Summary
- Upstream: Use server ... down to create an instantly failing upstream.
- Server: Use recursive_error_pages on to allow the fallback chain.
- Cache: Give the stale and live locations one shared proxy_cache_key.
- Result: Instant stale content for requests that take the trap, with a seamless fallback to live generation on a cache miss.