The Global Tarpit: How to Stop 100k Cloud VPS Bots with Nginx and 128 Bytes of RAM
If you run a high-value target on the internet today, you know that IP-based rate limiting is dead.
The economics of scraping have changed. Attackers no longer run scripts from a single server. They use massive distributed networks—ranging from “residential proxies” to swarms of cheap Cloud VPS instances.
I recently faced a scraper hitting my infrastructure with 100,000 unique IP addresses per day. These weren’t just infected IoT devices; they were fresh IPs from low-cost VPS providers, rotated constantly. Each IP made only one request. My standard limit_req_zone $binary_remote_addr configuration was useless. To Nginx, every request looked like a new user.
I didn’t want to play whack-a-mole with ASN blocks. I wanted to nuke the entire operation.
Here is how I used the TLS handshake variables that Nginx exposes, plus the mechanics of its leaky-bucket rate limiter, to create a “Global Tarpit” that effectively broke the site for the botnet, regardless of how many IPs they rotated.
The Insight: You Can Change Your IP, But You Can’t Hide Your Handshake
While the attacker can rotate IPs for every request, they usually cannot rotate their SSL/TLS stack.
Most scrapers are built on standard libraries: Python’s requests, standard curl, or OpenSSL. Unlike modern browsers (Chrome, Firefox, Safari), these libraries have a very specific, static signature during the TLS Client Hello:
No GREASE: Chrome and other Chromium-based browsers send random “GREASE” values (reserved codepoints like 0x0a0a, 0x1a1a, 0x2a2a) in their cipher and curve lists to prevent protocol ossification. By comparison, curl, python-requests, and stock OpenSSL clients do not.
The trick is that scrapers want to look like Chrome—they set a Chrome User-Agent and call it a day. But faking the TLS handshake itself is much harder. A request that claims to be Chrome but arrives without GREASE is the cheapest reliable bot tell you can get.
If we can detect this mismatch, we can identify scrapers pretending to be Chrome—without touching real Firefox or Safari users at all.
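The check itself is small. As a rough illustration outside Nginx, the Python sketch below applies the same test to a colon-separated curve list in which unknown codepoints appear in hex (for example 0x1a1a:x25519:prime256v1). The function names are hypothetical, and the GREASE set is the 16 values reserved by RFC 8701.

```python
# RFC 8701 reserves 0x0a0a, 0x1a1a, ..., 0xfafa as GREASE values
# (both bytes identical, low nibble always 0xa).
GREASE_VALUES = {f"0x{n:x}a{n:x}a" for n in range(16)}


def has_grease(curves: str) -> bool:
    """True if any entry of a colon-separated curve list is a GREASE value."""
    return any(item.lower() in GREASE_VALUES for item in curves.split(":"))


def claims_chrome_without_grease(user_agent: str, curves: str) -> bool:
    """The mismatch described above: a Chrome User-Agent, but no GREASE."""
    return "Chrome" in user_agent and not has_grease(curves)


chrome_ua = "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0 Safari/537.36"
print(claims_chrome_without_grease(chrome_ua, "0x1a1a:x25519:prime256v1"))  # False
print(claims_chrome_without_grease(chrome_ua, "x25519:prime256v1"))         # True
print(claims_chrome_without_grease("curl/8.5.0", "x25519:prime256v1"))      # False
```

The last call shows why honest curl users are left alone: the rule only fires when the User-Agent and the handshake disagree.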
The Strategy: The Shared Bucket
The standard way to rate limit in Nginx is by IP:
# The Old Way (Useless against proxies)
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
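This creates one bucket per client IP address. A botnet that uses each IP once never puts more than a single request into any of them.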
My approach was different. I realized that since 100,000 bots were all using the same curl binary while claiming to be Chrome, they all produced the exact same TLS fingerprint.
Instead of creating a bucket for every IP, I created one single bucket for that specific fingerprint.
If you claim to be Chrome but look like curl, you don’t get your own personal rate limit. You share a rate limit with every other fake-Chrome client on the planet.
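The difference is easiest to see by counting buckets. This Python sketch is illustrative only: the request dictionaries, the field names, and the fingerprint string are made up, and the two key functions merely mimic what a per-IP key and a per-fingerprint key would select.

```python
def per_ip_key(request):
    """Old approach: every client IP gets its own bucket."""
    return request["ip"]


def shared_key(request):
    """Shared approach: every client with the same handshake lands in one bucket."""
    return request["tls_fingerprint"]


# Hypothetical fingerprint string; real values depend on the client's TLS stack.
FAKE_CHROME_FP = "TLSv1.3:TLS_AES_256_GCM_SHA384:x25519:prime256v1"

# 100,000 bots, each with a unique IP but the same TLS stack.
bot_requests = [
    {
        "ip": f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}",
        "tls_fingerprint": FAKE_CHROME_FP,
    }
    for i in range(100_000)
]

per_ip_buckets = {per_ip_key(r) for r in bot_requests}
shared_buckets = {shared_key(r) for r in bot_requests}
print(len(per_ip_buckets), len(shared_buckets))  # 100000 1
```

With per-IP keys the limiter tracks 100,000 buckets holding one request each. With the shared key, all 100,000 requests compete for the same allowance.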
The Configuration
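Here is the Nginx config to classify the traffic. We use variables from ngx_http_ssl_module ($ssl_protocol, $ssl_ciphers, $ssl_curves) to inspect the handshake. $ssl_ciphers and $ssl_curves require Nginx 1.11.7 or later built against OpenSSL 1.0.2 or later.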
http {
    # 1. Does the User-Agent claim to be Chrome?
    map $http_user_agent $claims_chrome {
        default 0;
        "~Chrome" 1;
    }
    # 2. Does the handshake actually look like Chrome?
    # Real Chrome always advertises a GREASE value (0x0a0a, 0x1a1a, ...) in $ssl_curves.
    # curl, python-requests, and stock OpenSSL do not.
    map $ssl_curves $has_grease {
        default 0;
        "~0x[0-9a-f]a[0-9a-f]a" 1;
    }
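    # 3. The Global Fingerprint Key
    # Fires only on the first request of a new connection with a full (non-resumed)
    # TLS handshake, when the UA claims Chrome but the handshake has no GREASE.
    # Resumed sessions are skipped: the ngx_http_ssl_module docs note that
    # $ssl_curves is available only for new sessions, so a real Chrome user
    # resuming a session could show an empty curve list and be misclassified.
    # $ssl_session_reused is "r" for a reused session and "." otherwise.
    # Real Chrome, real Firefox, real Safari, and honest curl users (UA != Chrome)
    # all return an empty string and are not rate-limited.
    map "$connection_requests:$ssl_session_reused:$claims_chrome:$has_grease" $global_fingerprint_key {
        "~^1:\.:1:0$" "$ssl_protocol:$ssl_ciphers:$ssl_curves";
        default "";
    }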
If the map returns an empty string, Nginx skips rate limiting. Real browsers (including Firefox and Safari) are unaffected, and so is anyone whose User-Agent doesn’t claim to be Chrome.
If the map returns a string (the fingerprint), the request goes into the bucket below.
The Mechanics of the Trap
This is where we turn a rate limit into a denial of service.
    # Rate = 1 request per minute (The bucket drains slowly)
    limit_req_zone $global_fingerprint_key zone=scrape_trap:10m rate=1r/m;
    server {
        location / {
            # Burst = 64, Delay = 1
            limit_req zone=scrape_trap burst=64 delay=1;
            # If rejected, close connection immediately (Status 444)
            limit_req_status 444;
            proxy_pass http://backend;
        }
    }
}
Why burst=64 delay=1?
We are exploiting Nginx’s implementation of the Leaky Bucket algorithm.
The Queue Fills Instantly: When the botnet wakes up, the first 65 requests arrive almost at once.
- Requests #1 and #2 pass. The first request opens the bucket, and delay=1 lets one excess request through undelayed.
- Requests #3 through #65 are caught in the burst queue.
The Delay Kicks In: Nginx spaces the queued requests out to meet the 1r/m rate.
- Request #3 is told to wait about 60 seconds.
- Request #4 is told to wait about 120 seconds.
- Request #65 is told to wait over an hour.
The Timeout: Scrapers are impatient. They typically time out after 10-15 seconds, so the bots behind requests #3-#65 disconnect long before Nginx would have served them.
The Jam: This is the crucial part. Even though the clients disconnected, the “bucket” in Nginx’s memory remains full. It drains at a fixed speed of 1 drop per minute.
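The result: The bucket is now jammed. Left alone it would take about an hour to drain, but a botnet that keeps sending never lets it: each slot that frees up (one per minute) is taken by the next bot to arrive, which is then queued for about an hour and times out like the others.
Any other request from any IP matching this fingerprint (Bot #66, Bot #1000, Bot #100,000) arrives at a full bucket. Nginx immediately rejects it with status 444, closing the connection without sending a response.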
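To make the jam concrete, here is a rough Python model of the accounting. It is a sketch under assumptions, not Nginx's code: the class name SharedBucket and its fields are made up for illustration, time is in plain seconds, and the rate is treated as exactly one request per 60 seconds. It tracks a single excess counter that drains over time, rejects a request when the counter would exceed burst, and delays it when the counter exceeds delay. In this model the one excess request covered by delay=1 goes through undelayed, so the waits start at the third arrival.

```python
from dataclasses import dataclass


@dataclass
class SharedBucket:
    """Simplified model of one shared limit_req bucket (assumed, not Nginx source)."""
    interval: float = 60.0   # seconds per allowed request (rate=1r/m)
    burst: int = 64
    delay: int = 1
    excess: float = 0.0      # how far the bucket is currently over the rate
    last: float = 0.0        # time of the last accepted request
    seen: bool = False       # False until the first request creates the state

    def hit(self, now: float):
        """Return (verdict, wait_seconds) for a request arriving at `now`."""
        if not self.seen:
            excess = 0.0
        else:
            drained = (now - self.last) / self.interval
            excess = max(0.0, self.excess - drained + 1.0)
            if excess > self.burst:
                return "rejected", 0.0   # bucket full; state is left untouched
        self.seen, self.excess, self.last = True, excess, now
        wait = max(0.0, (excess - self.delay) * self.interval)
        return ("delayed" if wait > 0 else "passed"), wait


bucket = SharedBucket()

# 70 bots arrive in the same instant.
first_wave = [bucket.hit(0.0) for _ in range(70)]
print(first_wave[1], first_wave[2], first_wave[64], first_wave[65])
# ('passed', 0.0) ('delayed', 60.0) ('delayed', 3780.0) ('rejected', 0.0)

# Then one bot per second for the next hour. The queued clients have long
# since timed out, but the counter does not know or care.
later = [bucket.hit(float(t))[0] for t in range(1, 3601)]
print(later.count("passed"), later.count("delayed"), later.count("rejected"))
# 0 60 3540
```

In this model a steady trickle of bots never earns another immediate pass. Once a minute a single request is admitted into the queue with a 63-minute wait, and everything else is rejected.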
The Psychological Effect
By using Status 444 (No Response) and inducing timeouts via delay, we are not telling the scraper “You are blocked.” We are telling them “This site is unstable.”
- Bots caught in the delay queue: Experience a connection hang, then a timeout.
- Bots arriving at a full bucket: Experience an abruptly closed connection with no HTTP response.
To a scraper developer debugging their script, this looks like a failing proxy node or a crashing backend, not a WAF rule. They waste time checking their network instead of trying to bypass your filter.
A Warning on Collateral Damage
This is still a sharp instrument, even with the User-Agent gate.
Because the fingerprint key is shared, every client matching it competes for the same 1r/m allowance. The UA-claims-Chrome check protects real Firefox and Safari, but it does not protect:
- Honest scripts that set a Chrome User-Agent. Some monitoring tools, headless health checkers, and developer scripts copy a Chrome UA string out of habit. Under this rule they look indistinguishable from a scraper.
- Headless browsers without GREASE. Older Puppeteer/Playwright builds and some “stealth” forks may match a real Chrome UA but ship a TLS stack that doesn’t grease consistently. Verify against your own traffic before deploying.
- Niche TLS stacks. Embedded clients, exotic TLS libraries, or anything older than the GREASE RFC (RFC 8701) will trip the rule if they advertise a Chrome UA.
A safer deployment path: log $global_fingerprint_key for a few days first, see whose handshake actually lands in it, and only then wire it up to limit_req. The value of this technique is not the rule itself but the shared-bucket-on-fingerprint trick—you decide which fingerprints earn the bucket based on your own traffic, not mine.
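One way to review such a log is a short script that groups hits by fingerprint. The sketch below assumes a hypothetical tab-separated access log with exactly three fields per line (client IP, User-Agent, and $global_fingerprint_key). That layout and the function name summarize_fingerprints are illustrative; you would need your own log format to produce it. Nginx writes an empty variable as "-", so those lines are treated as unflagged.

```python
from collections import defaultdict


def summarize_fingerprints(lines):
    """Group flagged log lines by fingerprint key.

    Returns {fingerprint: {"hits": int, "ips": set, "agents": set}}.
    """
    stats = defaultdict(lambda: {"hits": 0, "ips": set(), "agents": set()})
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip lines that do not match the assumed layout
        ip, agent, key = fields
        if key in ("", "-"):
            continue  # empty key: the request was never a candidate for the trap
        entry = stats[key]
        entry["hits"] += 1
        entry["ips"].add(ip)
        entry["agents"].add(agent)
    return dict(stats)


# Made-up sample lines using documentation IP ranges.
sample = [
    "203.0.113.5\tMozilla/5.0 Chrome/120.0.0.0\tTLSv1.3:TLS_AES_256_GCM_SHA384:x25519\n",
    "198.51.100.7\tMozilla/5.0 Chrome/120.0.0.0\tTLSv1.3:TLS_AES_256_GCM_SHA384:x25519\n",
    "192.0.2.44\tMozilla/5.0 Firefox/121.0\t-\n",
]

for key, entry in summarize_fingerprints(sample).items():
    print(key, entry["hits"], len(entry["ips"]), sorted(entry["agents"]))
```

A fingerprint with many hits spread across many IPs and one User-Agent is the pattern described in this post. A fingerprint with a handful of IPs from your own monitoring range is collateral damage you want to find before the trap goes live.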
But if you are tired of playing IP whack-a-mole with a massive Cloud VPS swarm, this 128-byte zone entry is the most efficient firewall you will ever deploy.