The Global Tarpit: How to Stop 100k Cloud VPS Bots with Nginx and 128 Bytes of RAM

2025-12-31

If you run a high-value target on the internet today, you know that IP-based rate limiting is dead.

The economics of scraping have changed. Attackers no longer run scripts from a single server. They use massive distributed networks—ranging from “residential proxies” to swarms of cheap Cloud VPS instances.

I recently faced a scraper hitting my infrastructure with 100,000 unique IP addresses per day. These weren’t just infected IoT devices; they were fresh IPs from low-cost VPS providers, rotated constantly. Each IP made only one request. My standard limit_req_zone $binary_remote_addr configuration was useless. To Nginx, every request looked like a new user.

I didn’t want to play whack-a-mole with ASN blocks. I wanted to nuke the entire operation.

Here is how I used Nginx’s TLS fingerprinting and the physics of the Leaky Bucket algorithm to create a “Global Tarpit” that effectively broke the site for the botnet, regardless of how many IPs they rotated.

The Insight: You Can Change Your IP, But You Can’t Hide Your Handshake

While the attacker can rotate IPs for every request, they usually cannot rotate their SSL/TLS stack.

Most scrapers are built on standard libraries: Python’s requests, standard curl, or OpenSSL. Unlike modern browsers (Chrome, Firefox, Safari), these libraries have very specific, static behaviors during the TLS Client Hello:

No GREASE: Most modern browsers insert randomly chosen “GREASE” values (reserved code points such as 0x0a0a, defined in RFC 8701) into their cipher, extension, and curve lists to prevent protocol ossification. curl and OpenSSL usually do not.

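Zombie Curves: Many OpenSSL versions advertise support for secp521r1, a curve that Chrome dropped years ago. Not every browser followed suit (Firefox, for one, has continued to offer it), so confirm this signal against your own browser traffic before relying on it.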

If we can detect these signatures, we can identify “Standard OpenSSL” clients.
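As a rough illustration of the two signals, here is a minimal Python sketch that classifies a colon-separated curve list of the kind Nginx exposes in $ssl_curves. The function name and the sample strings are hypothetical, not captured from real clients. The GREASE pattern here requires both bytes to be identical, which is how RFC 8701 defines the values.

```python
import re

# GREASE code points repeat one byte of the form 0x?a: 0x0a0a, 0x1a1a, ... 0xfafa.
GREASE_RE = re.compile(r"0x([0-9a-f])a\1a")


def classify_curves(ssl_curves: str) -> tuple:
    """Return (has_grease, has_zombie_curve) as 0/1 flags for a curve list."""
    groups = ssl_curves.lower().split(":")
    has_grease = int(any(GREASE_RE.fullmatch(g) for g in groups))
    has_zombie_curve = int("secp521r1" in groups)
    return has_grease, has_zombie_curve


# Hypothetical sample values for illustration only.
browser_like = "0x3a3a:X25519:prime256v1:secp384r1"
library_like = "X25519:prime256v1:X448:secp521r1:secp384r1"

print(classify_curves(browser_like))  # (1, 0)
print(classify_curves(library_like))  # (0, 1)
```

Running something like this over logged $ssl_curves values is a cheap way to see how your real traffic splits before any rate limit is attached to the result.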

The Strategy: The Shared Bucket

The standard way to rate limit in Nginx is by IP:

# The Old Way (Useless against proxies)
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

This creates a bucket per user.

My approach was different. I realized that since 100,000 bots were all using the same curl binary, they all produced the exact same TLS fingerprint.

Instead of creating a bucket for every IP, I created one single bucket for that specific fingerprint.

If you look like curl, you don’t get your own personal rate limit. You share a rate limit with every other curl user on the planet.
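The difference between the two keying strategies comes down to how many buckets the same traffic creates. The following Python sketch counts them for 100,000 single-request clients; the address range and the fingerprint string are placeholders chosen for the example.

```python
import ipaddress


def per_ip_key(remote_addr: str, fingerprint: str) -> str:
    """Key used by classic per-client limiting: one bucket per address."""
    return remote_addr


def shared_key(remote_addr: str, fingerprint: str) -> str:
    """Key used by the shared bucket: the address is ignored entirely."""
    return fingerprint


# Placeholder fingerprint; a real one would be protocol:ciphers:curves.
fingerprint = "TLSv1.3:<ciphers>:<curves>"
base = int(ipaddress.ip_address("10.0.0.0"))
requests = [(str(ipaddress.ip_address(base + i)), fingerprint)
            for i in range(100_000)]

per_ip_buckets = {per_ip_key(ip, fp) for ip, fp in requests}
shared_buckets = {shared_key(ip, fp) for ip, fp in requests}

print(len(per_ip_buckets), len(shared_buckets))  # 100000 1
```

With per-IP keys every bot gets a fresh, empty bucket. With the shared key, all 100,000 requests compete for the same one.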

The Configuration

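Here is the Nginx config to classify the traffic. We use the ngx_http_ssl_module variables $ssl_protocol, $ssl_ciphers, and $ssl_curves to inspect the handshake; the last two were added in Nginx 1.11.7 and need OpenSSL 1.0.2 or later to work fully.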

http {
    # 1. Detect Zombie Curves (Chrome no longer advertises secp521r1)
    map $ssl_curves $has_zombie_curve {
        default 0;
        "~secp521r1" 1;
    }

    # 2. Detect GREASE (Browsers use it; Curl/OpenSSL usually doesn't)
    map $ssl_curves $has_grease {
        default 0;
        "~0x[0-9a-f]a[0-9a-f]a" 1;
    }

    # 3. The Global Fingerprint Key
    # If the client looks like a browser, we return empty string (no limit).
    # If the client looks like curl, we return the TLS signature.
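    map "$connection_requests:$has_grease:$has_zombie_curve" $global_fingerprint_key {
        # Logic: 1st Request AND (No Grease OR Has Zombie)
        # Colons separate the fields so that request 10 or 11 on a keep-alive
        # connection cannot be misread as request 1 followed by a flag.
        "~^1:0:"    "$ssl_protocol:$ssl_ciphers:$ssl_curves";
        "~^1:.:1$"  "$ssl_protocol:$ssl_ciphers:$ssl_curves";
        default     "";
    }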

If the map returns an empty string, Nginx skips rate limiting. Real users are unaffected. If the map returns a string (the fingerprint), the request goes into the bucket.
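The decision the map encodes can also be written out as a plain function. This is only an illustrative Python sketch of the stated logic (first request on the connection, and either no GREASE or a zombie curve); the function name and the sample handshake values are hypothetical.

```python
def global_fingerprint_key(connection_requests: int, has_grease: int,
                           has_zombie_curve: int, ssl_protocol: str,
                           ssl_ciphers: str, ssl_curves: str) -> str:
    """Return the shared rate-limit key, or "" to exempt the request."""
    first_request = connection_requests == 1
    looks_like_library = has_grease == 0 or has_zombie_curve == 1
    if first_request and looks_like_library:
        return f"{ssl_protocol}:{ssl_ciphers}:{ssl_curves}"
    return ""


# Hypothetical handshake values for illustration only.
proto, ciphers, curves = "TLSv1.3", "<ciphers>", "<curves>"

print(repr(global_fingerprint_key(1, 0, 0, proto, ciphers, curves)))   # 'TLSv1.3:<ciphers>:<curves>'
print(repr(global_fingerprint_key(1, 1, 0, proto, ciphers, curves)))   # ''
print(repr(global_fingerprint_key(10, 1, 0, proto, ciphers, curves)))  # ''
```

Comparing the request counter as a number, rather than as a string prefix, keeps the tenth request on a browser's keep-alive connection from being treated like a first request.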

The Mechanics of the Trap

This is where we turn a rate limit into a denial of service.

    # Rate = 1 request per minute (The bucket drains slowly)
    limit_req_zone $global_fingerprint_key zone=scrape_trap:10m rate=1r/m;

    server {
        location / {
            # Burst = 64, Delay = 1
            limit_req zone=scrape_trap burst=64 delay=1;

            # If rejected, close connection immediately (Status 444)
            limit_req_status 444;

            proxy_pass http://backend;
        }
    }
}

Why burst=64 delay=1?

We are exploiting Nginx’s implementation of the Leaky Bucket algorithm.

The Queue Fills Instantly: When the botnet wakes up, the first 65 requests arrive almost simultaneously.

Requests #1 and #2 pass immediately: delay=1 lets one request beyond the rate through without waiting.
Requests #3 through #65 are caught in the burst queue.

The Delay Kicks In: Past the delay=1 threshold, Nginx spaces the queued requests out to meet the 1r/m rate.

Request #3 is told to wait about 60 seconds.
Request #4 is told to wait about 120 seconds.
Request #65 is told to wait over an hour.

The Timeout: Scrapers are impatient. They typically time out after 10-15 seconds. The bots for requests #3-#65 disconnect.

The Jam: This is the crucial part. Even though the clients disconnected, the “bucket” in Nginx’s memory remains full. It drains at a fixed speed of 1 drop per minute.

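The result: The bucket is now jammed full for the next hour. While the attack continues it never empties: each slot that drains is taken by the next bot to arrive, which then hangs in the queue like the others.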

Any subsequent request from any IP matching this fingerprint (Bot #66, Bot #1000, Bot #100,000) arrives at a full bucket. Nginx immediately rejects it with Status 444 (Connection Closed).
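The arithmetic above can be checked with a small simulation. This is a simplified Python model of the accounting as I understand it from the limit_req documentation, not Nginx's actual code: it uses exact floating-point rates, whereas Nginx works in integer units, so real waits may differ slightly. The class and method names are made up for the sketch. In this model delay=1 lets one excess request through undelayed, so the waits start at request #3; the exact numbering depends on Nginx internals and is worth verifying on a test server.

```python
from collections import Counter


class SharedBucket:
    """Simplified model of one limit_req state shared by every matching client."""

    def __init__(self, rate_per_min=1.0, burst=64, delay=1):
        self.rate_per_min = rate_per_min
        self.burst = burst
        self.delay = delay
        self.excess = None  # None means no state exists yet
        self.last = 0.0

    def request(self, now):
        """Return ("pass", 0.0), ("delay", seconds) or ("reject", None)."""
        if self.excess is None:
            excess = 0.0
        else:
            drained = (now - self.last) * self.rate_per_min / 60.0
            excess = max(self.excess - drained + 1.0, 0.0)
        if excess > self.burst:
            return ("reject", None)  # rejected requests do not change the state
        # Accepted requests are counted even if the client later disconnects.
        self.excess, self.last = excess, now
        if excess <= self.delay:
            return ("pass", 0.0)
        return ("delay", (excess - self.delay) * 60.0 / self.rate_per_min)


bucket = SharedBucket()
burst_results = [bucket.request(now=0.0) for _ in range(70)]
counts = Counter(kind for kind, _ in burst_results)

print(counts["pass"], counts["delay"], counts["reject"])  # 2 63 5
print(burst_results[2])           # ('delay', 60.0)   request #3
print(burst_results[64])          # ('delay', 3780.0) request #65
print(bucket.request(now=30.0))   # ('reject', None)
print(bucket.request(now=60.0))   # ('delay', 3780.0)
```

The last two lines show the jam: thirty seconds later the bucket is still over its limit, and when a slot finally drains after a minute, the request that takes it is queued for more than an hour.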

The Psychological Effect

By using Status 444 (No Response) and inducing timeouts via delay, we are not telling the scraper “You are blocked.” We are telling them “This site is unstable.”

Bots caught in the burst queue: Experience a connection hang, then a timeout.
Bots #66+: Experience an immediate connection close with no HTTP response.

To a scraper developer debugging their script, this looks like a failing proxy node or a crashing backend, not a WAF rule. They waste time checking their network instead of trying to bypass your filter.

A Warning on Collateral Damage

This is a nuclear option.

Because the fingerprint key is shared, all clients with this signature share the single 1r/m allowance. If a legitimate user tries to curl your API while the botnet is active, they will be blocked immediately.

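This configuration is strictly for sites that are intended for web browsers only, where curl or python-requests traffic is inherently unwelcome. Even then, test it against real browser traffic first. The Nginx documentation notes that $ssl_curves is available only for new sessions, so a browser resuming a TLS session may present an empty curve list and be classified as having no GREASE.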

But if you are tired of playing IP whack-a-mole with a massive Cloud VPS swarm, this 128-byte zone entry is the most efficient firewall you will ever deploy.