2025 © Francesco Costantino

How I Solved a High Varnish Miss Rate and Defeated Scrapers for $0

August 14, 2025

Struggling with a high cache miss rate on Varnish? Here’s how I tackled this classic problem on a Drupal backend hammered by aggressive scrapers, without spending a cent and without blocking legitimate bots like Googlebot or Bingbot.

The Strategy: A 4-Step Surgical Strike

1. URL Categorization in VCL

Tag each request with a page type in a request header so that later rules can target only the pages that matter:

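sub vcl_recv {
    # Drop any client-supplied value so the header can only be set here.
    unset req.http.X-Page-Type;

    if (req.url ~ "^/$") {
        set req.http.X-Page-Type = "HOME";
    } elsif (req.url ~ "^/category" || req.url ~ "^/search") {
        set req.http.X-Page-Type = "LISTING";
    } elsif (req.url ~ "^/article") {
        set req.http.X-Page-Type = "ARTICLE";
    } elsif (req.url ~ "\.(css|js|png|jpg|gif|svg)$") {
        # VCL strings do not treat backslash as an escape,
        # so \. reaches the regex engine as a literal dot.
        set req.http.X-Page-Type = "STATIC_FILES";
    }
    # ... rest of your logic
}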
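
To sanity-check these patterns before deploying, the same rules can be mirrored in a few lines of Bash and run against paths pulled from an access log. This is only a sketch: classify_url is a hypothetical helper, not part of Varnish, and the OTHER label is my own stand-in for "no header set". Bash uses POSIX extended regexes rather than PCRE, but patterns this simple should behave the same in both.

```shell
#!/bin/bash
# Hypothetical helper mirroring the VCL categorization rules, for offline checks.
# Prints OTHER when no rule matches (the VCL sets no header in that case).
classify_url() {
  local url="$1"
  # Keep regexes in variables so Bash does not misparse the special characters.
  local home_re='^/$'
  local listing_re='^/(category|search)'
  local article_re='^/article'
  local static_re='\.(css|js|png|jpg|gif|svg)$'

  if [[ "$url" =~ $home_re ]]; then
    echo "HOME"
  elif [[ "$url" =~ $listing_re ]]; then
    echo "LISTING"
  elif [[ "$url" =~ $article_re ]]; then
    echo "ARTICLE"
  elif [[ "$url" =~ $static_re ]]; then
    echo "STATIC_FILES"
  else
    echo "OTHER"
  fi
}
```

Feeding it the request paths from a log, one per line, and piping the result through sort | uniq -c gives a rough picture of which page types the scrapers hit hardest.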

2. The VIP List: Whitelisting Good Bots

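Create a Varnish ACL containing the IP ranges that Google and Bing publish for their crawlers, and keep it current with a Bash script run from cron. Matching on IP rather than on the User-Agent header matters because a scraper can send any User-Agent string it likes.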

Bash Script: update-bot-acl.sh

#!/bin/bash
set -euo pipefail

ACL_FILE="/etc/varnish/botallowed.acl"
TMP_FILE="$(mktemp)"
IPS_FILE="$(mktemp)"
trap 'rm -f "$TMP_FILE" "$IPS_FILE"' EXIT

# Fetch Googlebot and Bingbot IPv4 ranges.
# -f makes curl fail on HTTP errors; "// empty" skips IPv6-only entries,
# which jq would otherwise print as "null".
for url in \
  https://developers.google.com/search/apis/ipranges/googlebot.json \
  https://www.bing.com/toolbox/bingbot.json
do
  curl -fsS "$url" | jq -r '.prefixes[].ipv4Prefix // empty' >> "$IPS_FILE"
done

# Refuse to install an empty ACL if the feeds returned nothing usable.
[ -s "$IPS_FILE" ] || { echo "no bot IP ranges fetched" >&2; exit 1; }

# Format for Varnish ACL: the address part must be quoted, e.g. "192.0.2.0"/24;
{
  echo "acl botallowed {"
  sort -u "$IPS_FILE" | awk -F/ '{ printf "  \"%s\"/%s;\n", $1, $2 }'
  echo "}"
} > "$TMP_FILE"

# Update only if changed
if ! cmp -s "$TMP_FILE" "$ACL_FILE"; then
  install -m 0644 "$TMP_FILE" "$ACL_FILE"
  systemctl reload varnish
fi

Add to root's crontab (runs nightly at 03:00):

0 3 * * * /path/to/update-bot-acl.sh
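
Because the ACL is generated from remote data and then loaded into Varnish, it may be worth validating the fetched list before it is written. The function below is a minimal sketch of such a guard. valid_cidr_list is a hypothetical name, and the check is deliberately loose (it accepts octets above 255), so treat it as a tripwire for garbage such as "null" or an HTML error page rather than as a full validator.

```shell
#!/bin/bash
# Hypothetical guard: reads candidate ranges from stdin, one per line.
# Succeeds only if every line looks like an IPv4 CIDR (a.b.c.d/nn)
# and at least MIN_ENTRIES lines were read (default 1).
valid_cidr_list() {
  local min_entries="${1:-1}" count=0 line
  local cidr_re='^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$'

  # The "|| [ -n ... ]" part also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [[ "$line" =~ $cidr_re ]] || return 1
    count=$((count + 1))
  done

  [ "$count" -ge "$min_entries" ]
}
```

In the update script this could sit between the download and the formatting step, for example by aborting unless the list of fetched ranges passes valid_cidr_list with a minimum of a few dozen entries. The right threshold is a guess on my part; pick one that fits the size of the published lists.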

3. Smart Rate Limiting with vsthrottle

Add to your VCL. This requires the vsthrottle VMOD, which ships in the varnish-modules collection:

import vsthrottle;

# Include the ACL
include "/etc/varnish/botallowed.acl";

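sub vcl_recv {
    # ... URL categorization as above

    # Throttle only the HTML pages; static files are left alone.
    if (req.http.X-Page-Type ~ "HOME|LISTING|ARTICLE") {
        # Clients whose IP is in the bot ACL are never throttled.
        if (!client.ip ~ botallowed) {
            # Key: client IP. Limit: 10 requests per 4s. Penalty: 60s block.
            if (vsthrottle.is_denied(client.ip, 10, 4s, 60s)) {
                return (synth(429, "Too Many Requests"));
            }
        }
    }
}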
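
One way to confirm the limiter behaves as intended is to fire a short burst at a rate-limited URL from an IP that is not in the ACL and tally the status codes that come back. The sketch below assumes curl is installed; burst_check is a hypothetical helper name and the URL you pass it is whatever page you want to probe.

```shell
#!/bin/bash
# Hypothetical helper: send COUNT sequential requests to URL (default 15)
# and print how many times each HTTP status code was returned.
burst_check() {
  local url="$1" count="${2:-15}" i
  for ((i = 0; i < count; i++)); do
    # -o /dev/null discards the body; -w prints only the status code.
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done | sort | uniq -c
}
```

Running burst_check against an article URL with the settings above should show a mix of 200 and 429 responses once the burst exceeds 10 requests inside the 4-second window. The exact split depends on timing, and hits served from cache still count because the check runs in vcl_recv.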

4. Results

  • ✅ Varnish hit-rate skyrocketed
  • ✅ Backend CPU load plummeted
  • ✅ Real users get a fast, stable site
  • ✅ Total cost: $0
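
To put a number on the hit rate, the cache counters from varnishstat can be reduced to a percentage. This is a rough sketch, with hit_ratio as a hypothetical helper name. It considers only MAIN.cache_hit and MAIN.cache_miss and ignores passes and other outcomes. Those counters are cumulative since varnishd started, so it is the change between two samples that tells you how the cache is doing now.

```shell
#!/bin/bash
# Hypothetical helper: reads `varnishstat -1` output on stdin and prints
# hits / (hits + misses) as a percentage, or "n/a" if both are zero.
hit_ratio() {
  awk '
    $1 == "MAIN.cache_hit"  { hit = $2 }
    $1 == "MAIN.cache_miss" { miss = $2 }
    END {
      total = hit + miss
      if (total == 0) { print "n/a"; exit }
      printf "%.1f%%\n", 100 * hit / total
    }'
}
```

Typical use would be: varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss | hit_ratio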

Feel free to copy, adapt, and use these scripts and configs!