How I Solved a High Varnish Miss Rate and Defeated Scrapers for $0
August 14, 2025

Struggling with a high cache miss rate on Varnish? Here’s how I tackled this classic problem on a Drupal backend hammered by aggressive scrapers—without spending a cent, and without blocking legitimate bots like Googlebot or Bingbot.
The Strategy: A 4-Step Surgical Strike
1. URL Categorization in VCL
Categorize URLs to apply targeted rules:
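sub vcl_recv {
    # Drop any client-supplied value so the header cannot be spoofed.
    unset req.http.X-Page-Type;

    if (req.url ~ "^/$") {
        set req.http.X-Page-Type = "HOME";
    } elsif (req.url ~ "^/category" || req.url ~ "^/search") {
        set req.http.X-Page-Type = "LISTING";
    } elsif (req.url ~ "^/article") {
        set req.http.X-Page-Type = "ARTICLE";
    # VCL strings have no backslash escapes, so a single "\." matches a literal dot.
    # The optional query string covers the cache-busting suffix Drupal appends to CSS/JS.
    } elsif (req.url ~ "\.(css|js|png|jpg|gif|svg)(\?.*)?$") {
        set req.http.X-Page-Type = "STATIC_FILES";
    }
    # ... rest of your logic
}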
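Before loading rules like these into Varnish, it can help to try the patterns against a few sample paths. The following is a minimal sketch, not part of the original setup: `classify_url` is a hypothetical Bash helper that mirrors the categories above, and `UNCATEGORIZED` is a label invented here for paths that match nothing. Bash uses POSIX extended regular expressions while VCL uses PCRE, but patterns this simple should behave the same in both.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors the vcl_recv categories in plain Bash so the
# patterns can be tried against sample paths without touching Varnish.
classify_url() {
    local url="$1"
    local listing_re='^/(category|search)'
    local article_re='^/article'
    local static_re='\.(css|js|png|jpg|gif|svg)$'

    if [[ "$url" == "/" ]]; then
        echo "HOME"
    elif [[ "$url" =~ $listing_re ]]; then
        echo "LISTING"
    elif [[ "$url" =~ $article_re ]]; then
        echo "ARTICLE"
    elif [[ "$url" =~ $static_re ]]; then
        echo "STATIC_FILES"
    else
        # Label invented for this sketch; the VCL simply leaves the header unset.
        echo "UNCATEGORIZED"
    fi
}
```

For example, `classify_url "/article/42"` prints `ARTICLE`, and `classify_url "/user/login"` prints `UNCATEGORIZED`.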
2. The VIP List: Whitelisting Good Bots
Create a Varnish ACL for allowed bots. Automate the update with a Bash script and cron.
Bash Script: update-bot-acl.sh
#!/bin/bash
# Abort on any error, including a failed download inside a pipeline.
set -euo pipefail

ACL_FILE="/etc/varnish/botallowed.acl"
TMP_FILE="$(mktemp)"
trap 'rm -f "$TMP_FILE"' EXIT

# Fetch the IPv4 prefixes from a published bot range file.
# "// empty" skips IPv6-only entries, which would otherwise print "null".
fetch_ipv4() {
    curl -fsS "$1" | jq -r '.prefixes[] | .ipv4Prefix // empty'
}

# Format for Varnish ACL: each entry must be written as "192.0.2.0"/24;
{
    echo "acl botallowed {"
    {
        fetch_ipv4 https://developers.google.com/search/apis/ipranges/googlebot.json
        fetch_ipv4 https://www.bing.com/toolbox/bingbot.json
    } | awk -F/ 'NF == 2 { printf "    \"%s\"/%s;\n", $1, $2 }'
    echo "}"
} > "$TMP_FILE"

# Update only if changed
if ! cmp -s "$TMP_FILE" "$ACL_FILE"; then
    install -m 0644 "$TMP_FILE" "$ACL_FILE"
    systemctl reload varnish
fi
Add to crontab (runs nightly):
0 3 * * * /path/to/update-bot-acl.sh
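Varnish expects each ACL entry as a quoted address followed by an optional mask, for example `"192.0.2.0"/24;`. The sketch below isolates that formatting step so it can be checked without network access or a running Varnish. `format_acl` is a hypothetical helper name, and the strict IPv4 filter is an added safeguard assumed here, not something the original script does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CIDR ranges on stdin and prints a Varnish ACL
# named by the first argument. Lines that are not IPv4 CIDRs (blank lines,
# "null", IPv6 prefixes) are silently skipped.
format_acl() {
    local name="$1"
    echo "acl ${name} {"
    grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$' \
        | awk -F/ '{ printf "    \"%s\"/%s;\n", $1, $2 }'
    echo "}"
}
```

Running `printf '192.0.2.0/24\nnull\n' | format_acl botallowed` prints an ACL with a single `"192.0.2.0"/24;` entry. One way to catch a malformed file before the reload is to compile the full configuration first with `varnishd -C -f /etc/varnish/default.vcl > /dev/null`; that path is an assumption and depends on your installation.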
3. Smart Rate Limiting with vsthrottle
Add to your VCL (requires the vsthrottle VMOD, which ships in the varnish-modules collection):
import vsthrottle;

# Include the ACL
include "/etc/varnish/botallowed.acl";

sub vcl_recv {
    # ... URL categorization as above
    if (req.http.X-Page-Type ~ "HOME|LISTING|ARTICLE") {
        if (!client.ip ~ botallowed) {
            # Allow 10 requests per 4 seconds per client IP;
            # a client that exceeds the limit is blocked for 60 seconds.
            if (vsthrottle.is_denied(client.ip, 10, 4s, 60s)) {
                return (synth(429, "Too Many Requests"));
            }
        }
    }
}
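To confirm the throttle behaves as intended, you can send a quick burst of requests from a machine that is not in the allowlist and count the status codes. This is a rough sketch under stated assumptions: `probe` and `tally_statuses` are hypothetical helper names, and the URL in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the throttle from a non-allowlisted IP.

# Send COUNT sequential requests to URL and print one HTTP status code per line.
probe() {
    local url="$1" count="$2" i
    for ((i = 0; i < count; i++)); do
        curl -s -o /dev/null -w '%{http_code}\n' "$url"
    done
}

# Read status codes on stdin and print "<code> <occurrences>", sorted by code.
tally_statuses() {
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

A run such as `probe https://example.com/article/1 20 | tally_statuses` should show a mix of `200` and `429` lines if all 20 requests land inside the 4-second window. If only `200` appears, the requests may be too slow to trip the limit, or the client address may be covered by the ACL.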
4. Results
- ✅ Varnish hit-rate skyrocketed
- ✅ Backend CPU load plummeted
- ✅ Real users get a fast, stable site
- ✅ Total cost: $0
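To put a number on the hit rate in your own setup, the counters that `varnishstat` exposes are enough. The sketch below is an illustration, not the measurement used for this post: `hit_rate` is a hypothetical helper that reads `varnishstat -1` output and divides hits by hits plus misses, ignoring passes and other request types.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `varnishstat -1` output on stdin and prints the
# cache hit rate as a percentage with one decimal place.
hit_rate() {
    awk '
        $1 == "MAIN.cache_hit"  { hit = $2 }
        $1 == "MAIN.cache_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * hit / total
        }
    '
}
```

Typical usage would be `varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss | hit_rate`. The counters are cumulative since Varnish started, so compare two readings taken some time apart to see the effect of a change.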
Feel free to copy, adapt, and use these scripts and configs!