The problem: mitigation that makes things worse
Automated DDoS mitigation works by applying rules (BGP FlowSpec, RTBH, firewall filters) in response to detected attacks. Most of the time, these rules correctly block attack traffic. But sometimes they also block legitimate traffic:
- A FlowSpec rule that rate-limits UDP source port 53 to stop a DNS amplification attack also rate-limits the legitimate DNS responses the target receives, such as replies from the authoritative nameservers it queries
- An RTBH announcement that blackholes a /32 takes the customer's entire service offline to protect the rest of the network
- A firewall rule that blocks traffic from a specific ASN catches both the attack reflectors and legitimate users who happen to share the same transit provider
- A source-based rate limit catches a CDN origin-pull along with the attack traffic because the CDN's egress IPs generate high PPS
Without rollback, these rules stay active until a human notices the collateral damage, logs into the detection system, identifies the offending rule, and manually withdraws it. During that time, the mitigation is causing the same outage the attack would have caused.
The rollback architecture
An effective auto-rollback system has three components: health monitoring, rule correlation, and withdrawal logic.
Component 1: Health monitoring
Before you can detect collateral damage, you need health signals that represent legitimate traffic. These must be independent of the detection system itself to avoid circular dependencies.
- HTTP health checks: Synthetic requests to the protected service. If the service returns errors or becomes unreachable after a mitigation rule is applied, the rule may be too broad.
- TCP connection rate: Monitor successful TCP handshakes to the target. A mitigation rule that blocks attack traffic should not reduce the rate of successful TCP connections from legitimate sources.
- Application metrics: If you have application-level metrics (request rate, error rate, latency), compare pre-mitigation and post-mitigation values. A rule that reduces attack traffic should improve these metrics, not degrade them.
- Legitimate traffic baseline: Track the volume of traffic from known-good sources (monitoring probes, uptime checks, internal services). If this traffic drops after a rule is applied, the rule is catching legitimate traffic.
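As a rough illustration of comparing pre- and post-mitigation values, the Python sketch below checks each health signal against its own pre-rule baseline. The `HealthSignal` type, the `degraded_signals` function and the 20% drop threshold are hypothetical. The sketch assumes every signal is "higher is healthier" (check success rate, handshake rate, known-good traffic volume), so error rate and latency would need to be inverted before being fed in.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

@dataclass
class HealthSignal:
    """One independent health signal (hypothetical shape).

    Assumes higher values mean healthier, e.g. HTTP check success rate,
    completed TCP handshakes/s, or PPS from known-good probe sources.
    """
    name: str
    pre: List[float]   # samples taken before the mitigation rule was applied
    post: List[float]  # samples taken after the mitigation rule was applied

def degraded_signals(signals: List[HealthSignal], max_drop: float = 0.2) -> List[str]:
    """Return names of signals whose post-rule mean fell by more than
    max_drop (a fraction) relative to the pre-rule mean.

    max_drop=0.2 is an illustrative assumption, not a recommended value.
    """
    degraded = []
    for s in signals:
        if not s.pre or not s.post:
            continue  # not enough data to compare; skip rather than guess
        before, after = mean(s.pre), mean(s.post)
        if before > 0 and (before - after) / before > max_drop:
            degraded.append(s.name)
    return degraded
```

Requiring each signal to be compared against its own baseline keeps the check independent of the detection system's view of attack traffic, which is the circular dependency the health signals are meant to avoid.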
Component 2: Rule correlation
When a health signal degrades after a mitigation rule is applied, you need to correlate the two events. This requires timestamped logging of every rule application and removal:
    2026-05-21T02:14:33Z RULE_APPLIED flowspec match dst 203.0.113.50/32 proto udp src-port 53 rate-limit 1000pps
    2026-05-21T02:14:38Z HEALTH_CHECK 203.0.113.50:80 FAIL timeout after 5s
    2026-05-21T02:14:38Z CORRELATION rule_id=fs-7a2b health_degraded=true action=evaluate_rollback
The correlation window matters. If the health check was already failing before the rule was applied (because the attack was causing the failure), rolling back the rule will not help. Only correlate health degradation that begins after rule application.
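A minimal sketch of that rule, assuming health checks are available as (timestamp, passed) pairs and a 60-second correlation window (both assumptions). Degradation is attributed to a rule only if the last check before the rule passed and a failure appears inside the window. The function names are hypothetical, and the passing check at 02:14:28Z in the usage below is invented for illustration: the log above shows only the post-rule failure.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

def parse_ts(ts: str) -> datetime:
    """Parse UTC timestamps in the log format above, e.g. 2026-05-21T02:14:33Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def should_evaluate_rollback(
    rule_applied_at: datetime,
    health_checks: List[Tuple[datetime, bool]],
    window: timedelta = timedelta(seconds=60),  # assumed correlation window
) -> bool:
    """True only if health was passing just before the rule was applied
    and at least one check failed within `window` after it."""
    checks = sorted(health_checks)
    before = [ok for ts, ok in checks if ts < rule_applied_at]
    if not before or not before[-1]:
        # No baseline, or already failing: the attack (not the rule) is the
        # likelier cause, so rolling the rule back would not help.
        return False
    after = [ok for ts, ok in checks if rule_applied_at <= ts <= rule_applied_at + window]
    return not all(after)  # empty window -> all([]) is True -> no rollback
```

Usage with the rule and failure from the log:

```python
rule_at = parse_ts("2026-05-21T02:14:33Z")
checks = [(parse_ts("2026-05-21T02:14:28Z"), True),   # hypothetical pre-rule check
          (parse_ts("2026-05-21T02:14:38Z"), False)]  # HEALTH_CHECK failure from the log
should_evaluate_rollback(rule_at, checks)  # True
```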
Component 3: Withdrawal logic
When correlation identifies a rule that may be causing collateral damage, the withdrawal logic decides what to do:
- Narrow first. Before withdrawing entirely, try narrowing the rule. If a rate-limit of 1,000 PPS on UDP/53 is causing problems, try 5,000 PPS. If a source ASN block is too broad, narrow it to specific source prefixes.
- Withdraw and monitor. If narrowing is not possible (RTBH is binary), withdraw the rule and monitor whether the attack traffic returns. If it does, re-apply with a narrower scope or escalate to the next mitigation tier.
- Escalate. If the only effective rule causes collateral damage and the attack is ongoing, escalate to a different mitigation method: from local FlowSpec to upstream RTBH, or from RTBH to cloud scrubbing where more granular filtering is possible.
- Alert. Every rollback should generate an alert. The NOC needs to know that a mitigation rule was applied and then withdrawn, and why.
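The decision order above can be sketched as a small function. Everything here is illustrative: the `Rule` shape, the `kind` labels, the 5x relaxation factor (matching the 1,000 to 5,000 PPS example) and the 25,000 PPS ceiling are assumptions, and narrowing an ASN block down to source prefixes is left out. Every branch returns alert text, so no rollback happens silently.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple

class Action(Enum):
    NARROW = "narrow"
    WITHDRAW_AND_MONITOR = "withdraw_and_monitor"
    ESCALATE = "escalate"

@dataclass(frozen=True)
class Rule:
    rule_id: str
    kind: str                       # hypothetical labels: "rate_limit", "rtbh"
    rate_pps: Optional[int] = None  # only meaningful for rate limits

NARROW_FACTOR = 5      # 1,000 pps -> 5,000 pps, as in the example above
MAX_RATE_PPS = 25_000  # hypothetical ceiling; past this, relaxing stops helping

def decide(rule: Rule, attack_returned: bool = False) -> Tuple[Action, Optional[Rule], str]:
    """Pick a rollback action for a rule correlated with health degradation.

    attack_returned: True if the rule was already withdrawn and attack
    traffic came back. Returns (action, replacement_rule, alert_text).
    """
    if attack_returned:
        # Withdrawal let the attack back in; this sketch escalates to a
        # different mitigation method rather than re-applying a narrower rule.
        return Action.ESCALATE, None, f"{rule.rule_id}: attack resumed after withdrawal, escalating"
    if rule.kind == "rate_limit" and rule.rate_pps is not None:
        relaxed = rule.rate_pps * NARROW_FACTOR
        if relaxed <= MAX_RATE_PPS:
            return (Action.NARROW, replace(rule, rate_pps=relaxed),
                    f"{rule.rule_id}: rate limit relaxed {rule.rate_pps} -> {relaxed} pps")
    # Binary rules (RTBH) or rate limits already at the ceiling: withdraw and watch.
    return Action.WITHDRAW_AND_MONITOR, None, f"{rule.rule_id}: withdrawn, monitoring for attack return"
```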
Implementation patterns
Pattern 1: TTL-based expiry
The simplest rollback mechanism is giving every mitigation rule a time-to-live. If the attack continues, the detection system re-applies the rule. If the attack stops, the rule expires automatically. This prevents stale rules from persisting indefinitely.
    # FlowSpec rule with 300-second TTL
    # Rule expires automatically if not refreshed by detection system
    flowspec_rule:
      match: dst 203.0.113.50/32 proto udp src-port 53
      action: rate-limit 1000pps
      ttl: 300  # seconds
Flowtriq's auto-mitigation uses this pattern: mitigation rules are automatically withdrawn when the detection system determines the attack has ended, rather than persisting until manual removal.
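BGP FlowSpec itself has no expiry field, so the TTL has to live in whatever controller announces the rules. A minimal sketch of such a controller-side table follows, with hypothetical names and an injectable clock so it can be tested. It is not Flowtriq's implementation. Each detection cycle that still sees the attack calls `apply_or_refresh`, and a periodic `expire` call returns the rule IDs whose announcements should be withdrawn.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class TTLRuleTable:
    """Controller-side expiry for mitigation rules (hypothetical sketch)."""
    clock: Callable[[], float] = time.monotonic
    expires_at: Dict[str, float] = field(default_factory=dict)

    def apply_or_refresh(self, rule_id: str, ttl_s: float) -> None:
        # Applying a new rule and refreshing an existing one are the same
        # operation: push its expiry ttl_s seconds into the future.
        self.expires_at[rule_id] = self.clock() + ttl_s

    def expire(self) -> List[str]:
        # Return (and forget) every rule whose TTL has lapsed; the caller is
        # expected to withdraw the matching BGP announcement or filter.
        now = self.clock()
        lapsed = sorted(r for r, t in self.expires_at.items() if t <= now)
        for rule_id in lapsed:
            del self.expires_at[rule_id]
        return lapsed
```

With a 300-second TTL, a detection loop that re-confirms the attack every minute keeps the rule alive; once it stops refreshing, the rule lapses within five minutes without anyone having to remember it.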
Pattern 2: Health-gated application
Before applying a rule, check whether the target is currently healthy. If the service is already down before the rule is applied, the rule cannot make it worse. If the service is up, apply the rule and immediately begin monitoring health. If health degrades within the correlation window, trigger rollback evaluation.
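A sketch of health-gated application, assuming the caller supplies `apply_rule` and `check_health` callables (hypothetical hooks into your mitigation and monitoring systems) and that five checks one second apart make up the correlation window (also an assumption). It only reports that rollback evaluation is needed; the narrow, withdraw or escalate decision stays with the withdrawal logic.

```python
import time
from typing import Callable

def health_gated_apply(
    apply_rule: Callable[[], None],
    check_health: Callable[[], bool],
    window_checks: int = 5,   # assumed number of checks in the correlation window
    interval_s: float = 1.0,  # assumed spacing between checks
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Apply a mitigation rule, then watch health if the target was healthy.

    Returns "applied_target_already_down", "evaluate_rollback" or "applied_healthy".
    """
    healthy_before = check_health()
    apply_rule()
    if not healthy_before:
        # Already down: the rule cannot make it worse, and later failures
        # cannot be attributed to it, so there is nothing to correlate.
        return "applied_target_already_down"
    for _ in range(window_checks):
        sleep(interval_s)
        if not check_health():
            return "evaluate_rollback"
    return "applied_healthy"
```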
Pattern 3: Staged escalation with rollback at each level
Apply mitigation in stages, with health monitoring at each level. This is the pattern used in Flowtriq's 4-level auto-escalation system:
- Level 1: Local firewall rules. Applied directly on the target node. If health degrades, rollback and escalate to Level 2.
- Level 2: BGP FlowSpec. Applied at the network edge. If FlowSpec rules cause collateral damage, rollback and escalate to Level 3.
- Level 3: RTBH. Applied upstream. If RTBH (which takes the target offline) is the only option, it is applied with a TTL and the system monitors for attack cessation.
- Level 4: Cloud scrubbing. Traffic rerouted to a scrubbing provider for granular filtering. The scrubbing provider handles collateral damage prevention with their own infrastructure.
At each level, the system can roll back to the previous level or skip forward to the next if the current level causes unacceptable collateral damage.
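The level transitions can be sketched as a small state function. This is an illustrative model, not Flowtriq's escalation logic: the `Level` names mirror the four levels above, and the de-escalation policy (step back one level once the attack is no longer active) is an assumption.

```python
from enum import IntEnum
from typing import Optional

class Level(IntEnum):
    LOCAL_FIREWALL = 1
    FLOWSPEC = 2
    RTBH = 3             # applied with a TTL, since it takes the target offline
    CLOUD_SCRUBBING = 4  # provider handles collateral damage on its side

def next_level(current: Level, collateral_damage: bool, attack_active: bool) -> Optional[Level]:
    """Return the level to run next, or None when no mitigation is needed."""
    if not attack_active:
        # Assumed policy: roll back one level at a time as the attack subsides.
        return Level(current - 1) if current > Level.LOCAL_FIREWALL else None
    if collateral_damage and current < Level.CLOUD_SCRUBBING:
        # Withdraw this level's rules and escalate to the next method.
        return Level(current + 1)
    return current  # working, or already at the top level: stay put
```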
What most detection tools do not provide
Most DDoS detection tools apply mitigation and stop. They fire a FlowSpec rule or RTBH announcement and consider the incident mitigated. They do not:
- Monitor the health of the protected service after rule application
- Correlate health degradation with specific rules
- Automatically narrow or withdraw rules that cause collateral damage
- Provide TTL-based rule expiry to prevent stale rules
This is the gap between "detection and mitigation" and "detection, mitigation, and safety." The first category is common; the second is what production environments need.
Mitigation with built-in safety
Flowtriq's auto-mitigation includes rule TTLs, automatic withdrawal on attack cessation, multi-level escalation, and health-aware application. $9.99/node/month.
Frequently asked questions
What is DDoS mitigation rollback?
How do you detect collateral damage from DDoS mitigation?
The bottom line
Automated mitigation without automated rollback is a loaded gun pointed at your own infrastructure. The same speed that makes automation valuable (sub-second rule application) also means a bad rule causes collateral damage just as quickly. Build health monitoring, rule correlation, and withdrawal logic into your mitigation pipeline. Every rule should have a TTL. Every application should be health-gated. And every rollback should generate an alert that your NOC reviews.