[Pdns-dev] PDNS Recursor functionality request re:SERVFAIL outages of today

John Todd jtodd at loligo.com
Fri Oct 21 23:53:35 UTC 2016

As most of you know by now, today’s DynDNS outage due to a DDoS attack 
caused fairly widespread outages across a large number of domains. 
Authoritative nameservers seem to be a particularly interesting target for 
attackers, as they are often smaller in scope (IP address range, transit 
size of authoritative nameserver networks) than a full service offering by 
a provider of multiple other services like HTTP. It seems that there may 
be some reasonable ways to respond to outages like this which, at a 
minimum, will result in failures that are less “bad” than having no 
replies at all, and which can be implemented by DNS recursors.

I’d like to propose an extension to PowerDNS Recursor for mitigating 
(partially) events like we had today where major authoritative 
nameservers were put out of commission. This might be a particularly 
foolish or error-prone method - it only took me a few minutes to think 
up. But I’d at least like to hear a discussion as to why this isn’t 
a good idea. The comment of “But this might end up giving out the 
wrong answer!” is true, but I view a wrong answer as better than no 
answer. What would a domain operator USUALLY want to get? They’d want 
to get the inbound connection, rather than having users completely 
offline. This seems to be particularly valuable for TLD and other 
low-churn zones which may come under attack for various political 
reasons but which contain a significant number of NS records.

Having done plenty of OSS work, I’m sure the next comment will be 
“patches welcome.” ;-) I would be happy to pay some small amount of 
dollars to someone to write this, but I have little budget, high hopes, 
and no coders on staff at this level yet; otherwise I would do just that.

PowerDNS Recursor proposed feature extensions:

servfail-ttl-override

* Integer
* Default: 180

The recursor keeps all records for this number of seconds after TTL 
expiration. If the authoritative-provided TTL has expired, then a lookup 
is performed for the query in the normal way. If that lookup fails with a 
SERVFAIL, then the TTL timer on this “old” record is set back to 
zero and the “old” record is provided as the response. If an 
authoritative server is marked as “down” due to repeated SERVFAIL 
responses (see packetcache-servfail-ttl), then the “old” record is 
handed back immediately without a new query attempt, and the TTL timer 
is set back to zero to keep the answer in a state of perpetual validity 
as long as there are active queries occurring within the 
servfail-ttl-override interval and the authoritative server is still 
returning SERVFAIL. (packetcache-servfail-ttl is on a rotating timer and 
will retry every X seconds, so one single query is delayed during 
each retry cycle; other queries are answered immediately with 
the “old” answer.) An NXDOMAIN response from an authoritative server 
clears “old” records from memory immediately.

This timer method is useful in situations where authoritative 
nameservers are being DDoS’ed and cannot provide responses, with the 
intent that some answer is better than no answer. If a domain operator 
wishes to stop traffic to their site, then replies with NXDOMAIN negate 
this behavior. Only a nameserver being unreachable will result in this 
cache being used as a last resort, and there is a timer for maximum 
duration of these old records being kept. Setting this value low will 
mean that queries for heavily trafficked websites will typically still 
get a result even if the authoritative nameservers are unreachable due to 
attack or network disconnect, but less frequently queried domains may be 
removed from the cache, leading to query failures. Setting this value 
high may lead to unexpected results for infrequently used domains which 
have dynamic results.
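
To make the proposed behavior concrete, here is a rough Python sketch of the 
cache logic described above. It is not PowerDNS code: the names StaleCache, 
Entry, Rcode, upstream and override are all hypothetical, records are reduced 
to a single string, and the “server marked down” shortcut is left out. It also 
assumes that “set back to zero” means restarting the post-expiry retention 
timer each time an old record is served.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Rcode(Enum):
    NOERROR = 0
    SERVFAIL = 2
    NXDOMAIN = 3


@dataclass
class Entry:
    value: str
    expires_at: float   # when the authoritative TTL runs out
    stale_until: float  # how long the "old" record may still be served


# Hypothetical upstream lookup: returns (rcode, answer, ttl_seconds).
Upstream = Callable[[str], Tuple[Rcode, Optional[str], int]]


class StaleCache:
    """Sketch of the proposed servfail-ttl-override behavior (assumptions above)."""

    def __init__(self, upstream: Upstream, override: int = 180,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.upstream = upstream
        self.override = override  # seconds to keep records after TTL expiry
        self.clock = clock
        self.entries: Dict[str, Entry] = {}

    def resolve(self, name: str) -> Tuple[Rcode, Optional[str]]:
        now = self.clock()
        entry = self.entries.get(name)

        # Record still within its authoritative TTL: answer from cache.
        if entry is not None and now < entry.expires_at:
            return Rcode.NOERROR, entry.value

        # TTL expired (or nothing cached): do a normal lookup.
        rcode, value, ttl = self.upstream(name)

        if rcode is Rcode.NOERROR and value is not None:
            self.entries[name] = Entry(value, now + ttl, now + ttl + self.override)
            return rcode, value

        if rcode is Rcode.NXDOMAIN:
            # An NXDOMAIN clears any "old" record immediately.
            self.entries.pop(name, None)
            return rcode, None

        # SERVFAIL: fall back to the "old" record if it is still retained,
        # and restart its retention timer so active names stay answerable.
        if entry is not None and now < entry.stale_until:
            entry.stale_until = now + self.override
            return Rcode.NOERROR, entry.value

        self.entries.pop(name, None)
        return Rcode.SERVFAIL, None
```

With the default of 180, a name that is queried at least once every three 
minutes keeps being answered from the old record for as long as the upstream 
lookup keeps failing, while a name nobody asks for drops out of the cache.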

* Domains, comma separated

List of domains for which we never use the servfail-TTL-override method.

* IP addresses, comma separated

List of authoritative servers for which we never use the 
servfail-TTL-override method.
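
As an illustration of how the two exclusion lists might be checked before an 
old record is served, here is a small Python sketch. The function names are 
hypothetical, and matching subdomains of a listed domain is my assumption; 
the proposal above only says “domains”.

```python
import ipaddress
from typing import List


def parse_list(raw: str) -> List[str]:
    """Split a comma-separated setting value into trimmed, non-empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def domain_excluded(qname: str, excluded_domains: List[str]) -> bool:
    """True if qname equals a listed domain or sits below one (assumption)."""
    name = qname.rstrip(".").lower()
    for domain in excluded_domains:
        d = domain.rstrip(".").lower()
        # Match on label boundaries so "notexample.com" does not match "example.com".
        if name == d or name.endswith("." + d):
            return True
    return False


def server_excluded(server_ip: str, excluded_servers: List[str]) -> bool:
    """True if the authoritative server's address is on the exclusion list."""
    addr = ipaddress.ip_address(server_ip)
    return any(addr == ipaddress.ip_address(s) for s in excluded_servers)


def override_allowed(qname: str, server_ip: str,
                     excluded_domains: List[str],
                     excluded_servers: List[str]) -> bool:
    """The override applies only if neither exclusion list matches."""
    return not (domain_excluded(qname, excluded_domains)
                or server_excluded(server_ip, excluded_servers))
```

Parsing the addresses with the ipaddress module rather than comparing strings 
means that different spellings of the same IPv6 address still match.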

