[Pdns-dev] PDNS Recursor functionality request re:SERVFAIL outages of today

Ferenc Kovacs tyra3l at gmail.com
Sat Oct 22 17:00:54 UTC 2016


On Oct 22, 2016, at 1:54, "John Todd" <jtodd at loligo.com> wrote:
>
>
> As most of you know by now, today’s DynDNS outage due to DDoS attack
> caused fairly widespread outages across a large number of domains.
> Authoritative resolvers seem to be a particularly interesting target for
> attackers as they are often smaller in scope (IP address range, transit
> size of authoritative resolver networks) than a full service offering by a
> provider of multiple other services like HTTP. It seems that there may be
> some reasonable ways to respond to outages like this which at a minimum
> will result in failures that are less “bad” than having no replies at all,
> and which can be implemented by DNS recursors.
>
> I’d like to propose an extension to PowerDNS Recursor for mitigating
> (partially) events like we had today where major authoritative nameservers
> were put out of commission. This might be a particularly foolish or
> error-prone method - it only took me a few minutes to think up. But I’d at
> least like to hear a discussion as to why this isn’t a good idea. The
> comment of “But this might end up giving out the wrong answer!” is true,
> but I view a wrong answer as better than no answer. What would a domain
> operator USUALLY want to get? They’d want to get the inbound connection,
> rather than having users completely offline. This seems to be particularly
> valuable for TLD and other low-churn zones which may come under attack for
> various political reasons but which contain a significant number of NS
> records.
>
> Having done plenty of OSS work, I’m sure the next comment will be
> “patches welcome.” ;-) I would be happy to pay some small amount of dollars
> to someone to write this, but I have little budget, high hopes, and no
> coders on staff at this level yet otherwise I would do just that.
>
> PowerDNS Recursor proposed feature extensions:
>
> servfail-ttl-override
> * Integer
> * Default: 180
>
> The recursor keeps all records for this amount of seconds after TTL
> expiration. If the authoritative-provided TTL has expired, then lookup is
> performed on the query in a normal way. If that query fails due to a
> SERVFAIL, then the TTL timer on this “old” record is set back to zero and
> the “old” record is provided as a response. If an authoritative server is
> marked as “down” due to repeated SERVFAIL responses (see
> packetcache-servfail-ttl) then the “old” record is handed back immediately
> without a new query attempt, and the TTL timer is set back to zero to keep
> the answer in a state of perpetual validity as long as there are active
> queries occurring within the servfail-ttl-override interval and the
> authoritative server is resulting in SERVFAIL. (packetcache-servfail-ttl is
> on a rotating timer, and will try every X seconds, leading to one single
> query getting delays during the next attempt cycle - other queries are
> immediately replied to with the “old” answer.) An NXDOMAIN response from an
> authoritative server clears “old” records in memory immediately.
> This timer method is useful in situations where authoritative nameservers
> are being DDoS’ed and cannot provide responses, with the intent that some
> answer is better than no answer. If a domain operator wishes to stop
> traffic to their site, then replies with NXDOMAIN negate this behavior.
> Only a nameserver being unreachable will result in this cache being used as
> a last resort, and there is a timer for maximum duration of these old
> records being kept. Setting this value low will mean that highly-traffic’ed
> websites will typically always reply with a result even if the
> authoritative nameservers are unreachable due to attack or network
> disconnect, but less often-queried domains may be removed from the cache
> leading to query failures. Setting this value high may lead to unexpected
> results for infrequently-used domains which have dynamic results.
>
> servfail-ttl-override-domain-exceptions
> * Domains, comma separated
>
> List of domains on which we never use the servfail-TTL-override method
>
> servfail-ttl-override-server-exceptions
> * IP addresses, comma separated
>
> List of authoritative servers on which we never use the
> servfail-TTL-override method
>
> JT
>
>
> _______________________________________________
> Pdns-dev mailing list
> Pdns-dev at mailman.powerdns.com
> https://mailman.powerdns.com/mailman/listinfo/pdns-dev

This seems similar to what the OpenDNS guys implemented as a freeze list:
https://indico.dns-oarc.net/event/24/session/8/contribution/13/material/slides/2.pdf
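
For the sake of discussion, here is a rough Python sketch of the lookup
logic as I read the quoted proposal. It is not PowerDNS code and uses no
PowerDNS API: the names StaleCache, Entry, resolve, the upstream callback
and the rcode strings are all made up for illustration. I am assuming that
"TTL timer is set back to zero" means the servfail-ttl-override window is
restarted each time the old record is served, and I have left out the
"server marked as down" fast path tied to packetcache-servfail-ttl.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    expires_at: float   # end of the authoritative TTL
    stale_until: float  # end of the servfail-ttl-override grace window


class StaleCache:
    """Hypothetical cache that serves expired records on SERVFAIL."""

    def __init__(self, override=180, domain_exceptions=(),
                 server_exceptions=(), clock=time.monotonic):
        self.override = override  # servfail-ttl-override, in seconds
        self.domain_exceptions = tuple(
            d.lower().rstrip(".") for d in domain_exceptions)
        self.server_exceptions = set(server_exceptions)
        self.clock = clock
        self.entries = {}

    def _exempt(self, name, server):
        # True if the name falls under an excepted domain or the
        # authoritative server is on the excepted list.
        name = name.lower().rstrip(".")
        in_domain = any(name == d or name.endswith("." + d)
                        for d in self.domain_exceptions)
        return in_domain or server in self.server_exceptions

    def resolve(self, name, server, upstream):
        # upstream(name, server) is assumed to return (rcode, value, ttl).
        now = self.clock()
        entry = self.entries.get(name)
        if entry and now < entry.expires_at:
            return entry.value  # ordinary cache hit, TTL still valid

        rcode, value, ttl = upstream(name, server)
        if rcode == "NOERROR":
            self.entries[name] = Entry(
                value, now + ttl, now + ttl + self.override)
            return value
        if rcode == "NXDOMAIN":
            self.entries.pop(name, None)  # NXDOMAIN clears the old record
            return None

        # SERVFAIL: hand back the old record if it is still inside the
        # grace window and no exception applies, and restart the window.
        if entry and now < entry.stale_until and not self._exempt(name, server):
            entry.stale_until = now + self.override
            return entry.value
        return None
```

With override=180 and a 60 second TTL, a record cached at t=0 would still
be served on SERVFAIL at t=100, and each such hit pushes the cutoff out by
another 180 seconds. Once queries stop for longer than that, the record is
no longer handed out.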