[Pdns-users] Spikey response times in powerdns recursor
sbedford at plus.net
Wed Mar 17 11:37:24 UTC 2010
bert hubert wrote:
> On Wed, Mar 17, 2010 at 11:16:40AM +0000, Simon Bedford wrote:
>>> Mar 17 11:57:02  bbc.co.uk.: Resolved 'bbc.co.uk.' NS ns1.bbc.co.uk. to: 22.214.171.124
>>> Mar 17 11:57:02  bbc.co.uk.: Trying IP 126.96.36.199:53, asking 'bbc.co.uk.|A'
>>> Mar 17 11:57:04  bbc.co.uk.: timeout resolving
>> It never used to happen before 23/12/09 though looking at our graphs
>> and as you say this happens for at least 5 domain names that we
>> monitor (some of which are our own and some external).
> bbc.co.uk still has a nameserver that is down, so having that domain resolve
> slowly every once in a while is to be expected.
Agreed. The point, though, is that this does not happen during the day:
although the cache will be busier, it will still have to fetch the
short-lived TTL entry every x minutes during the day as well, unless I am
> You've indicated you've occasionally seen 500ms lookups times for
> google.com, but I have not heard of any other problems.
> google.com takes between 0 and 100ms to resolve in my tests.
hmmm, we definitely see higher as we do with our own domains overnight
>>> This matches the expectations.
>> Doubling overnight and acceptable to have 2 second look up times??
>> This is definitely not something that would be acceptable to our
>> customers for valid domains...
> These measurements from dnsscope are for _all_ domain names, not just valid
> domains. Please do not think that I recommend 2 second lookup times.
> The reality is that a huge number of domains have unresponsive nameservers.
> Your graph indicates that 1% of queries takes between 1024 and 2048 msec to
> resolve at night, and this is entirely to be expected.
> A doubling of *average* response times, but still in the <40ms range, is
> entirely to be expected on a server that is relatively idle at night.
I realise dnsscope takes all domains into account and agree that the
average stats are pretty darn good. But we do see multi-second lookup
times from the recursor for domains we host and run the auth DNS for,
without seeing those spikes on our auth DNS graphing.
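
To make the average-versus-tail point concrete, here is a rough sketch of how a sub-40ms average can sit alongside 1% of queries landing in the 1024-2048 msec bucket. The sample numbers are made up for illustration, and the power-of-two bucketing is only my assumption about how the histogram is laid out, not taken from the dnsscope source.

```python
from collections import Counter


def bucket_upper_bound_ms(latency_ms):
    """Return the power-of-two upper bound (in ms) of the bucket holding latency_ms."""
    bound = 1
    while bound < latency_ms:
        bound *= 2
    return bound


def summarise(latencies_ms):
    """Return (mean latency in ms, {bucket upper bound: share of queries})."""
    counts = Counter(bucket_upper_bound_ms(x) for x in latencies_ms)
    total = len(latencies_ms)
    shares = {bound: n / total for bound, n in sorted(counts.items())}
    return sum(latencies_ms) / total, shares


# Hypothetical sample: 99 fast answers plus 1 query that waited on a dead nameserver.
sample = [20.0] * 99 + [1500.0]
mean_ms, shares = summarise(sample)
print(f"mean = {mean_ms:.1f} ms")
for bound, share in shares.items():
    print(f"<= {bound} ms: {share:.0%}")
```

With these invented figures the single slow query contributes 15 of the 34.8 ms average, so the mean looks healthy even though one lookup in a hundred took well over a second.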
This is what is causing the mystery for me: when it's good it's really
good, but then response times go crazy at a random time. It's dropped our
customer experience graphing from 99.987% to 89% (some of this will be
the 188.8.131.52 cache maintenance bug though, in fact a larger proportion as
we only have 1 of 4 upgraded to 3.2.1).
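
For anyone wanting to reproduce this kind of figure, below is a minimal sketch of a probe that times lookups through whatever resolver the host is configured to use and reports the share answered within a threshold. The function names, the threshold and the idea of counting failures as misses are my own assumptions for illustration; it is not how our graphing actually works.

```python
import socket
import time


def timed_lookups(names, resolve=socket.gethostbyname, clock=time.monotonic):
    """Resolve each name once and return a list of (name, elapsed_ms, ok)."""
    results = []
    for name in names:
        start = clock()
        try:
            resolve(name)
            ok = True
        except OSError:
            # socket.gaierror and socket.herror are both subclasses of OSError.
            ok = False
        results.append((name, (clock() - start) * 1000.0, ok))
    return results


def share_within(results, threshold_ms):
    """Fraction of lookups that succeeded within threshold_ms (0.0 if no results)."""
    if not results:
        return 0.0
    good = sum(1 for _, ms, ok in results if ok and ms <= threshold_ms)
    return good / len(results)
```

Calling something like share_within(timed_lookups(["bbc.co.uk", "google.com"]), 500) at regular intervals and logging the timestamps should make it possible to line the recursor spikes up against the auth DNS graphs.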