[Pdns-users] Spikey response times in powerdns recursor

Simon Bedford sbedford at plus.net
Wed Mar 17 11:37:24 UTC 2010


bert hubert wrote:
> On Wed, Mar 17, 2010 at 11:16:40AM +0000, Simon Bedford wrote:
> 
>>> Mar 17 11:57:02 [5] bbc.co.uk.: Resolved 'bbc.co.uk.' NS ns1.bbc.co.uk. to: 132.185.132.21
>>> Mar 17 11:57:02 [5] bbc.co.uk.: Trying IP 132.185.132.21:53, asking 'bbc.co.uk.|A'
>>> Mar 17 11:57:04 [5] bbc.co.uk.: timeout resolving
>> It never used to happen before 23/12/09, though, looking at our graphs,
>> and as you say this happens for at least 5 domain names that we
>> monitor (some of which are our own and some external).
> 
> bbc.co.uk still has a nameserver that is down, so having that domain resolve
> slowly every once in a while is to be expected.

Agreed, but this does not happen during the day, and although the cache
will be busier then, it will still have to fetch the short-TTL entries
again every x minutes during the day as well, unless I am missing
something.

> 
> You've indicated you've occasionally seen 500ms lookup times for
> google.com, but I have not heard of any other problems.
> 
> google.com takes between 0 and 100ms to resolve in my tests.

Hmm, we definitely see higher times than that overnight, as we do with
our own domains.

> 
>>> This matches the expectations.
>> Doubling overnight, and it's acceptable to have 2-second lookup times??
>> This is definitely not something that would be acceptable to our
>> customers for valid domains...
> 
> These measurements from dnsscope are for _all_ domain names, not just valid
> domains. Please do not think that I recommend 2 second lookup times.
>
> The reality is that a huge number of domains have unresponsive nameservers.
> Your graph indicates that 1% of queries take between 1024 and 2048 msec to
> resolve at night, and this is entirely to be expected.
> 
> A doubling of *average* response times, but still in the <40ms range, is
> entirely to be expected on a server that is relatively idle at night.
>

I realise dnsscope takes all domains into account, and I agree that the
average stats are pretty darn good. But we do see lookup times of several
seconds from the recursor for domains we host and run the auth DNS for,
without seeing those spikes on our auth DNS graphs.

>         Bert

This is what is causing the mystery for me: when it's good it's really
good, but then response times go crazy at a random time. It has dropped
our customer experience graphing from 99.987% to 89% (some of this will
be the 3.1.7.2 cache maintenance bug, though; in fact, a larger
proportion, as we only have 1 of 4 upgraded to 3.2.1).

Simon


