[Pdns-users] Spikey response times in powerdns recursor

Simon Bedford sbedford at plus.net
Wed Mar 17 11:16:40 UTC 2010


bert hubert wrote:
> On Wed, Mar 17, 2010 at 10:43:19AM +0000, Simon Bedford wrote:
>> We have been running the recursor as a caching name server for a number
>> of months, having moved from unbound. Since then we have seen good, in
>> fact quick, DNS response times, but when running 3.1.7.1 and .2
>> and also 3.2.1 we see random spikes of up to 2 seconds in the response
>> times, often at the quietest times for the name servers.
> 
> Versions below 3.2 can indeed sometimes show prolonged delays when running
> with large caches. This issue is solved in 3.2.

Understood

> 
>> I had put this down to the 3.1 version after reading the changelog and
>> bugs fixed in 3.2, but having upgraded we still see the same spiking,
>> this time more frequent overnight but not quite as severe as it
>> was.
> 
> As discussed off-list, you see these spikes for a number of domain names, at
> least one of which has a short lived TTL and an unresponsive authoritative
> server.
> 
> Mar 17 11:57:02 [5] bbc.co.uk.: Resolved 'bbc.co.uk.' NS ns1.bbc.co.uk. to: 132.185.132.21
> Mar 17 11:57:02 [5] bbc.co.uk.: Trying IP 132.185.132.21:53, asking 'bbc.co.uk.|A'
> Mar 17 11:57:04 [5] bbc.co.uk.: timeout resolving

Looking at our graphs, though, it never used to happen before 23/12/09, 
and as you say this happens for at least 5 domain names that we monitor 
(some of which are our own and some external).

> 
> bbc.co.uk has a 300 second ttl, and thus expires frequently.
> 
>> I realise that outside lookups will influence the results, but it's
>> weird that at their busiest they are more responsive than when
>> it's quiet, and that most of the unusual behaviour occurs at that time.
> 
> When servers are busy, your monitoring system is unlikely to encounter
> expired TTLs. This is why a busy server in fact provides superior service
> compared to an idle one.

I did wonder about this and whether that would be the case.
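
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how the recursor's cache is implemented: a single record with the 300 second TTL quoted for bbc.co.uk, a hypothetical monitoring probe every 60 seconds, and either a customer query every second (busy) or no customer queries at all (quiet). Whoever asks first after expiry pays for the upstream lookup; that lookup is only slow if the authoritative server fails to answer, as in the log above.

```python
TTL = 300  # seconds; the bbc.co.uk TTL quoted in the thread


def slow_probe_fraction(probe_interval, client_interval, duration, ttl=TTL):
    """Return the fraction of monitoring probes that find the record expired.

    All intervals are hypothetical illustration values in seconds.
    client_interval=None models a quiet server with no customer queries.
    """
    events = [(t, "probe") for t in range(0, duration, probe_interval)]
    if client_interval is not None:
        events += [(t, "client") for t in range(0, duration, client_interval)]
    # On a tie, let the customer query go first so it pays for the refresh.
    events.sort(key=lambda e: (e[0], e[1] == "probe"))

    expires_at = 0  # the cache starts empty
    probes = 0
    slow = 0
    for t, kind in events:
        expired = t >= expires_at
        if expired:
            expires_at = t + ttl  # whoever asks first refreshes the cache
        if kind == "probe":
            probes += 1
            if expired:
                slow += 1
    return slow / probes


busy = slow_probe_fraction(probe_interval=60, client_interval=1, duration=3600)
quiet = slow_probe_fraction(probe_interval=60, client_interval=None, duration=3600)
print(f"busy:  {busy:.0%} of probes hit an expired record")
print(f"quiet: {quiet:.0%} of probes hit an expired record")
```

In this model the busy server's probes never see an expired record, because a customer query always refreshes it first, while on the quiet server one probe in five does. The real ratio depends on actual query rates and probe timing.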

> 
>> Recursor performance graphing and dnsscope stats look OK, although
>> the average time to respond goes up by 100% overnight; see sample
>> stats below from overnight/this morning :-
> 
> This matches the expectations.

Is it acceptable for response times to double overnight and for lookups 
to take 2 seconds?  This is definitely not something that would be 
acceptable to our customers for valid domains...

> 
>         Bert

Also, my previous post may have appeared to be having a dig at the 
support I have received off-list from Bert, or to suggest that this is 
entirely due to the recursor software.  Far from it: I have been 
delighted with the level of support received so far and really want to 
fix this issue and stay with powerdns.  I have also sent our config, as 
I wasn't confident that we hadn't missed something.

I look forward to getting to the bottom of this and being a happy 
powerdns user.

Simon


