[Pdns-users] Spikey response times in powerdns recursor
Simon Bedford
sbedford at plus.net
Wed Mar 17 11:16:40 UTC 2010
bert hubert wrote:
> On Wed, Mar 17, 2010 at 10:43:19AM +0000, Simon Bedford wrote:
>> We have been running recursor as a caching name server for a number
>> of months, having moved from unbound. Since then we have seen good, in
>> fact quick, DNS response times, but when running 3.1.7.1 and .2
>> and also 3.2.1 we see random spikes of up to 2 seconds in the response
>> times, often at the quietest of times for the name servers.
>
> Versions below 3.2 can indeed sometimes show prolonged delays when running
> with large caches. This issue is solved in 3.2.
Understood
>
>> I had put this down to the 3.1 version after reading the changelog and
>> bugs fixed in 3.2 but having upgraded we still see the same spiking,
>> this time more frequent over night but not quite as severe as they
>> were.
>
> As discussed off-list, you see these spikes for a number of domain names, at
> least one of which has a short lived TTL and an unresponsive authoritative
> server.
>
> Mar 17 11:57:02 [5] bbc.co.uk.: Resolved 'bbc.co.uk.' NS ns1.bbc.co.uk. to: 132.185.132.21
> Mar 17 11:57:02 [5] bbc.co.uk.: Trying IP 132.185.132.21:53, asking 'bbc.co.uk.|A'
> Mar 17 11:57:04 [5] bbc.co.uk.: timeout resolving
Looking at our graphs, though, it never used to happen before 23/12/09,
and as you say it affects at least 5 of the domain names that we monitor
(some of which are our own and some external).
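
The trace quoted above can be checked mechanically: the gap between the "Trying IP" line and the "timeout resolving" line is 2 seconds, which matches the size of the spikes. The following is a minimal Python sketch that computes this gap from the three log lines. The function name `timeout_delays`, the regular expression, and the reading of `[5]` as a per-query tag are assumptions made for illustration; they are not part of the recursor. The year is also assumed, since the trace lines do not carry one.

```python
import re
from datetime import datetime

# The three trace lines quoted in this message, copied verbatim.
LOG = """\
Mar 17 11:57:02 [5] bbc.co.uk.: Resolved 'bbc.co.uk.' NS ns1.bbc.co.uk. to: 132.185.132.21
Mar 17 11:57:02 [5] bbc.co.uk.: Trying IP 132.185.132.21:53, asking 'bbc.co.uk.|A'
Mar 17 11:57:04 [5] bbc.co.uk.: timeout resolving
"""

# Assumed layout: "<Mon DD HH:MM:SS> [<tag>] <qname>: <message>"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \[(\d+)\] (\S+): (.*)$")


def timeout_delays(text, year=2010):
    """Return (qname, seconds) for each 'Trying IP' followed by a timeout."""
    pending = {}  # (tag, qname) -> time the outgoing query was sent
    delays = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp, tag, qname, message = match.groups()
        # The log has no year, so one is supplied to build a full datetime.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        key = (tag, qname)
        if message.startswith("Trying IP"):
            pending[key] = when
        elif message.startswith("timeout resolving") and key in pending:
            elapsed = (when - pending.pop(key)).total_seconds()
            delays.append((qname, elapsed))
    return delays


print(timeout_delays(LOG))  # -> [('bbc.co.uk.', 2.0)]
```

Run over a longer trace, the same approach would list every name that timed out, which may help to confirm whether the spikes on the graphs line up with timeouts from unresponsive authoritative servers.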
>
> bbc.co.uk has a 300 second ttl, and thus expires frequently.
>
>> I realise that outside lookups will influence the results but it's
>> weird that when at their busiest they are more responsive than when
>> it's quiet and also have most of the unusual behaviour at that time.
>
> When servers are busy, your monitoring system is unlikely to encounter
> expired TTLs. This is why a busy server in fact provides superior service
> compared to an idle one.
I did wonder about this and whether that would be the case.
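
The point about busy versus idle servers can be made concrete with a rough model. The sketch below is in Python and is an illustration under stated assumptions, not a description of how the recursor works internally: client queries for a name arrive at random (Poisson) with a given rate, a record stays cached for exactly its TTL after each fetch, and the first query after expiry triggers a new upstream lookup. Under those assumptions, the fraction of time the record is expired, and hence the chance that an independent monitoring probe has to wait on the authoritative server, is the mean gap between queries divided by the TTL plus that gap. The function name `stale_probability` is hypothetical.

```python
def stale_probability(ttl_seconds, queries_per_second):
    """Chance that a random probe finds the cached record expired.

    Simplified model: the record is cached for ttl_seconds, then stays
    expired until the next client query arrives, on average
    1 / queries_per_second later, and refreshes it.
    """
    mean_gap = 1.0 / queries_per_second
    return mean_gap / (ttl_seconds + mean_gap)


TTL = 300  # seconds, the bbc.co.uk TTL mentioned in this thread

# Daytime: assume 10 client queries per second for the name.
busy = stale_probability(TTL, 10.0)

# Overnight: assume one client query every 10 minutes.
quiet = stale_probability(TTL, 1.0 / 600.0)

print(f"busy:  {busy:.4%} of probes hit an expired record")
print(f"quiet: {quiet:.4%} of probes hit an expired record")
```

With these assumed rates the probe almost never sees an expired record during the day, but sees one roughly two times in three overnight. The query rates are made up for the example; only the 300 second TTL comes from the thread.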
>
>> Recursor performance graphing and dnsscope stats look OK although
>> the average time to respond goes up by 100% overnight, see sample
>> stats below from overnight/this morning :-
>
> This matches the expectations.
Is it really acceptable for the average to double overnight and for
lookups to take 2 seconds? That is definitely not something our customers
would accept for valid domains.
>
> Bert
Also, my previous post may have read as a dig at the support I have
received off list from Bert, or as a claim that this is entirely due
to the recursor software. Far from it: I have been delighted with the
level of support received so far and really want to fix this issue
and stay with powerdns. I have also sent our config, as I wasn't confident
that we hadn't missed something.
I look forward to getting to the bottom of this and being a happy
powerdns user.
Simon