[Pdns-users] Odd Recursor problems

Peter van Dijk peter.van.dijk at netherlabs.nl
Wed Feb 1 12:42:18 UTC 2012

Hello Jeremy,

On Jan 20, 2012, at 20:51 , Jeremy Utley wrote:

> We're having some odd intermittent problems with our recursor which I'm not sure if I should be concerned or not about them.  It seems that
> intermittently when we query our recursors for a CNAME record, we're not getting a proper response.  I am going to be detailed about the problem,
> so this will be a long message, and I apologize in advance for that.  However, I've about reached my wits' end with trying to diagnose this issue.
> [..]
> Right now, I am working under the thought that occasionally, the recursor does not get a timely response from the Edgecast/Level3 authoritative
> servers, and is therefore failing.  However, it does seem odd that I wouldn't see the problem with our standalone BIND servers.  One other thing
> I have done for testing is to disable load-balanced traffic to one of our 6 nameservers, and turned on the recursor trace mode on that nameserver.
> However, even with only a few checks every minute addressed to it, piecing together the trace logs is still not real easy.
> Does anyone else have any thoughts on this?
> Thanks for any assistance you can give me!

The issue itself does not look familiar to me. The most likely explanation would indeed be the auths failing; depending on how they are failing, the difference in behaviour compared to BIND may not be that odd, since every recursor handles each failure mode in its own way.

If you could try to piece together trace logs, that would really help. Noting the failure timestamps from your monitoring, and making sure everything is NTP-synced, should keep that from being _too_ hard. What are the TTLs on the CNAME and the underlying A record? I'm sorry I don't have anything concrete for you right now.
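
As a rough illustration of matching trace output against monitoring failures, something like the following Python sketch could do the filtering. It is only a sketch under assumptions: it supposes each trace line starts with an ISO 8601 timestamp (your syslog format will likely differ, so parse_ts would need adjusting), and the names parse_ts and lines_near_failures as well as the 5-second window are made up for this example, not anything the recursor provides.

```python
from datetime import datetime, timedelta


def parse_ts(line):
    # Assumption: the line begins with a timestamp such as
    # "2012-01-20T20:51:03", followed by a space and the trace text.
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S")


def lines_near_failures(lines, failures, window_seconds=5):
    """Return the trace lines stamped within window_seconds of any failure."""
    window = timedelta(seconds=window_seconds)
    selected = []
    for line in lines:
        try:
            ts = parse_ts(line)
        except ValueError:
            continue  # no parseable timestamp at the start of the line
        if any(abs(ts - failure) <= window for failure in failures):
            selected.append(line)
    return selected
```

You would feed it the lines of the trace log and a list of datetime objects taken from the monitoring alerts. This only works if the clocks on the monitoring host and the nameserver agree, hence the NTP remark.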

Kind regards,
Peter van Dijk

