[Pdns-users] What does "had X timeouts" actually mean?
andy at strugglers.net
Sun Mar 30 16:38:46 UTC 2014
I'm trying to debug a weird problem one of my users is having.
I am providing secondary DNS service for a number of my user's
domains, which they have primary DNS for on one of their own
machines. They quite regularly update all of their domains at once
and thus send out a flood of notifies to me.
In the usual case, one of my servers (a.authns; BIND 9) does an AXFR
from them and then sends out its own notifies to a fixed list of
addresses which cause my two other servers (b.authns, c.authns;
PDNS) to kick off an AXFR from a.authns.
That's been working fine for years, but just recently one of my PDNS
servers (c.authns) has been seemingly ignoring notifies for one of
the user's domains, always the same domain.
So what happens is:
- User updates their domains
- My BIND 9 server takes AXFR of them all
- My PDNS servers take AXFR of them all except for that one domain
- User wonders what is going on with that one domain and tries
various things like extra updates, all to no avail, before opening
a support ticket with me
- The zone freshness timer later expires and c.authns realises it
needs to do an AXFR and does so without incident.
This has happened several times in the last few days now.
The notifies are definitely being received as I have verified this
with dnscap on c.authns itself. The only thing I see in the pdns
logs at the same time the notifies are received is:
Mar 30 14:03:35 cardhu pdns: 3 slave domains need checking, 0 queued for AXFR
Mar 30 14:03:38 cardhu pdns: Received serial number updates for 2 zones, had 1 timeouts
Mar 30 14:13:56 cardhu pdns: 1 slave domain needs checking, 0 queued for AXFR
Mar 30 14:13:59 cardhu pdns: Received serial number updates for 0 zones, had 1 timeouts
At 14:03 the problem domain was one of the ones being updated; the
others went fine. The 14:13 entries are the user doing another update
of just the problem domain.
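To track how often this happens, the two kinds of log line quoted above can be paired up and tallied. The following is only a minimal sketch: the regular expressions are written against the four lines shown here, and the function name `summarise` is made up for illustration. It assumes each "need(s) checking" line is followed by its matching "Received serial number updates" line, which is how the excerpt looks but is not something I have confirmed in general.

```python
import re

# Patterns written against the log lines quoted above (singular and plural forms).
CHECK_RE = re.compile(r"(\d+) slave domains? needs? checking, (\d+) queued for AXFR")
RESULT_RE = re.compile(r"Received serial number updates for (\d+) zones?, had (\d+) timeouts?")


def summarise(lines):
    """Pair each 'need(s) checking' line with the next 'Received serial' line."""
    rounds = []
    pending = None  # number of domains announced by the last 'checking' line
    for line in lines:
        m = CHECK_RE.search(line)
        if m:
            pending = int(m.group(1))
            continue
        m = RESULT_RE.search(line)
        if m and pending is not None:
            rounds.append({
                "checked": pending,
                "updated": int(m.group(1)),
                "timeouts": int(m.group(2)),
            })
            pending = None
    return rounds


LOG = """\
Mar 30 14:03:35 cardhu pdns: 3 slave domains need checking, 0 queued for AXFR
Mar 30 14:03:38 cardhu pdns: Received serial number updates for 2 zones, had 1 timeouts
Mar 30 14:13:56 cardhu pdns: 1 slave domain needs checking, 0 queued for AXFR
Mar 30 14:13:59 cardhu pdns: Received serial number updates for 0 zones, had 1 timeouts
"""

for entry in summarise(LOG.splitlines()):
    print(entry)
```

In both rounds the number of domains checked equals the number updated plus the number of timeouts, which at least suggests that each checked domain ends up in exactly one of those two buckets.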
So, I wonder, what is the actual technical meaning of "had X
timeouts"?
Naturally I would normally suspect connectivity difficulties between
a.authns and c.authns but the fact is that this is always the one
domain out of a bunch of others all hosted on the same primary
server, so why always timeouts for that one?
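One way to test the connectivity theory for just that zone is to send a single SOA query by hand from c.authns to a.authns and see whether it is answered in time. The sketch below assumes, without confirmation from anything above, that the "timeouts" are SOA serial checks that got no reply. The function names, the 3 second default (picked only because the log lines are 3 seconds apart) and the address and zone in the usage comment are all placeholders, and it only handles IPv4 over UDP.

```python
import socket
import struct


def build_soa_query(zone, query_id):
    """Build a minimal DNS query packet asking for the SOA record of `zone`."""
    # Header: id, flags (0 = standard query, recursion not requested),
    # 1 question, 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 6 = SOA, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 6, 1)


def soa_answers(server, zone, timeout=3.0, query_id=0x1234):
    """Send one SOA query over UDP.

    Returns (rcode, answer_count) from the reply header, or None if no
    usable reply arrived within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_soa_query(zone, query_id), (server, 53))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
    if len(reply) < 12:
        return None  # too short to contain a DNS header
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if rid != query_id:
        return None  # not a reply to our query
    return flags & 0x000F, ancount


# Hypothetical usage (placeholder address and zone name):
#   print(soa_answers("192.0.2.1", "example.com"))
```

Running it for the problem domain and for one of the domains that transfers fine would show whether a.authns treats the two differently. A result of `None` for only the problem domain would point at something specific to that zone, not at general connectivity.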