[Pdns-users] Slave AXFR not working 100% at high rates

Martijn Grendelman martijn at pocos.nl
Tue Jul 18 15:51:01 UTC 2006


Damn.

Re-reading my own message, I solved part of the problem, but not all.

> First off, I apologize for the length of this message.
> 
> I run three PowerDNS servers, one master and two slaves, all with their 
> own MySQL backend. The slaves know the master as 'supermaster' and this 
> appears to work.
> 
> Yesterday, I added a whole bunch (about 250) of domains to the master, 
> and the slaves were notified for each of them. However, the AXFRs of the 
> domains didn't work entirely as they should have, and I can't quite 
> explain what I see here:
> 
> On the master:
> 
> Jul 17 17:02:12 ilsia051 pdns[2152]: Queued notification of domain 
> 'alkmaarrulez.nl' to 62.69.177.12
> Jul 17 17:02:12 ilsia051 pdns[2152]: Queued notification of domain 
> 'alkmaarrulez.nl' to 62.69.184.65
> Jul 17 17:02:12 ilsia051 pdns[2152]: Queued notification of domain 
> 'alkmaarrulez.nl' to 80.79.42.110
> Jul 17 17:02:24 ilsia051 pdns[2152]: Received unsuccesful notification 
> report for 'alkmaarrulez.nl' from 62.69.177.12, rcode: 4

I realize now what this is.

62.69.177.12 is the public IP of the master. It gets notified too, 
because the machine itself doesn't know its own public IP. That 
explains the rcode 4: it's simply the master telling itself that slave 
support is disabled.
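
For reference, the "rcode" in these log lines is the 4-bit RCODE field in the header of the NOTIFY response. RFC 1035 defines 4 as NOTIMP ("Not Implemented"), and RFC 1996 gives NOTIFY opcode 4. The following is a minimal Python sketch that pulls QR, opcode and RCODE out of a raw DNS header, assuming the full 12-byte header is available. `decode_header` is an illustrative name, not part of PowerDNS:

```python
# RCODE values from RFC 1035, section 4.1.1.
RCODE_NAMES = {
    0: "NOERROR",   # no error condition
    1: "FORMERR",   # format error
    2: "SERVFAIL",  # server failure
    3: "NXDOMAIN",  # name error
    4: "NOTIMP",    # not implemented
    5: "REFUSED",   # refused for policy reasons
}

OPCODE_NOTIFY = 4  # RFC 1996


def decode_header(msg: bytes) -> dict:
    """Extract the QR bit, opcode and RCODE from a DNS message header."""
    if len(msg) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    flags_hi, flags_lo = msg[2], msg[3]
    qr = bool(flags_hi & 0x80)        # bit 7: 1 = response
    opcode = (flags_hi >> 3) & 0x0F   # bits 6..3 of the first flags byte
    rcode = flags_lo & 0x0F           # low 4 bits of the second flags byte
    return {
        "qr": qr,
        "opcode": opcode,
        "rcode": rcode,
        "rcode_name": RCODE_NAMES.get(rcode, f"RCODE{rcode}"),
    }
```

A response whose flag bytes are 0xA0 0x04 (QR set, opcode 4 = NOTIFY, RCODE 4) decodes to NOTIMP, which matches the "rcode: 4" the master logged.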

That brings me to another question: is it possible to explicitly tell 
PowerDNS _not_ to notify a certain IP, even if it is listed as a 
nameserver (NS) for one or more zones?
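
The behaviour described above boils down to a selection rule: notify every address derived from the zone's NS records, and skip only addresses the host recognises as its own. The sketch below models that rule and adds the explicit exclusion list the question asks for. It is a hypothetical illustration of the logic only, not PowerDNS code or configuration. `notify_targets` and its parameters are made up for this example:

```python
from typing import Iterable, List


def notify_targets(ns_addresses: Iterable[str],
                   local_addresses: Iterable[str],
                   excluded: Iterable[str] = ()) -> List[str]:
    """Hypothetical model of choosing NOTIFY targets for one zone.

    Every address derived from the zone's NS records is notified, except
    addresses known to be local and any address on an explicit exclusion
    list. A public IP that the host does not know as its own is therefore
    not skipped, which is how a master can end up notifying itself.
    """
    skip = set(local_addresses) | set(excluded)
    targets: List[str] = []
    for addr in ns_addresses:
        # Keep the original order and drop duplicates.
        if addr not in skip and addr not in targets:
            targets.append(addr)
    return targets
```

With only a private or loopback address known as local, the master's public IP stays in the list. Adding it to `excluded` removes it while leaving the real slaves in place.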

> On one of the slaves:
> 
> Jul 17 17:02:23 ilsia251 pdns[19405]: Received NOTIFY for 
> alkmaarrulez.nl from 62.69.177.12 for which we are not authoritative
> Jul 17 17:02:25 ilsia251 pdns[19405]: Created new slave zone 
> 'alkmaarrulez.nl' from supermaster 62.69.177.12, queued axfr
> Jul 17 17:02:25 ilsia251 pdns[949]: gmysql Connection succesful
> Jul 17 17:02:25 ilsia251 pdns[949]: AXFR started for 'alkmaarrulez.nl', 
> transaction started
> Jul 17 17:02:25 ilsia251 pdns[949]: AXFR done for 'alkmaarrulez.nl', 
> zone committed
> 
> It's the last line in the log file on the master that I don't 
> understand. What does 'rcode: 4' mean here, beyond "not 
> implemented"?

So, this actually looks all good. The problem persists, however...

> The problem is that the zone transfers really don't work 100%. In this 
> particular case, all the new domains were queued for notification at a 
> very high rate. It seems that either the slave or the master just 
> couldn't keep up. In the end, I was simply missing records.
> 
> Today, I updated the serials for _all_ 1589 domains on the master, and 
> another "notify storm" was triggered (well, that's what I wanted). After 
> everything was calm again, I looked at the number of records in the 
> databases:
> 
>          domains     records
> master    1589        22822
> slave1    1589        20770
> slave2    1589        20780
> 
> Still missing records!
> 
> A log extract regarding this action:
> 
> On the master:
> 
> Jul 18 16:27:02 ilsia051 pdns[32722]: AXFR of domain 
> 'wageningenrulez.nl' initiated by 62.69.184.65
> Jul 18 16:27:02 ilsia051 pdns[32722]: gmysql Connection succesful
> Jul 18 16:27:02 ilsia051 pdns[32722]: AXFR of domain 
> 'wageningenrulez.nl' to 62.69.184.65 finished
> Jul 18 16:27:03 ilsia051 pdns[2152]: Received unsuccesful notification 
> report for 'wageningenrulez.nl' from 62.69.177.12, rcode: 4

Again, same problem as above.

> 
> On the slave:
> 
> Jul 18 16:27:02 ilsia251 pdns[949]: AXFR started for 
> 'wageningenrulez.nl', transaction started
> Jul 18 16:27:02 ilsia251 pdns[949]: AXFR done for 'wageningenrulez.nl', 
> zone committed
> 
> ...and nothing else.

All good, again.

> Now, with a script, I explicitly queued notifies for all domains (with 
> pdns_control) at 1-second intervals. Many domains on the slaves were 
> already up to date, but many others were not, and AXFRs were started 
> for those.
> 
> After this had finished, the records count on all three servers was 
> 22822, as it should have been a long time ago.
> 
> Can anyone shed some light on this?

So, to summarize the problem:

At high rates of notifies and AXFRs, not all transfers seem to come 
across 100%. In the end, the slaves have fewer records than the master.
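
The totals in the table only show that slave1 is 2052 records short and slave2 is 2042 short. Finding out which zones are incomplete requires per-zone counts. The sketch below assumes those counts have already been fetched from each server's database. The query in the comment assumes the standard gmysql schema (`domains` and `records` tables joined on `records.domain_id`). `find_short_zones` is a hypothetical helper:

```python
from typing import Dict

# Assumption: per-zone counts come from each server's gmysql database, e.g.
#   SELECT d.name, COUNT(r.id)
#   FROM domains d LEFT JOIN records r ON r.domain_id = d.id
#   GROUP BY d.name;
# Adjust the query if the local schema differs.


def find_short_zones(master_counts: Dict[str, int],
                     slave_counts: Dict[str, int]) -> Dict[str, int]:
    """Return {zone: missing_record_count} for zones the slave has short."""
    short: Dict[str, int] = {}
    for zone, expected in master_counts.items():
        got = slave_counts.get(zone, 0)  # a zone absent on the slave counts as 0
        if got < expected:
            short[zone] = expected - got
    return dict(sorted(short.items()))
```

The zones it returns are the candidates for a fresh NOTIFY. This gives a much smaller list than re-notifying all 1589 domains.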

If notifies are sent out at a slow rate, one domain a second, things 
seem to straighten out.
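
The original script isn't shown. What follows is a minimal reconstruction of the same idea, assuming `pdns_control` is on the PATH and accepts `notify DOMAIN`. The command runner and the sleep function are parameters so the pacing can be tested without a running server. `paced_notify` is a hypothetical name:

```python
import subprocess
import time
from typing import Callable, Dict, Iterable


def paced_notify(domains: Iterable[str],
                 interval: float = 1.0,
                 run: Callable = subprocess.run,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, int]:
    """Queue one NOTIFY per domain via pdns_control, pausing between them.

    Returns the pdns_control exit status for each domain.
    """
    results: Dict[str, int] = {}
    for i, domain in enumerate(domains):
        if i > 0:
            sleep(interval)  # wait between notifies, not before the first one
        proc = run(["pdns_control", "notify", domain],
                   capture_output=True, text=True)
        results[domain] = proc.returncode
    return results
```

Calling `paced_notify(names)` with a list of zone names sends one notify per second. Raising `interval` slows the rate further if one second still turns out to be too fast for the slaves.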

Best regards,

Martijn Grendelman