[Pdns-users] Slave AXFR not working 100% at high rates

Martijn Grendelman martijn at pocos.nl
Tue Jul 18 15:38:58 UTC 2006


Hi,

First off, I apologize for the length of this message.

I run three PowerDNS servers, one master and two slaves, each with its 
own MySQL backend. The slaves know the master as a 'supermaster', and 
this appears to work.

Yesterday, I added about 250 domains to the master, and the slaves were 
notified for each of them. However, the AXFRs of these domains did not 
work entirely as they should have, and I can't quite explain what I see 
here:

On the master:

Jul 17 17:02:12 ilsia051 pdns[2152]: Queued notification of domain 
'alkmaarrulez.nl' to 62.69.177.12
Jul 17 17:02:12 ilsia051 pdns[2152]: Queued notification of domain 
'alkmaarrulez.nl' to 62.69.184.65
Jul 17 17:02:12 ilsia051 pdns[2152]: Queued notification of domain 
'alkmaarrulez.nl' to 80.79.42.110
Jul 17 17:02:24 ilsia051 pdns[2152]: Received unsuccesful notification 
report for 'alkmaarrulez.nl' from 62.69.177.12, rcode: 4


On one of the slaves:

Jul 17 17:02:23 ilsia251 pdns[19405]: Received NOTIFY for 
alkmaarrulez.nl from 62.69.177.12 for which we are not authoritative
Jul 17 17:02:25 ilsia251 pdns[19405]: Created new slave zone 
'alkmaarrulez.nl' from supermaster 62.69.177.12, queued axfr
Jul 17 17:02:25 ilsia251 pdns[949]: gmysql Connection succesful
Jul 17 17:02:25 ilsia251 pdns[949]: AXFR started for 'alkmaarrulez.nl', 
transaction started
Jul 17 17:02:25 ilsia251 pdns[949]: AXFR done for 'alkmaarrulez.nl', 
zone committed

It's the last line in the master's log that I don't understand. What 
does 'rcode: 4' mean here, beyond "not implemented"?
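
For reference, here is a minimal sketch of how such a report line can be decoded. The rcode-to-name table follows the standard DNS response codes from RFC 1035; the regular expression and the parse_report helper are illustrative assumptions, not part of PowerDNS, and they expect the wrapped log line to have been joined back into one line first.

```python
import re

# Standard DNS response codes from RFC 1035, section 4.1.1.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}

# Matches the master's report message once the wrapped line is joined.
# The spelling "unsuccesful" is copied from the log output quoted above.
REPORT_RE = re.compile(
    r"Received unsuccesful notification report for '([^']+)' "
    r"from (\S+), rcode: (\d+)"
)


def parse_report(line):
    """Return (domain, peer, rcode_name), or None if the line does not match."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    domain, peer, rcode = match.group(1), match.group(2), int(match.group(3))
    # Unknown codes are rendered as e.g. "RCODE9" instead of raising.
    return domain, peer, RCODE_NAMES.get(rcode, "RCODE%d" % rcode)


line = (
    "Jul 17 17:02:24 ilsia051 pdns[2152]: Received unsuccesful notification "
    "report for 'alkmaarrulez.nl' from 62.69.177.12, rcode: 4"
)
print(parse_report(line))
```

Running this over the master's log would at least show which peers answer with NOTIMP and for which domains.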

The problem is that the zone transfers don't work 100%. In this 
particular case, all the new domains were queued for notification at a 
very high rate. It seems that either the slave or the master just 
couldn't keep up. In the end, records were missing on the slaves.

Today, I updated the serials for _all_ 1589 domains on the master, and 
another "notify storm" was triggered (well, that's what I wanted). After 
everything was calm again, I looked at the number of records in the 
databases:

          domains     records
master    1589        22822
slave1    1589        20770
slave2    1589        20780

Still missing records!
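
To narrow down which zones are incomplete, the totals above can be broken down per zone. The sketch below is only an illustration: the SQL assumes the stock gmysql schema (tables `domains` and `records`, joined on `records.domain_id`), the incomplete_zones helper is hypothetical, and the per-zone numbers at the end are made up to show the comparison.

```python
# Assumed query against the stock gmysql schema; run it on each server and
# load the resulting rows into a dict of zone name -> record count.
PER_ZONE_COUNT_SQL = """
SELECT d.name, COUNT(r.id)
FROM domains d LEFT JOIN records r ON r.domain_id = d.id
GROUP BY d.name
"""


def incomplete_zones(master_counts, slave_counts):
    """Return {zone: (master_count, slave_count)} for zones whose counts differ."""
    diffs = {}
    for zone, expected in master_counts.items():
        actual = slave_counts.get(zone, 0)  # a zone missing on the slave counts as 0
        if actual != expected:
            diffs[zone] = (expected, actual)
    return diffs


# Totals from the table above: how many records each slave lacks.
MASTER_TOTAL = 22822
for name, total in (("slave1", 20770), ("slave2", 20780)):
    print(name, "is missing", MASTER_TOTAL - total, "records")

# Made-up per-zone counts, only to demonstrate the comparison.
example_master = {"example-a.nl": 14, "example-b.nl": 9}
example_slave = {"example-a.nl": 14, "example-b.nl": 3}
print(incomplete_zones(example_master, example_slave))
```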

A log extract regarding this action:

On the master:

Jul 18 16:27:02 ilsia051 pdns[32722]: AXFR of domain 
'wageningenrulez.nl' initiated by 62.69.184.65
Jul 18 16:27:02 ilsia051 pdns[32722]: gmysql Connection succesful
Jul 18 16:27:02 ilsia051 pdns[32722]: AXFR of domain 
'wageningenrulez.nl' to 62.69.184.65 finished
Jul 18 16:27:03 ilsia051 pdns[2152]: Received unsuccesful notification 
report for 'wageningenrulez.nl' from 62.69.177.12, rcode: 4

On the slave:

Jul 18 16:27:02 ilsia251 pdns[949]: AXFR started for 
'wageningenrulez.nl', transaction started
Jul 18 16:27:02 ilsia251 pdns[949]: AXFR done for 'wageningenrulez.nl', 
zone committed

...and nothing else.

Next, I used a script to explicitly queue notifies for all domains (with 
pdns_control) at 1-second intervals. Many domains on the slaves were 
already up to date, but many others were not, and AXFRs were started 
for those.
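
A throttled notify loop of this kind could look roughly like the sketch below. It is not the script that was actually used; it assumes `pdns_control` is on the PATH of the master and that the list of domain names is supplied by the caller. The `run` and `sleep` parameters exist only so the loop can be exercised without a running server.

```python
import subprocess
import time


def notify_all(domains, interval=1.0, run=subprocess.run, sleep=time.sleep):
    """Queue a NOTIFY for each domain, pausing `interval` seconds in between.

    Returns the list of domains for which pdns_control exited non-zero.
    """
    failed = []
    for index, domain in enumerate(domains):
        if index > 0:
            sleep(interval)  # pause between notifies, not before the first one
        result = run(["pdns_control", "notify", domain])
        if result.returncode != 0:
            failed.append(domain)
    return failed
```

Called as notify_all(["alkmaarrulez.nl", "wageningenrulez.nl"]), this queues one notification per second instead of all at once.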

After this had finished, the record count on all three servers was 
22822, as it should have been all along.

Can anyone shed some light on this?

Best regards,

Martijn Grendelman
