Nonstop problems Re: [Pdns-users] migrating from bind

Craig Sanders cas at taz.net.au
Sun Jun 22 00:18:23 UTC 2003


On Sat, Jun 14, 2003 at 05:16:27PM -0400, Steven J. Sobol wrote:
> On Sat, 14 Jun 2003, Craig Sanders wrote:
> 
> > please CC: any replies to me as i am not yet subscribed to this list.
> > 
> > i'm seriously considering migrating at least some of my DNS servers from
> > bind to powerdns.
> 
> I did, with the intent of using the Bind backend and eventually migrating 
> to the SQL backend.
> 
> I have had no end of trouble with PDNS just simply deciding not to serve 
> certain zones, and it's gotten to the point where I'm ready to go back to 
> Bind even though that means having to install security updates every other 
> month.

what problems were you having with certain zones?

i'm finding that small zones (of up to, say, a few hundred resource records)
are working fine in either the bind or pgsql backend.

i'm having enormous difficulties with huge zone files that have millions of
resource records (e.g. local mirrors of relays.osirusoft.com,
blackholes.easynet.nl, and dynablock.easynet.nl).  aside from the regular
security issues with bind, these huge zones (and the RAM they eat up) are the
main reason i want to switch from bind to powerdns.

what i'm seeing is:

1. secondarying a huge zone from a bind server to pdns:

it's almost impossible to achieve this.  pdns times out during most attempts,
although occasionally it succeeds.

host X.X.X.X = bind server[1]
host Y.Y.Y.Y = pdns server[2]

Jun 22 01:13:11 build pdns[27926]: Domain dynablock.easynet.nl is stale, master serial 2003062101, our serial 2003062000
Jun 22 01:13:11 build pdns[27926]: Domain blackholes.easynet.nl is stale, master serial 2003062115, our serial 2003062101
Jun 22 01:13:17 build pdns[27926]: AXFR started for 'dynablock.easynet.nl', transaction started
Jun 22 02:59:41 build pdns[27926]: AXFR done for 'dynablock.easynet.nl', zone committed
Jun 22 02:59:43 build pdns[27926]: Domain blackholes.easynet.nl is stale, master serial 2003062117, our serial 2003062101
Jun 22 02:59:49 build pdns[27926]: AXFR started for 'blackholes.easynet.nl', transaction started
Jun 22 03:06:04 build pdns[27926]: Unable to AXFR zone 'blackholes.easynet.nl': Remote nameserver closed TCP connection
Jun 22 03:06:04 build pdns[27926]: Aborting possible open transaction for domain 'blackholes.easynet.nl' AXFR
Jun 22 03:06:06 build pdns[27926]: Error trying to retrieve/refresh 'blackholes.easynet.nl': Timeout waiting for answer from X.X.X.X
Jun 22 03:06:16 build pdns[27926]: Unable to AXFR zone 'blackholes.easynet.nl': Timeout waiting for answer from X.X.X.X during AXFR
Jun 22 03:07:06 build pdns[27926]: Domain blackholes.easynet.nl is stale, master serial 2003062118, our serial 2003062101
Jun 22 03:07:07 build pdns[27926]: AXFR started for 'blackholes.easynet.nl', transaction started
Jun 22 03:16:12 build pdns[27926]: Unable to AXFR zone 'blackholes.easynet.nl': Reading data from remote nameserver over TCP: Connection timed out
Jun 22 03:16:12 build pdns[27926]: Aborting possible open transaction for domain 'blackholes.easynet.nl' AXFR
Jun 22 06:30:50 build pdns[27926]: Domain dynablock.easynet.nl is fresh
Jun 22 06:31:02 build pdns[27926]: Domain blackholes.easynet.nl is stale, master serial 2003062121, our serial 2003062101
Jun 22 06:31:13 build pdns[27926]: Unable to AXFR zone 'blackholes.easynet.nl': Timeout waiting for answer from X.X.X.X during AXFR
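
as a rough way to see how long each attempt ran before it finished or died,
the "AXFR started" lines in the pdns log can be paired with the matching
"AXFR done" / "Unable to AXFR" lines.  this is only an illustrative python
sketch, not anything that ships with pdns: the function name axfr_durations
and its year argument are my own assumptions (syslog timestamps carry no
year), and the sample lines are copied from the log above.

```python
import re
from datetime import datetime

# sample lines copied verbatim from the pdns log above
LOG = """\
Jun 22 01:13:17 build pdns[27926]: AXFR started for 'dynablock.easynet.nl', transaction started
Jun 22 02:59:41 build pdns[27926]: AXFR done for 'dynablock.easynet.nl', zone committed
Jun 22 02:59:49 build pdns[27926]: AXFR started for 'blackholes.easynet.nl', transaction started
Jun 22 03:06:04 build pdns[27926]: Unable to AXFR zone 'blackholes.easynet.nl': Remote nameserver closed TCP connection
Jun 22 03:07:07 build pdns[27926]: AXFR started for 'blackholes.easynet.nl', transaction started
Jun 22 03:16:12 build pdns[27926]: Unable to AXFR zone 'blackholes.easynet.nl': Reading data from remote nameserver over TCP: Connection timed out
"""

# syslog prefix: "Mon DD HH:MM:SS host pdns[pid]: message"
LINE = re.compile(r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ pdns\[\d+\]: (?P<msg>.*)$")
STARTED = re.compile(r"AXFR started for '([^']+)'")
DONE = re.compile(r"AXFR done for '([^']+)'")
FAILED = re.compile(r"Unable to AXFR zone '([^']+)'")


def axfr_durations(log_text, year=2003):
    """Return a list of (zone, outcome, seconds) for each AXFR attempt."""
    started = {}  # zone -> time of the most recent "AXFR started" line
    results = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        # the year is an assumption supplied by the caller
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        msg = m["msg"]
        begin = STARTED.search(msg)
        if begin:
            started[begin.group(1)] = ts
            continue
        for outcome, pattern in (("done", DONE), ("failed", FAILED)):
            hit = pattern.search(msg)
            # failures with no preceding "started" line are skipped
            if hit and hit.group(1) in started:
                t0 = started.pop(hit.group(1))
                seconds = int((ts - t0).total_seconds())
                results.append((hit.group(1), outcome, seconds))
    return results


if __name__ == "__main__":
    for zone, outcome, seconds in axfr_durations(LOG):
        print(f"{zone}: {outcome} after {seconds}s")
```

on that excerpt it reports 6384 seconds (about 1h46m) for the dynablock
transfer that succeeded, and 375 and 545 seconds for the two blackholes
attempts that failed after an AXFR had actually started.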

the bind side of things doesn't indicate any problem - as far as it is
concerned, the AXFR went smoothly.

Jun 22 01:13:17 taz named[5620]: approved AXFR from [Y.Y.Y.Y].51761 for "dynablock.easynet.nl"
Jun 22 01:13:17 taz named[5620]: zone transfer (AXFR) of "dynablock.easynet.nl" (IN) to [Y.Y.Y.Y].51761 serial 2003062101
Jun 22 02:59:48 taz named[5620]: approved AXFR from [Y.Y.Y.Y].51763 for "blackholes.easynet.nl"
Jun 22 02:59:48 taz named[5620]: zone transfer (AXFR) of "blackholes.easynet.nl" (IN) to [Y.Y.Y.Y].51763 serial 2003062117
Jun 22 03:06:53 taz named[5620]: approved AXFR from [Y.Y.Y.Y].51765 for "blackholes.easynet.nl"
Jun 22 03:06:53 taz named[5620]: zone transfer (AXFR) of "blackholes.easynet.nl" (IN) to [Y.Y.Y.Y].51765 serial 2003062118
Jun 22 03:07:06 taz named[5620]: approved AXFR from [Y.Y.Y.Y].51767 for "blackholes.easynet.nl"
Jun 22 03:07:06 taz named[5620]: zone transfer (AXFR) of "blackholes.easynet.nl" (IN) to [Y.Y.Y.Y].51767 serial 2003062118
Jun 22 06:31:03 taz named[5620]: approved AXFR from [Y.Y.Y.Y].51769 for "blackholes.easynet.nl"
Jun 22 06:31:03 taz named[5620]: zone transfer (AXFR) of "blackholes.easynet.nl" (IN) to [Y.Y.Y.Y].51769 serial 2003062121
Jun 22 06:32:05 taz named[5620]: approved AXFR from [Y.Y.Y.Y].51771 for "blackholes.easynet.nl"
Jun 22 06:32:05 taz named[5620]: zone transfer (AXFR) of "blackholes.easynet.nl" (IN) to [Y.Y.Y.Y].51771 serial 2003062121


2. secondarying the same zones from a pdns server to pdns:

just to test whether the problem is in the pdns secondary or the overworked
bind "master", i set up pdns on another machine[3] to secondary the same zones
from the first pdns server.

(for these tests, i increased max-queue-length in the new pdns server to 10000
from the default of 5000. that seems to have changed the AXFR from timing out
after 5 seconds to timing out after 10 seconds.)

Jun 22 09:34:26 csanders pdns[5901]: 2 slave domains need checking
Jun 22 09:34:26 csanders pdns[5901]: Domain dynablock.easynet.nl is stale, master serial 2003062101, our serial 0
Jun 22 09:34:26 csanders pdns[5901]: Domain blackholes.easynet.nl is stale, master serial 2003062101, our serial 0
Jun 22 09:34:26 csanders pdns[5901]: gpgsql Connection succesful
Jun 22 09:34:36 csanders pdns[5901]: Unable to AXFR zone 'dynablock.easynet.nl': Timeout waiting for answer from Y.Y.Y.Y during AXFR
Jun 22 09:34:36 csanders pdns[5901]: gpgsql Connection succesful
Jun 22 09:34:46 csanders pdns[5901]: Unable to AXFR zone 'blackholes.easynet.nl': Timeout waiting for answer from Y.Y.Y.Y during AXFR
Jun 22 09:35:26 csanders pdns[5901]: 2 slave domains need checking
Jun 22 09:35:26 csanders pdns[5901]: Domain dynablock.easynet.nl is stale, master serial 2003062101, our serial 0
Jun 22 09:35:26 csanders pdns[5901]: Domain blackholes.easynet.nl is stale, master serial 2003062101, our serial 0
Jun 22 09:35:26 csanders pdns[5901]: gpgsql Connection succesful
Jun 22 09:35:36 csanders pdns[5901]: Unable to AXFR zone 'dynablock.easynet.nl': Timeout waiting for answer from Y.Y.Y.Y during AXFR
Jun 22 09:35:36 csanders pdns[5901]: gpgsql Connection succesful
Jun 22 09:35:46 csanders pdns[5901]: Unable to AXFR zone 'blackholes.easynet.nl': Timeout waiting for answer from Y.Y.Y.Y during AXFR
Jun 22 09:36:24 csanders pdns[5829]: Scheduling exit on remote request



it looks to me as if pdns is just aborting the AXFR if it doesn't get the entire
zone within 10 seconds.  for a huge zone, that's just plain impossible.  even
secondarying a small zone from a busy server with thousands of domains (and
millions of records) in an SQL server is likely to be problematic simply
because of the time it takes to select all records for a zone.
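
to make the distinction concrete: a limit on the whole transfer and a limit on
the gap between chunks of data behave very differently for a big zone.  the
python toy model below is purely hypothetical (transfer_result, total_deadline
and idle_timeout are names i made up, and i don't know which of the two, if
either, pdns actually implements).  it only simulates chunk arrival times, it
doesn't talk to a nameserver.

```python
def transfer_result(chunk_times, total_deadline=None, idle_timeout=None):
    """Simulate receiving a zone transfer.

    chunk_times: ascending times (seconds since the transfer began) at which
    each chunk of the zone arrives.
    total_deadline: hypothetical limit on the whole transfer, in seconds.
    idle_timeout: hypothetical limit on the gap between two chunks, in seconds.
    Returns "complete", or a string naming the limit that fired and when.
    """
    last = 0.0  # arrival time of the previous chunk (or the start)
    for t in chunk_times:
        limits = []
        if idle_timeout is not None:
            limits.append((last + idle_timeout, "idle timeout"))
        if total_deadline is not None:
            limits.append((total_deadline, "total deadline"))
        if limits:
            # whichever limit expires first is the one that aborts the transfer
            cutoff, name = min(limits)
            if t > cutoff:
                return f"{name} at {cutoff:.0f}s"
        last = t
    return "complete"


if __name__ == "__main__":
    steady = range(1, 6385)  # one chunk per second for 6384 seconds
    print(transfer_result(steady, total_deadline=10))
    print(transfer_result(steady, idle_timeout=10))
    print(transfer_result([30, 31, 32], idle_timeout=10))
```

in this model a steady stream that needs 6384 seconds in total fails against a
10 second total deadline but completes under a 10 second idle timeout.  a
master that spends 30 seconds selecting records before it sends anything trips
the idle timeout as well, which is the SQL case described above.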





hardware and software details:

[1] host X.X.X.X = celeron-366 with 512MB RAM and slow, old disks.  my personal
    mail, web, dns, and everything server.  co-located in the racks where i
    work.

[2] host Y.Y.Y.Y = dual P3-933 with 1GB RAM and fast scsi disks on a mylex 352
    raid card.  this machine is basically idle, i'm using it for testing pdns
    before it gets commissioned as a new web server.

[3] celeron 1.7GHz with 256MB RAM and a new IDE drive.  my workstation at work.

all machines are running up-to-date (or nearly so) debian unstable, with recent
versions of the linux kernel.  bind on X.X.X.X is 8.3.3-3, pdns on both
Y.Y.Y.Y and my workstation is 2.9.8-1.  postgres is 7.3.3-1 on host Y.Y.Y.Y,
and 7.3.2r1-1 on my workstation.


craig


