[Pdns-users] Issue with communications hanging, version 2.9.17-13sarge1

Dave Taylor davetaylor at frontiernet.net
Thu Oct 20 20:35:29 UTC 2005


 
The following is more information that may be relevant to the issue that we
are experiencing.

The process that checks for slave domains that need to be refreshed
(CommunicatorClass::slaveRefresh()) seems to just stop working after a
certain period of time.

Running strace -p on this process shows the following sequence, repeated over
and over, while it is working:

recvfrom(11, 0xbf5ff364, 1500, 0, 0xbf5ff944, 0xbf5ff204) = -1 EAGAIN (Resource temporarily unavailable)
time(NULL)                              = 1129831123
time(NULL)                              = 1129831123
rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})

But once it has stopped working, it shows only this (once, not over and over):

recvfrom(11, 
(the file descriptor for socket 11 shows:
"pdns_serv 19540    pdns   11u  IPv4     912515                   UDP *:10006")

And that is it.  I can still force a retrieve with pdns_control and the
command will return, but nothing happens.  
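
One way to read the two traces, offered as an assumption and not as a
diagnosis: in the working case recvfrom() on the UDP socket returns EAGAIN
right away, which is what a non-blocking socket does when no datagram is
waiting. In the hung case the call never returns, which is what a blocking
socket with no timeout does. The Python sketch below only illustrates that
difference on a throwaway local socket. It is not PowerDNS code, and the
helper name poll_once is made up for the example.

```python
import socket


def poll_once(sock):
    """Return a waiting datagram, or None if the socket has nothing to read.

    On a non-blocking socket an empty queue surfaces as BlockingIOError,
    which is Python's wrapper for EAGAIN/EWOULDBLOCK.
    """
    try:
        data, _addr = sock.recvfrom(1500)
        return data
    except BlockingIOError:
        return None


# Throwaway UDP socket on loopback; nothing is ever sent to it.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))

# Non-blocking: recvfrom() fails immediately, like the EAGAIN line in strace.
sock.setblocking(False)
result_nonblocking = poll_once(sock)

# Blocking: recvfrom() waits. A short timeout is set here only so the
# example terminates; without it the call would sit there indefinitely,
# like the lone unfinished "recvfrom(11, " line.
sock.settimeout(0.2)
try:
    sock.recvfrom(1500)
    timed_out = False
except socket.timeout:
    timed_out = True

sock.close()
```

If that reading is right, the open question would be why the socket's mode
differs between the two states, which the traces above do not show.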


When it's working properly, pdns will:
1) make its connection to the mysql db
2) make a connection to the master server of the slave zone (as shown here
in the strace)

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 16
fcntl64(16, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(16, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("x.x.x.x")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(17, [16], [16], NULL, {10, 0})   = 1 (out [16], left {10, 0})
getsockopt(16, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl64(16, F_GETFL)                    = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(16, F_SETFL, O_RDWR)            = 0
writev(16, [{"\0\33", 2}, {"\16\351\0\0\0\1\0\0\0\0\0\0\00510nbc\3com\0\0\374\0\1", 27}], 2) = 29
 * 2 bytes in buffer 0
 | 00000  00 1b                                             ..                |
 * 27 bytes in buffer 1
 | 00000  0e e9 00 00 00 01 00 00  00 00 00 00 05 31 30 6e  ........ .....10n |
 | 00010  62 63 03 63 6f 6d 00 00  fc 00 01                 bc.com.. ...      |
select(17, [16], NULL, NULL, {10, 0})   = 1 (in [16], left {9, 990000})

3) query the local DB for the zone information.
4) query the master for its information.
5) compare info and update as needed.
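
For anyone reading the writev() dump in step 2, the bytes can be decoded by
hand. The sketch below is a minimal Python decoder for exactly those 29
bytes, assuming the standard DNS-over-TCP layout (a 2-byte length prefix
followed by a 12-byte header and one question). The function name
decode_query is made up for the example, and it handles only a single
uncompressed question like this one.

```python
import struct

# The two buffers from the writev() call in the trace above.
length_prefix = bytes.fromhex("00 1b")
message = bytes.fromhex(
    "0e e9 00 00 00 01 00 00 00 00 00 00 05 31 30 6e"
    "62 63 03 63 6f 6d 00 00 fc 00 01"
)


def decode_query(msg):
    """Decode a DNS message that carries one uncompressed question."""
    # Header: six 16-bit big-endian fields.
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", msg[:12]
    )
    # Question name: length-prefixed labels, ended by a zero byte.
    labels = []
    pos = 12
    while msg[pos] != 0:
        size = msg[pos]
        labels.append(msg[pos + 1:pos + 1 + size].decode("ascii"))
        pos += 1 + size
    pos += 1  # skip the terminating zero byte
    qtype, qclass = struct.unpack("!2H", msg[pos:pos + 4])
    return {
        "id": ident,
        "flags": flags,
        "qdcount": qdcount,
        "qname": ".".join(labels),
        "qtype": qtype,
        "qclass": qclass,
    }


declared_length = struct.unpack("!H", length_prefix)[0]
query = decode_query(message)
```

Here declared_length is 27, matching the size of the second buffer, and the
question decodes to name 10nbc.com, type 252 (AXFR), class 1 (IN). In other
words, the trace shows a zone transfer request being sent to the master over
TCP.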

When it isn't working, step 2 above never happens, and therefore steps 4 and
5 never happen either.  It simply shuts down the mysql connection and closes.
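
For reference, the system calls in step 2 of the working trace follow the
usual pattern for a TCP connect with a timeout: set the socket non-blocking,
call connect() and get EINPROGRESS, wait in select() for up to 10 seconds,
check SO_ERROR, then switch back to blocking mode. The Python sketch below
mirrors that sequence. It is an illustration of the pattern, assuming a
POSIX system, and is not taken from the PowerDNS source. The name
connect_with_timeout is made up for the example.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout=10.0):
    """Connect a TCP socket, giving up after `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # fcntl(F_SETFL, O_NONBLOCK) in the trace.
        sock.setblocking(False)
        # connect() normally reports EINPROGRESS on a non-blocking socket.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS):
            raise OSError(err, os.strerror(err))
        # select() reports the socket writable once the connect finishes.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("connect timed out")
        # getsockopt(SO_ERROR) says whether the connect actually succeeded.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, os.strerror(err))
        # Back to blocking mode, as in the second fcntl(F_SETFL) call.
        sock.setblocking(True)
        return sock
    except Exception:
        sock.close()
        raise
```

A call such as connect_with_timeout("x.x.x.x", 53), with the master's real
address filled in, run from the slave host would show whether the master is
reachable over TCP independently of pdns.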

I realize that this is what is happening after it has already stopped
working.  I have not been able to pinpoint what might be making it stop
working.

Is there other information that may be helpful in figuring this out?  I
would be glad to gather more info, but I'm not 100% sure of where to go from
here.


