[Pdns-users] TCP listener hangs with fd error on recursor

bert hubert bert.hubert at netherlabs.nl
Mon Dec 21 13:04:01 UTC 2009


On Mon, Dec 21, 2009 at 11:30:25AM +0000, Josh Berry wrote:
> On Mon, Dec 21, 2009 at 10:07:18AM +0000, Josh Berry wrote:
> 
> > Can you tell us if you are behind a firewall? Perhaps iptables on the host
> > itself?
> 
> The server is not behind a firewall, it is behind a load balancer (Nortel
> Application Switch 2216 E) with a simple ACL (that just restricts access
> to our own IP ranges) in front of it.

OK, I'm unsure whether that explains why nobody else is seeing this bug.

Thanks to your high-quality bug reporting (partially off-list), I was able
to find & fix the issue.

If you are in a hurry, remove this line:
http://wiki.powerdns.com/trac/browser/tags/pdns-3.1.7.1/pdns/pdns_recursor.cc#L624
(the removeReadFD call).

Many thanks Josh!

What happened was that, in the case of this (rare) error, PowerDNS
incorrectly tried to remove the connection from the IO multiplexer, even
though the connection was not part of the multiplexer. The resulting
exception meant that this connection was never subtracted from the count of
open TCP sessions. As these leaked sessions accumulated, PowerDNS
eventually shut down TCP service.
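The failure mode can be sketched in a few lines of C++. This is a minimal, hypothetical model, not the recursor's actual code: the `Multiplexer` class, the `TCPState` struct, its `maxSessions` limit of 3 and the helper functions are all invented for illustration; only the `removeReadFD` name comes from the message above. The sketch assumes the fd has already been taken out of the multiplexer while the answer is being written, so the error path's extra `removeReadFD` call throws before the session counter is decremented.

```cpp
#include <cassert>
#include <exception>
#include <set>
#include <stdexcept>

// Hypothetical stand-in for the IO multiplexer: it only tracks which
// fds are registered for read events.
class Multiplexer {
public:
  void addReadFD(int fd) { d_fds.insert(fd); }
  void removeReadFD(int fd) {
    // Mirrors the "Tried to remove unlisted fd" error seen in the log.
    if (d_fds.erase(fd) == 0)
      throw std::runtime_error("Tried to remove unlisted fd from multiplexer");
  }
private:
  std::set<int> d_fds;
};

// Hypothetical count of open TCP sessions, checked against a limit
// before a new connection is accepted.
struct TCPState {
  Multiplexer mplexer;
  unsigned int openSessions = 0;
  unsigned int maxSessions = 3;  // hypothetical limit, kept small for the demo
  bool accept(int fd) {
    if (openSessions >= maxSessions) return false;  // TCP service "dead"
    ++openSessions;
    mplexer.addReadFD(fd);
    return true;
  }
};

// Buggy error path: the fd is no longer in the multiplexer, so
// removeReadFD throws and the decrement below is never reached.
void onWriteErrorBuggy(TCPState& s, int fd) {
  s.mplexer.removeReadFD(fd);
  --s.openSessions;
}

// Fixed error path: no removeReadFD call, just release the session.
void onWriteErrorFixed(TCPState& s, int /*fd*/) {
  --s.openSessions;
}

// Simulates one client whose peer resets the connection mid-answer.
// Returns false if the connection was refused because the limit was hit.
bool serveResetClient(TCPState& s, int fd, bool buggy) {
  if (!s.accept(fd)) return false;
  s.mplexer.removeReadFD(fd);  // assumed: fd taken out while answering
  try {
    // Pretend writing the answer failed with "Connection reset by peer".
    if (buggy) onWriteErrorBuggy(s, fd);
    else onWriteErrorFixed(s, fd);
  } catch (const std::exception&) {
    // An outer handler only logs the error ("STL error: ...") and moves on.
  }
  return true;
}
```

With the buggy path, only the first three resetting clients are ever served and the counter stays pinned at the limit; with the fixed path, the counter returns to zero after every reset. A possible alternative, offered only as a suggestion, is to tie the decrement to an RAII guard so that no exception on the error path can skip it.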

I was able to reproduce this bug exactly, so I'm confident we've nailed this
one.

	Bert


> 
> > Are you running with --fork?
> 
> No
> 
> >> Dec 19 16:10:21 pcl-cachedns03 pdns_recursor[21800]: Error writing TCP answer to 212.159.6.141: Connection reset by peer
> >> Dec 19 16:10:21 pcl-cachedns03 pdns_recursor[21800]: STL error: Tried to remove unlisted fd 410 from multiplexer
> 
> > Well, this is very useful debugging information. I can't see how this would
> > lead to TCP stopping working.
> 
> > When you observe that the TCP listener has died, do you get a connection
> > refused when you try to connect to TCP/53? Or a timeout? Or do you get a
> > connection but no answer?
> 
> The actual TCP socket stops responding. I am checking locally on the box
> using both 'dig +tcp...' and a check that opens a TCP connection to port 53
> and times it; both stopped responding. I'm afraid I can't provide any more
> detail than that until it happens again, at which point I could do some
> more diagnostics.
> 
> However, the log messages above always appear as the listener dies.
> 
> Josh
> _______________________________________________
> Pdns-users mailing list
> Pdns-users at mailman.powerdns.com
> http://mailman.powerdns.com/mailman/listinfo/pdns-users
> 


