[Pdns-users] pdns-2.9.16 && confirmed crashes && solaris 9
Roderick Groesbeek
powerdns at roderick.triple-it.nl
Wed Mar 10 22:43:35 UTC 2004
Dear readers,
Info:
===
As I had some free time today (you know.. attending conferences, no other
tasks for today :-)
I could finally deploy pdns_server at our second fallback location.
But I have confirmed crashes on this machine (platform Solaris 9).
Our other fallback location, however, is running smoothly on Solaris 9 with
the DUPmx patch.
So I did some investigating...
and found that sockAddrToString() is being fed bad parameter data.
StackTrace:
========
(gdb) bt
#0 _Z16sockAddrToStringP11sockaddr_inj (remote=0x167136, socklen=16)
at misc.cc:298
#1 0x8baec in _ZN8DNSProxy12getID_lockedEv (this=<incomplete type>)
at /usr/local/include/c++/3.3.2/bits/stl_tree.h:202
#2 0x8b584 in _ZN8DNSProxy10sendPacketEP9DNSPacket (this=<incomplete type>,
p=0x1e9c70) at dnsproxy.cc:108
#3 0x47864 in _ZN13PacketHandler8questionEP9DNSPacket (
this=<incomplete type>, p=0x1e9c70)
at /usr/local/include/c++/3.3.2/bits/stl_alloc.h:656
#4 0xf8974 in _ZN11DistributorI9DNSPacketS0_13PacketHandlerE10makeThreadEPv
( p=0x19a2e8) at distributor.hh:192
(gdb) p *remote
$12 = {sin_family = 2, sin_port = 49557, sin_addr = {S_un = {S_un_b = {
s_b1 = 212 'Ô', s_b2 = 127 '\177', s_b3 = 254 'þ', s_b4 = 70 'F'},
S_un_w = {s_w1 = 54399, s_w2 = 65094}, S_addr = 3565157958}},
sin_zero = "\000\000\000\000\000\000\000"}
(gdb)
I had a hunch about the problem and built the following patch.
Patch (Just a hunch):
==============
~~
23:12:19 root at calypso:/usr/local/src> /usr/local/bin/diff -uBb
pdns-2.9.16.orig/pdns/dnsproxy.cc pdns-2.9.16/pdns/dnsproxy.cc
--- pdns-2.9.16.orig/pdns/dnsproxy.cc 2004-02-28 19:59:32.000000000 +0100
+++ pdns-2.9.16/pdns/dnsproxy.cc 2004-03-10 22:43:06.772871000 +0100
@@ -137,10 +137,12 @@
return n;
}
else if(i->second.created<time(0)-60) {
- if(i->second.created)
+/*[Unk] this crashes on some Solaris. Don't know why yet */
+ if(i->second.created) {
L<<Logger::Warning<<"Recursive query for remote "<<
- sockAddrToString((struct sockaddr_in *)&i->second.remote, i->second.addrlen)<<" with internal id "<<n<<
+ (0 == 1 ? sockAddrToString((struct sockaddr_in *)&i->second.remote, i->second.addrlen) : " (leeg)") <<" with internal id "<<n<<
" was not answered by backend within timeout, reusing id"<<endl;
+ }
return n;
}
23:12:39 root at calypso:/usr/local/src>
~~
Now pdns_server runs smoothly on my other Solaris box as well. Instead of
crashing, it now logs the following:
Output:
=====
~~
Mar 10 22:55:48 Recursive query for remote (leeg) with internal id 0 was
not answered by backend within timeout, reusing id
Mar 10 22:56:08 Received packet from recursor backend with id 0 which is a
duplicate
~~
[Note: Verbose logging!]
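The id-reuse logic that the patch touches can be sketched in isolation as
below. This is only an illustration of the find / timeout / recycle pattern
visible in the diff, not the actual dnsproxy.cc code: ConntrackEntry,
hasUsableRemote() and allocateId() are hypothetical names, and the real
entry struct may hold other fields. The sketch zeroes every entry on
construction and checks the stored address before anything tries to
format it.

```cpp
#include <cassert>
#include <cstring>
#include <ctime>
#include <map>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical stand-in for the per-query bookkeeping entry; the real
// struct used by DNSProxy may have different or additional fields.
struct ConntrackEntry {
  time_t created;
  struct sockaddr_in remote;
  socklen_t addrlen;

  // Zero everything so a default-constructed entry is recognisably empty.
  ConntrackEntry() : created(0), addrlen(0) {
    std::memset(&remote, 0, sizeof(remote));
  }
};

typedef std::map<int, ConntrackEntry> ConntrackMap;

// True only if the entry carries an address that is safe to format.
bool hasUsableRemote(const ConntrackEntry& e) {
  return e.created != 0 &&
         e.addrlen >= sizeof(struct sockaddr_in) &&
         e.remote.sin_family == AF_INET;
}

// Returns the lowest id that is either unused or older than `timeout`
// seconds. `reusedStale` is set when a stale entry with a usable remote
// address is recycled, i.e. when a warning could safely be logged.
int allocateId(const ConntrackMap& table, time_t now, int timeout,
               bool* reusedStale) {
  *reusedStale = false;
  for (int n = 0;; ++n) {
    ConntrackMap::const_iterator i = table.find(n);
    if (i == table.end())
      return n;  // id was never used
    if (i->second.created < now - timeout) {
      *reusedStale = hasUsableRemote(i->second);
      return n;  // id timed out, recycle it
    }
  }
}
```

With such a guard the warning would still name the remote host when the
entry is intact, and fall back to a placeholder only when it is not.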
It seems that when a timed-out id 0 is reused, its remote member variable
is not always valid.
At present I would guess that the remote member variable is not correctly
initialized,
but as I am not very familiar with the threaded pdns_server code (yet), I
would love to hear any tips on how to follow up:
what to test, try, fix, etc.
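One thing that may be worth testing: the struct printed by gdb looks
plausible (family 2 is AF_INET, address 212.127.254.70), but the pointer in
frame #0 is 0x167136, which is not a multiple of 4. SPARC traps on
misaligned 32-bit loads, so reading sin_addr through that pointer could
fault even when the data is fine. Whether that is the actual cause here is
an assumption. The sketch below is a hypothetical defensive variant
(safeSockAddrToString() is not a function in misc.cc) that copies the
bytes into an aligned local first and validates the length and family
before formatting.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>

// Hypothetical defensive variant of sockAddrToString(); not the code in
// misc.cc. Returns a placeholder instead of touching doubtful data.
std::string safeSockAddrToString(const void* remote, socklen_t socklen) {
  if (remote == 0 || socklen < sizeof(struct sockaddr_in))
    return "(invalid length)";

  // A byte-wise copy works at any alignment; the local is properly aligned.
  struct sockaddr_in copy;
  std::memcpy(&copy, remote, sizeof(copy));

  if (copy.sin_family != AF_INET)
    return "(not AF_INET)";

  char buf[INET_ADDRSTRLEN];
  if (inet_ntop(AF_INET, &copy.sin_addr, buf, sizeof(buf)) == 0)
    return "(unprintable)";
  return buf;
}
```

If the crash disappears with a copy like this while the logged address is
sensible, alignment rather than initialization would be the more likely
culprit.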
Kind regards,
Roderick
--
Pettemerstraat 12A T r I p l e
1823 CW Alkmaar T
Tel. +31 (0)72-5129516
fax. +31 (0)72-5129520 Automatisering
www.triple-it.nl "Laat uw Net Werken!"