[Pdns-users] pdns-2.9.16 && confirmed crashes && solaris 9

Roderick Groesbeek powerdns at roderick.triple-it.nl
Wed Mar 10 22:43:35 UTC 2004


Ls,

Info:
===
As I had some free time today (you know, attending conferences, no other
tasks for today :-) ), I could finally deploy pdns_server at our second
fallback location. However, it crashes reproducibly on this machine
(platform: Solaris 9). Our other fallback location, also on Solaris 9, runs
smoothly with the DUPmx patch.

So I did some investigation
and found that sockAddrToString() is being fed bad parameter data.



StackTrace:
========
(gdb) bt
#0  _Z16sockAddrToStringP11sockaddr_inj (remote=0x167136, socklen=16)
    at misc.cc:298
#1  0x8baec in _ZN8DNSProxy12getID_lockedEv (this=<incomplete type>)
    at /usr/local/include/c++/3.3.2/bits/stl_tree.h:202
#2  0x8b584 in _ZN8DNSProxy10sendPacketEP9DNSPacket (this=<incomplete type>,
    p=0x1e9c70) at dnsproxy.cc:108
#3  0x47864 in _ZN13PacketHandler8questionEP9DNSPacket (
    this=<incomplete type>, p=0x1e9c70)
    at /usr/local/include/c++/3.3.2/bits/stl_alloc.h:656
#4  0xf8974 in _ZN11DistributorI9DNSPacketS0_13PacketHandlerE10makeThreadEPv (
    p=0x19a2e8) at distributor.hh:192
(gdb) p *remote
$12 = {sin_family = 2, sin_port = 49557, sin_addr = {S_un = {S_un_b = {
        s_b1 = 212 'Ô', s_b2 = 127 '\177', s_b3 = 254 'þ', s_b4 = 70 'F'},
      S_un_w = {s_w1 = 54399, s_w2 = 65094}, S_addr = 3565157958}},
  sin_zero = "\000\000\000\000\000\000\000"}
(gdb)
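
The values printed by gdb can be cross-checked with a few lines of C++. The
sketch below is only an illustration: dottedQuad() and isAligned() are made-up
helper names, not PowerDNS functions. It decodes S_addr the way a big-endian
host prints it (s_w1 = 54399 = 0xD47F is what bytes 212, 127 give when read
big-endian) and checks whether the pointer value shown in frame #0 is a
multiple of 4.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: render a 32-bit IPv4 address whose most significant
// byte is the first octet (how a big-endian host prints S_addr).
std::string dottedQuad(std::uint32_t value)
{
  return std::to_string((value >> 24) & 0xFF) + "." +
         std::to_string((value >> 16) & 0xFF) + "." +
         std::to_string((value >> 8) & 0xFF) + "." +
         std::to_string(value & 0xFF);
}

// Hypothetical helper: true when addr is a multiple of alignment.
// alignment must be a power of two.
bool isAligned(std::uintptr_t addr, std::uintptr_t alignment)
{
  return (addr & (alignment - 1)) == 0;
}
```

For the values in the trace, dottedQuad(3565157958u) gives "212.127.254.70",
which matches s_b1..s_b4, so the struct contents look self-consistent.
isAligned(0x167136, 4) is false. Whether that matters is not established by
this trace (gdb may misreport arguments in optimized code), but
strict-alignment CPUs such as SPARC fault on misaligned 32-bit loads, so it
may be worth checking.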


I had a hunch about the problem and built the following patch.


Patch (Just a hunch):
==============
~~
23:12:19 root at calypso:/usr/local/src> /usr/local/bin/diff -uBb
pdns-2.9.16.orig/pdns/dnsproxy.cc pdns-2.9.16/pdns/dnsproxy.cc
--- pdns-2.9.16.orig/pdns/dnsproxy.cc   2004-02-28 19:59:32.000000000 +0100
+++ pdns-2.9.16/pdns/dnsproxy.cc        2004-03-10 22:43:06.772871000 +0100
@@ -137,10 +137,12 @@
       return n;
     }
     else if(i->second.created<time(0)-60) {
-      if(i->second.created)
+/*[Unk] this crashes on some Solaris. Don't know why yet */
+      if(i->second.created) {
        L<<Logger::Warning<<"Recursive query for remote "<<
-         sockAddrToString((struct sockaddr_in *)&i->second.remote, i->second.addrlen)<<" with internal id "<<n<<
+        (0 == 1 ? sockAddrToString((struct sockaddr_in *)&i->second.remote, i->second.addrlen) : " (leeg)") <<" with internal id "<<n<<
          " was not answered by backend within timeout, reusing id"<<endl;
+       }

       return n;
     }
23:12:39 root at calypso:/usr/local/src>
~~
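
The patch above switches the address off entirely. A less blunt variant,
sketched here under assumptions, would only format the peer when the stored
bytes look like an IPv4 sockaddr, and would copy them into a properly aligned
local first. safeRemoteToString() and the "(unknown)" placeholder are
hypothetical and not part of PowerDNS; only inet_ntop(), ntohs() and memcpy()
are standard calls. Whether this avoids the crash seen here is untested.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstring>
#include <string>

// Hypothetical helper, not part of PowerDNS: format an IPv4 peer held in a
// raw byte buffer, or return a placeholder when the buffer does not look
// like a sockaddr_in.
std::string safeRemoteToString(const void* raw, socklen_t len)
{
  if (raw == nullptr || len < sizeof(sockaddr_in))
    return "(unknown)";

  sockaddr_in sa;
  std::memcpy(&sa, raw, sizeof(sa));  // copy into aligned local storage
  if (sa.sin_family != AF_INET)
    return "(unknown)";

  char buf[INET_ADDRSTRLEN];
  if (inet_ntop(AF_INET, &sa.sin_addr, buf, sizeof(buf)) == nullptr)
    return "(unknown)";

  return std::string(buf) + ":" + std::to_string(ntohs(sa.sin_port));
}
```

Because the bytes are copied with memcpy() before any field is read, the
helper does not depend on how the caller's buffer is aligned, and a zeroed or
too-short entry yields the placeholder instead of being dereferenced as a
sockaddr_in.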

With this patch, pdns_server now runs smoothly on my other Solaris box as
well. Instead of crashing, it logs the following ("leeg" is Dutch for
"empty"):

Output:
=====
~~
Mar 10 22:55:48 Recursive query for remote  (leeg) with internal id 0 was
not answered by backend within timeout, reusing id
Mar 10 22:56:08 Received packet from recursor backend with id 0 which is a
duplicate
~~
[Note: Verbose logging!]

It seems that when a timed-out id 0 is reused, its entry does not always have
a valid remote member variable.

At the moment I would say the remote member variable is not initialized
correctly, but as I am not very familiar with the threaded pdns_server code
(yet), I would love to hear any tips on how to follow up: what to test, try,
fix, etc.
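
One way to test the initialization theory is to make sure every slot starts
out zeroed and is wiped on reuse, and to log the peer only when the slot was
really populated. The sketch below is a standalone model, not the PowerDNS
code: ConnEntry, hasLoggableRemote() and pickId() are hypothetical names, and
the struct layout is assumed. Only the field names remote, addrlen and created
and the 60 second timeout are taken from the diff.

```cpp
#include <ctime>
#include <map>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical bookkeeping entry; the layout is an assumption.
struct ConnEntry {
  sockaddr_in remote;
  socklen_t addrlen;
  std::time_t created;
  // remote() value-initializes the struct, so all its bytes start as zero.
  ConnEntry() : remote(), addrlen(0), created(0) {}
};

// True when the slot holds a populated IPv4 peer that is safe to log.
bool hasLoggableRemote(const ConnEntry& e)
{
  return e.created != 0 && e.addrlen == sizeof(sockaddr_in) &&
         e.remote.sin_family == AF_INET;
}

// Picks the first id that is unused or older than timeoutSeconds.
// reusedPopulated is set when the recycled slot still held a loggable peer.
// Returns -1 when every id below maxIds is still in use.
int pickId(std::map<int, ConnEntry>& table, int maxIds, std::time_t now,
           bool& reusedPopulated, int timeoutSeconds = 60)
{
  reusedPopulated = false;
  for (int n = 0; n < maxIds; ++n) {
    ConnEntry& e = table[n];  // operator[] creates a zeroed entry if missing
    if (e.created == 0 || e.created < now - timeoutSeconds) {
      reusedPopulated = hasLoggableRemote(e);
      e = ConnEntry();        // wipe stale peer data before reuse
      e.created = now;
      return n;
    }
  }
  return -1;
}
```

In this model a caller would print the remote address only when
reusedPopulated is true, so a slot that was never filled in can never reach
the address formatter.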



Kind regards,

Roderick
--
Pettemerstraat 12A                                  T r I p l e
1823 CW Alkmaar                                         T
Tel. +31 (0)72-5129516
fax. +31 (0)72-5129520                              Automatisering
www.triple-it.nl                                 "Laat uw Net Werken!"


