[Pdns-users] PDNS-Recursor Segfaults

Aki Tuomi cmouse at youzen.ext.b2.fi
Tue May 20 21:40:37 UTC 2014


Thank you, this is pretty much what happens. We just need to figure
out why it crashes. No obvious reason stands out. What I don't
understand is why it ends up in make_request for a numeric host,
and why the crash only happens once you increase the number of
local addresses.

Is the recursor used by localhost? Does anything change if you
set query-local-address to some IP address?
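
As a minimal sketch of that test, assuming the configuration posted
earlier in the thread and using 192.0.2.10 purely as a placeholder for
one of the addresses actually bound on the host, recursor.conf would
look something like this. Whether it changes the crash behaviour is
exactly what the test is meant to find out.

```
setuid=pdns-recursor
setgid=pdns-recursor
daemon=no
local-address=127.0.0.1
threads=1
# Placeholder address (assumption): replace with an IP bound on this machine.
query-local-address=192.0.2.10
```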

Aki

On Wed, May 21, 2014 at 12:16:22AM +0300, Imre Gergely wrote:
> 
> backtrace attached.
> 
> [root@c605 pdns-recursor]# ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 3873
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 8192
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 3873
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 
> 
> On 05/21/2014 12:00 AM, Aki Tuomi wrote:
> > Can you install the debuginfo package and run it with gdb to get a stack trace? Also, can
> > you give us the output of ulimit -a?
> >
> > Aki
> >
> > On Tue, May 20, 2014 at 11:41:21PM +0300, Imre Gergely wrote:
> >> [root@c605 ~]# cat /etc/pdns-recursor/recursor.conf |grep -v "^#" | grep -v "^$"
> >> setuid=pdns-recursor
> >> setgid=pdns-recursor
> >> daemon=no
> >> local-address=127.0.0.1
> >> threads=1
> >>
> >> [root@c605 ~]# strace /usr/sbin/pdns_recursor > /tmp/strace-pdns-recursor.txt 2>&1
> >> Segmentation fault
> >> [root@c605 ~]#
> >>
> >> [root@c605 ~]# ip a |grep inet |wc
> >>    4576   27451  233679
> >>
> >> Attached. If this is not what you had in mind, please let me know.
> >>
> >> On 05/20/2014 11:31 PM, bert hubert wrote:
> >>> Imre,
> >>>
> >>> Can you strace the startup with threads=1?
> >>>
> >>>     Bert
> >>>
> >>> On May 20, 2014 10:25 PM, Imre Gergely <gimre at narancs.net> wrote:
> >>>> Hi
> >>>>
> >>>> I did manage to reproduce this in a VM. I installed CentOS 6.5 and recursor 3.5.3 from EPEL. Then I did this:
> >>>>
> >>>> for i in `seq 1 16`; do for j in `seq 1 254`; do ip a a 10.0.$i.$j/16 dev eth0; done; done
> >>>>
> >>>> Then I started the recursor, everything went just fine, did a bunch of digs, no problems.
> >>>>
> >>>> Then I added some more IPs:
> >>>>
> >>>> for i in `seq 17 32`; do for j in `seq 1 254`; do ip a a 10.0.$i.$j/16 dev eth0; done; done
> >>>>
> >>>> And then init.d/pdns-recursor restart:
> >>>>
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: PowerDNS recursor 3.5.3 (C) 2001-2013 PowerDNS.COM BV (Feb 10 2014, 17:26:52, gcc 4.4.7 20120313 (Red Hat 4.4.7-4)) starting up
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Operating in 32 bits mode
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Reading random entropy from '/dev/urandom'
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Only allowing queries from: 127.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 169.254.0.0/16, 192.168.0.0/16, 172.16.0.0/12, ::1/128, fe80::/10
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Will not send queries to: 127.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 169.254.0.0/16, 192.168.0.0/16, 172.16.0.0/12, ::1/128, fe80::/10, 0.0.0.0, ::
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: NOT using IPv6 for outgoing queries - set 'query-local-address6=::' to enable
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Inserting rfc 1918 private space zones
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Listening for UDP queries on 127.0.0.1:53
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Enabled TCP data-ready filter for (slight) DoS protection
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Listening for TCP queries on 127.0.0.1:53
> >>>> May 20 23:18:24 c605 pdns_recursor[21341]: Calling daemonize, going to background
> >>>> May 20 23:18:24 c605 pdns_recursor[21342]: Set effective group id to 499
> >>>> May 20 23:18:24 c605 pdns_recursor[21342]: Set effective user id to 498
> >>>> May 20 23:18:24 c605 pdns_recursor[21342]: Launching 2 threads
> >>>> May 20 23:18:24 c605 pdns_recursor[21342]: Done priming cache with root hints
> >>>> May 20 23:18:24 c605 pdns_recursor[21342]: Enabled 'epoll' multiplexer
> >>>> May 20 23:18:24 c605 pdns_recursor[21342]: Done priming cache with root hints
> >>>> May 20 23:18:24 c605 pdns_recursor[21342]: Refreshed . records
> >>>> May 20 23:18:25 c605 kernel: pdns_recursor[21345]: segfault at ffff01d4 ip 080b1626 sp b6397890 error 4 in pdns_recursor[8048000+112000]
> >>>>
> >>>> [root@c605 ~]# ip a |grep inet |wc
> >>>>    4322   25927  220579
> >>>> [root@c605 ~]# /etc/init.d/pdns-recursor start
> >>>> Starting pdns-recursor:                                    [  OK  ]  <-- starts OK
> >>>> [root@c605 ~]# /etc/init.d/pdns-recursor stop
> >>>> Stopping pdns-recursor:                                    [  OK  ]
> >>>>
> >>>> Adding one more /24:
> >>>>
> >>>> [root@c605 ~]# for j in `seq 1 254`; do ip a a 10.0.18.$j/16 dev eth0; done
> >>>> [root@c605 ~]# /etc/init.d/pdns-recursor start
> >>>> Starting pdns-recursor:                                    [  OK  ]
> >>>> [root@c605 ~]# ip a |grep inet |wc
> >>>>    4576   27451  233679
> >>>>
> >>>> It says it starts, but it doesn't; it just segfaults.
> >>>>
> >>>> [root@c605 ~]# file /bin/bash
> >>>> /bin/bash: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
> >>>>
> >>>>
> >>>> On 05/20/2014 10:58 PM, James Baer wrote:
> >>>>> Hi all, I'm experiencing an issue and I'm unsure whether it is a bug or just something I need to adjust for on my systems.
> >>>>>
> >>>>> I have 2 servers, both running pdns_recursor (3.5.3) on CentOS 6.5, installed from the EPEL repository. The recursor is only listening on localhost on each system.
> >>>>>
> >>>>> I am experiencing somewhat random crashes of the recursor with the following error: 
> >>>>>
> >>>>> kernel: pdns_recursor[21993]: segfault at 200001fc8 ip 0000000000472780 sp 00007f3f9c03f690 error 4 in pdns_recursor[400000+111000] 
> >>>>>
> >>>>> Both servers have a large number of IP addresses bound to them, in the range of 3-4k. I was able to replicate the segfaults on one of the servers by adding additional IP addresses. When I got to around 4k IP addresses the recursor would not even start; it segfaulted right away. I was able to get it to start again by removing some IP addresses, so I know it has something to do with how many addresses I have bound to the server.
> >>>>>
> >>>>> Does anybody have any ideas about what I can do to correct this problem? I really don't see a reason why the recursor would care how many IP addresses I have on a system.
> >>>>>
> >>>>> thank you 
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________ 
> >>>>> Pdns-users mailing list 
> >>>>> Pdns-users at mailman.powerdns.com 
> >>>>> http://mailman.powerdns.com/mailman/listinfo/pdns-users 
> >>>>>
> >>>> -- 
> >>>>
> >>>> Imre Gergely
> >>>>
> >>>> http://havaz.net
> >>>>
> >>>> gpg --keyserver subkeys.pgp.net --recv-keys 0x34525305
> >>>>
> >> -- 
> >> Imre Gergely
> >> http://havaz.net
> >> gpg --keyserver subkeys.pgp.net --recv-keys 0x34525305
> >>
> >
> >> _______________________________________________
> >> Pdns-users mailing list
> >> Pdns-users at mailman.powerdns.com
> >> http://mailman.powerdns.com/mailman/listinfo/pdns-users
> 
> -- 
> Imre Gergely
> http://havaz.net
> gpg --keyserver subkeys.pgp.net --recv-keys 0x34525305
> 

> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/pdns_recursor...Reading symbols from /usr/lib/debug/usr/sbin/pdns_recursor.debug...done.
> done.
> (gdb) run
> Starting program: /usr/sbin/pdns_recursor 
> [Thread debugging using libthread_db enabled]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x003b629e in make_request () from /lib/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.i686 libgcc-4.4.7-4.el6.i686 libstdc++-4.4.7-4.el6.i686 lua-5.1.4-4.1.el6.i686
> (gdb) backtrace
> #0  0x003b629e in make_request () from /lib/libc.so.6
> #1  0x003b64e9 in __check_pf () from /lib/libc.so.6
> #2  0x003754af in getaddrinfo () from /lib/libc.so.6
> #3  0x0807af9b in makeIPv6sockaddr (addr="2001:503:ba3e::2:30", ret=0xb7ff5d20) at misc.cc:717
> #4  0x080ba3d5 in ComboAddress (rr=...) at iputils.hh:122
> #5  DNSRR2String (rr=...) at recursor_cache.cc:75
> #6  0x080bca00 in MemRecursorCache::replace (this=0x81655c8, now=1400620340, qname="a.root-servers.net.", qt=..., content=std::set with 1 elements = {...}, auth=true) at recursor_cache.cc:249
> #7  0x0805a2b2 in SyncRes::doResolveAt (this=0xb7ff6b60, nameservers=std::set with 13 elements = {...}, auth=".", flawedNSSet=false, qname=".", qtype=..., ret=std::vector of length 0, capacity 0, depth=0, beenthere=std::set with 1 elements = {...}) at syncres.cc:1043
> #8  0x08056472 in SyncRes::doResolve (this=0xb7ff6b60, qname=".", qtype=..., ret=std::vector of length 0, capacity 0, depth=0, beenthere=std::set with 1 elements = {...}) at syncres.cc:440
> #9  0x08062847 in SyncRes::beginResolve (this=0xb7ff6b60, qname=".", qtype=..., qclass=1, ret=std::vector of length 0, capacity 0) at syncres.cc:126
> #10 0x08095cdc in houseKeeping () at pdns_recursor.cc:1179
> #11 0x080ac2be in MTasker<PacketID, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >::threadWrapper (self1=0, self2=135687208, tf=0x8095bc0 <houseKeeping(void*)>, tid=0, val1=0, val2=0) at mtasker.cc:380
> #12 0x002e9b9b in makecontext () from /lib/libc.so.6
> #13 0x00000000 in ?? ()
> (gdb) quit
> A debugging session is active.
> 
> 	Inferior 1 [process 5896] will be killed.
> 
> Quit anyway? (y or n) 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://mailman.powerdns.com/pipermail/pdns-users/attachments/20140521/b7951935/attachment-0001.sig>

