[Pdns-users] crashes in bind backend on rediscover

Richard Poole richard.poole at heartinternet.co.uk
Wed Aug 11 13:24:04 UTC 2010


We're seeing crashes in PowerDNS 2.9.22 when running "pdns_control
rediscover". A cron job does this twice an hour, and on average about
once a day it makes the server abort with signal 6 (SIGABRT), logging
the following in /var/log/messages:

Aug 11 12:10:42 ns1 pdns[1980]: Got a signal 6, attempting to print trace: 
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance [0x80cb5e4]
Aug 11 12:10:42 ns1 pdns[1980]: [0x110420]
Aug 11 12:10:42 ns1 pdns[1980]: [0x110410]
Aug 11 12:10:42 ns1 pdns[1980]: /lib/libc.so.6(gsignal+0x50) [0x179df0]
Aug 11 12:10:42 ns1 pdns[1980]: /lib/libc.so.6(abort+0x101) [0x17b701]
Aug 11 12:10:42 ns1 pdns[1980]: /lib/libc.so.6(__assert_fail+0xfb) [0x17326b]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_ZN12Bind2Backend6insertEN5boost10shared_ptrINS_5StateEEEiRKSsRK5QTypeS5_ii+0x847) [0x81151d7]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_ZN12Bind2Backend10loadConfigEPSs+0x8c6) [0x8119e06]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_ZN12UeberBackend10rediscoverEPSs+0x38) [0x80d77f8]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_Z19DLRediscoverHandlerRKSt6vectorISsSaISsEEi+0xcd) [0x80e31bd]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_ZN11DynListener11theListenerEv+0x5c0) [0x80dded0]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_ZN11DynListener17theListenerHelperEPv+0x11) [0x80decb1]
Aug 11 12:10:42 ns1 pdns[1980]: /lib/libpthread.so.0 [0x7a373b]
Aug 11 12:10:42 ns1 pdns[1980]: /lib/libc.so.6(clone+0x5e) [0x222cfe]
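
For anyone reading the trace above, here is a minimal Python sketch (not
part of our setup; the function names and the regular expression are
illustrative assumptions) that pulls the symbol and offset out of each
frame and picks the frame directly above __assert_fail, which is the
caller whose assert() fired. It only handles frames of the form
"binary(symbol+0xoffset) [0xaddress]"; frames without a symbol are
skipped. The mangled names it returns could then be fed to a demangler
such as c++filt.

```python
import re

# Matches frames of the form "binary(symbol+0xoffset) [0xaddress]".
FRAME_RE = re.compile(
    r"(?P<binary>\S+)\((?P<symbol>[^()+]+)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame that carries a symbol name."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.groupdict())
    return frames


def asserting_frame(frames):
    """Return the frame listed right after __assert_fail, or None."""
    for index, frame in enumerate(frames):
        if frame["symbol"] == "__assert_fail" and index + 1 < len(frames):
            return frames[index + 1]
    return None


# A few lines copied from the trace above.
SAMPLE = """\
Aug 11 12:10:42 ns1 pdns[1980]: /lib/libc.so.6(abort+0x101) [0x17b701]
Aug 11 12:10:42 ns1 pdns[1980]: /lib/libc.so.6(__assert_fail+0xfb) [0x17326b]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_ZN12Bind2Backend6insertEN5boost10shared_ptrINS_5StateEEEiRKSsRK5QTypeS5_ii+0x847) [0x81151d7]
Aug 11 12:10:42 ns1 pdns[1980]: /usr/sbin//pdns_server-instance(_ZN12Bind2Backend10loadConfigEPSs+0x8c6) [0x8119e06]
"""

if __name__ == "__main__":
    culprit = asserting_frame(parse_frames(SAMPLE))
    print(culprit["symbol"], culprit["offset"])
```

On the sample lines this selects the Bind2Backend insert frame at offset
0x847, whose caller in the trace is the Bind2Backend loadConfig frame.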

We have about 640,000 domains in total, with typically up to about 50
new ones each time the cron job runs. They are all slave zones (from a
non-public master). We typically get about 700 queries per second at
peak times. The crashes happen sometimes at busy times and sometimes
not, with no apparent correlation to anything else that I know of
(although of course the sample size is not huge). We've compiled with
--with-modules="" because we don't run any backends other than bind; the
box is stock Red Hat Enterprise Linux 5.3 with boost 1.33.1, as shipped
by Red Hat.

These crashes have been seen on two different boxes with the same setup,
so I don't think it can be a hardware fault. We first saw them running
the pdns-static rpm as downloaded from powerdns.com, and nothing has
changed now that we're running our own build, except that the stack
trace is more informative because it now has symbols: the assertion
fires in Bind2Backend::insert, called from Bind2Backend::loadConfig.

Does anyone have any suggestions? What should I do next to diagnose the
problem? Is this something anyone else has seen? It happens on both of
our publicly visible nameservers, so we have customer-visible problems
about twice a day on average, with a non-negligible chance of losing
both nameservers simultaneously. Sooner or later my boss is going to
tell me to go back to running BIND. :(

Thanks,

Richard

-- 
Richard Poole
System Administrator
Heart Internet Ltd
richard.poole at heartinternet.co.uk
http://www.heartinternet.co.uk/
Tel: 0845 644 7750
Fax: 0845 644 7740



