[Pdns-users] pdns stops responding to requests
Brandon Checketts
brandonc at webpipe.net
Tue Dec 19 04:03:34 UTC 2006
Here are all of the details I can come up with:
1- tcpdump clearly shows that the server is getting DNS requests
2- Output of "pdns dump"
[root at powerdns2 /]# /etc/init.d/pdns dump
corrupt-packets=426,deferred-cache-inserts=44,deferred-cache-lookup=100,latency=32709,
packetcache-hit=17770,packetcache-miss=20433,packetcache-size=5658,qsize-q=37,
query-cache-hit=89780,query-cache-miss=83406,recursing-answers=0,recursing-questions=0,
servfail-packets=0,tcp-answers=2,tcp-queries=2,timedout-packets=0,udp-answers=37835,
udp-queries=38336,udp4-answers=37835,udp4-queries=38282,udp6-answers=0,udp6-queries=0,
When it was locked up, I ran the command several minutes later and the numbers
were exactly the same
3- Confirmed that the built-in PowerDNS web server was up and accessible.
Interesting numbers from that page:
Uptime: 1.85 hours Queries/second, 1, 5, 10 minute averages: 2.61e-14, 0.0111,
0.304. Max queries/second: 11.4
Cache hitrate, 1, 5, 10 minute averages: 49%, 48%, 48%
Backend query cache hitrate, 1, 5, 10 minute averages: 48%, 50%, 51%
Backend query load, 1, 5, 10 minute averages: 4.18e-14, 0.0216, 0.614. Max
queries/second: 23.8
Total queries: 38336. Question/answer latency: 32.7ms
corrupt-packets 426 Number of corrupt packets received
deferred-cache-inserts 44 Amount of cache inserts that were deferred
because of maintenance
deferred-cache-lookup 100 Amount of cache lookups that were deferred
because of maintenance
latency 32709 Average number of microseconds needed to answer
a question
packetcache-hit 17770
packetcache-miss 20433
packetcache-size 5658
qsize-q 37 Number of questions waiting for database attention
query-cache-hit 89780 Number of hits on the query cache
query-cache-miss 83406 Number of misses on the query cache
recursing-answers 0 Number of recursive answers sent out
recursing-questions 0 Number of questions sent to recursor
servfail-packets 0 Number of times a server-failed packet was sent out
tcp-answers 2 Number of answers sent out over TCP
tcp-queries 2 Number of TCP queries received
timedout-packets 0 Number of packets which weren't answered within
timeout set
udp-answers 37835 Number of answers sent out over UDP
udp-queries 38336 Number of UDP queries received
udp4-answers 37835 Number of IPv4 answers sent out over UDP
udp4-queries 38282 Number of IPv4UDP queries received
udp6-answers 0 Number of IPv6 answers sent out over UDP
udp6-queries 0 Number of IPv6 UDP queries received
4- A couple of show commands:
[root at powerdns2 /]# /etc/init.d/pdns show qsize-q
qsize-q=37
[root at powerdns2 /]# /etc/init.d/pdns show packetcache-size
packetcache-size=5658
5- While it was locked up, I did an strace on each of the pdns processes. The
3 processes I believe were the MySQL Backends just said "rt_sigsuspend([]". The
process that I saw the actual queries coming in/going out of also just said
"rt_sigsuspend([]"
One interesting thing from the straces was this:
this process:
pdns 821 817 0 18:40 ? 00:00:00 /usr/sbin/pdns_server-instance
--daemon --guardian=yes
is doing this each second:
recvfrom(13, 0xbebff2bc, 1500, 0, 0xbebff8f8, 0xbebff91c) = -1 EAGAIN (Resource
temporarily unavailable)
time(NULL) = 1166498719
time(NULL) = 1166498719
rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
The "Resource temporarily unavailable" may seem to point to something, but it
appears there during normal operation as well.
It may be worth noting, that if I leave an strace running on the process that is
sending/receiving requests, it dies occasionally with this message:
gettimeofday(upeek: ptrace(PTRACE_PEEKUSER,12878,44,0): No such process
Process 12878 detached
I should also point out that this powerdns server is actually a virtual server
running on a Linux VServer with kernel 2.6.18.3-vs2.1.1.2. The other powerdns
server that we are running and that is working fine is also running on a Linux
VServer with kernel 2.6.18.2-vs2.1.1
Any ideas about why this is dying would be appreciated. This server is an
authoritative name for about a thousand domains, so I obviously would like to
get this resolved instead of having to restart the process when it dies (which
happens 5-8 times a day).
Thanks,
Brandon Checketts
Webpipe.net System Administrator
Brandon Checketts wrote:
> Augie,
>
> Thank you for your response. I'm running version 2.9.20 that was installed from
> the pdns-static-2.9.20-1 RPM. UDP is definitely failing, not sure about TCP as
> we get very few TCP Requests.
>
> I've just installed strace, and can run it. Each pdns_server-instance processes
> display different things, but I'm not a programmer, so I don't understand much
> of the output. I can pick out the three processes that are doing the MySQL
> queries, and the one that looks like its receiving/sending the actual DNS
> traffic, so I will do an strace on it the next time that it fails to see if it
> identifies anything. I will also try the pdns "dump" and see if that reveals
> anything.
>
> I looked at a tcpdump that I had running at the time of a failure and didn't see
> anything unusual, just DNS requests, MySQL lookups/replys and DNS answers.
> Then, when it fails, only incoming DNS requests with no other traffic until it
> is restarted.
>
> Thanks,
> Brandon Checketts
> Webpipe.net System Administrator
>
>
>
> Augie Schwer wrote:
>> On 12/17/06, Brandon Checketts <brandonc at webpipe.net> wrote:
>>> I've just replaced two BIND servers with PowerDNS Servers. They are
>>> configured
>>> to use the gmysql backend, and MySQL is performing the replication
>>> between them.
>>> Everything seems to be working fine, but on the "slave" server, pdns
>>> seems to
>>> just quit responding to DNS requests sometimes. The web server
>>> portion of it
>>> continues to respond, but DNS queries just don't get answered. Also,
>>> when it
>>> locks up, a netstat -l -n shows this (notice the Recv-Q for udp on
>>> port 53):
>> What version are you running? If there is nothing in the logs, then
>> you can try straceing one of the threads to see if that reveals any
>> more info. Is it both UDP and TCP that stop responding? Did you try
>> tcpdumping on the interface to make sure you are receiving the
>> requests. You can also look at the stats:
>>
>> /etc/init.d/pdns dump
>>
>> The latency and q-size stats may be revealing.
>>
>>
> _______________________________________________
> Pdns-users mailing list
> Pdns-users at mailman.powerdns.com
> http://mailman.powerdns.com/mailman/listinfo/pdns-users
>
>
More information about the Pdns-users
mailing list