<div dir="ltr"><div>Following up on this issue, the only other thing I could find in the logs is a large number of entries like this:<br><br>messages-20160228:Feb 24 21:01:51 dns1 pdns[1587]: Respawning<br>messages-20160228:Feb 24 21:01:54 dns1 pdns[13845]: 5017 questions waiting for database attention. Limit is 5000, respawning<br>messages-20160228:Feb 24 21:01:54 dns1 pdns[1587]: Respawning<br>messages-20160228:Feb 24 21:01:57 dns1 pdns[13926]: 5018 questions waiting for database attention. Limit is 5000, respawning<br>messages-20160228:Feb 24 21:01:57 dns1 pdns[1587]: Respawning<br>messages-20160228:Feb 24 21:02:00 dns1 pdns[14029]: 5010 questions waiting for database attention. Limit is 5000, respawning<br>messages-20160228:Feb 24 21:02:00 dns1 pdns[1587]: Respawning<br>messages-20160228:Feb 25 21:05:25 dns1 pdns[5498]: Respawning<br>messages-20160228:Feb 25 21:05:27 dns1 pdns[5498]: Respawning<br>messages-20160228:Feb 25 21:05:29 dns1 pdns[5498]: Respawning<br>messages-20160228:Feb 25 21:05:31 dns1 pdns[5498]: Respawning<br>messages-20160228:Feb 25 21:05:33 dns1 pdns[5498]: Respawning<br>messages-20160228:Feb 25 21:05:35 dns1 pdns[5498]: Respawning<br>messages-20160228:Feb 25 21:05:37 dns1 pdns[5498]: Respawning<br>messages-20160228:Feb 25 21:05:39 dns1 pdns[5498]: Respawning<br><br></div>Do you think this is related to the MySQL backend?<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 23, 2016 at 11:38 AM, Miguel Miranda <span dir="ltr">&lt;<a href="mailto:miguel.mirandag@gmail.com" target="_blank">miguel.mirandag@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello all, I kindly ask for your advice regarding an issue I am having with a pair of pdns servers/resolvers. Both are running CentOS 6.4 and pdns 3.3.3 
and recursor 3.7 in native mode with MySQL replication.<br>The issue is that we suddenly receive a lot of customer complaints about slow DNS responses, and indeed, in our tests we see a 3 to 4 second delay on DNS queries, or no response at all. We run both the server and the recursor on the same machine, with the recursor listening on localhost; I know this is not best practice, but I can't change that topology right now. Our network team tells me that overall traffic increases by about 10 mb on each server. We see the same delay on both servers.<br>Looking at the webserver page on each server, qps increases from 4,000 to 22,000 and qsize-q increases from 0 to about 2,500. Looking at the logs, there are lots of entries like this:<span lang="en"><span><br><br>Feb 21 18:35:54 dns1 pdns[27173]: Recursive query for remote <a href="http://190.150.38.225:63567" target="_blank">190.150.38.225:63567</a> with internal id 1433 was not answered by backend within timeout, reusing id<br>Feb 21 18:35:54 dns1 pdns[27173]: Recursive query for remote <a href="http://190.150.218.7:33221" target="_blank">190.150.218.7:33221</a> with internal id 1435 was not answered by backend within timeout, reusing id<br>Feb 21 18:35:54 dns1 rsyslogd-2177: imuxsock begins to drop messages from pid 27173 due to rate-limiting<br>Feb 21 18:35:56 dns1 pdns_recursor-balancer1[24608]: Sending SERVFAIL to 127.0.0.1 during resolve of '<a href="http://fbapi.sd.duapps.com" target="_blank">fbapi.sd.duapps.com</a>.' 
because: Too much time waiting for fbapi.dxsvr.com.|A, timeouts: 5, throttles: 0, queries: 7, 7914msec<br></span></span><div><div><span lang="en"><span>Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 186.32.122.240, 's1!0#037#006#003U#004': sending servfail<br>Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'b._dns-sd._udp.xSò#001h¹ó#001': sending servfail<br>Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'dr._dns-sd._udp.xSò#001h¹ó#001': sending servfail<br>Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'r._dns-sd._udp.xSò#001h¹ó#001': sending servfail<br>Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'db._dns-sd._udp.xSò#001h¹ó#001': sending servfail<br>Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'lb._dns-sd._udp.xSò#001h¹ó#001': sending servfail<br>Feb 21 18:35:47 dns1 pdns_recursor-balancer2[24616]: Timeout from remote TCP client 127.0.0.1<br><br></span></span></div><div><span lang="en"><span>The delay happens only when using the external IP; if I resolve a host directly against the pdns recursor running on localhost, there is no delay.<br>There are no other signs of what could be causing the high qsize-q values.<br>The only way to clear the slow responses is to restart the pdns service. pdns-recursor is not restarted, so I think the problem is with pdns when it tries to forward recursive queries to dnsdist.<br>I'm lost as to what to check to track down the cause of the problem.<br>Regards.<br></span></span></div></div></div>
</blockquote></div><br></div>