<div dir="ltr">

<div dir="ltr" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><div>I am in desperate need of help debugging a dnsdist/pdns/pdns recursor combo.</div><div><br></div><div><div>Versions:</div><div>dnsdist 1.2.0</div><div>pdns authoritative 0.0.2117gcf04fed (a couple of days old)</div><div>pdns recursor 0.0.2000gcf04fed (a couple of days old)</div><div>gcc 4.8.5</div><div>CentOS 7.4</div><div><br></div><div>PostgreSQL backend</div><div>No DNSSEC</div><div><br></div><div>Setup:</div><div>2 DNS servers, each running dnsdist, pdns and pdns_recursor</div><div>both servers have their own IPv4/IPv6 and loopback IPv4/IPv6 addresses (all public, no NAT)</div><div>the loopback addresses are statically routed on the gateway via the server addresses</div><div>there is no firewall/iptables between the servers, and the servers are in a single broadcast domain (VMware distributed switch)</div><div>server hostnames and DNS names are not the same</div><div>dnsdist is listening on the IPv4/IPv6 loopback addresses</div><div>the authoritative and recursive backends are listening on the servers' IPv4 addresses</div><div>internal clients are sent to the recursor backend</div><div>the recursor backend has a forward-zones-file defined</div><div>all other clients are sent to the auth backend</div><div><br></div><div>Problem:</div><div>While 99% of queries succeed, some queries randomly return SERVFAIL.
For example, when checking any DNS name with <a href="http://cachecheck.opendns.com/" target="_blank" style="color:rgb(17,85,204)">cachecheck.opendns.com</a> for the first time, 20-50% of the servers return SERVFAIL.</div><div>When I check a second time, even with a cache refresh, all the responses are successful.</div><div><br></div><div>What I tried:</div><div>I set up a test domain with MX records and tested mail flow; emails frequently bounced back because the MX record was not found or the MX server's DNS name could not be resolved.</div><div>I could not catch any of these errors with loglevel=9, log-dns-queries=yes and log-dns-details=yes.</div><div>With tcpdump on port 53, all I see is output like this:</div><div><span style="white-space:pre-wrap">  </span>10:47:30.717092 IP DNS1_NAME.domain > QUERIED_DNS_NAME.sscan: 8272 ServFail 0/0/0 (49)</div><div>I sometimes see timeouts in the pdns log:</div><div><span style="white-space:pre-wrap">     </span>Feb 26 13:04:18 SERVER_HOSTNAME pdns_server[3574]: TCP Connection Thread died because of network error: Timeout reading data</div><div><span style="white-space:pre-wrap">     </span>Feb 26 13:04:18 SERVER_HOSTNAME pdns[3574]: TCP Connection Thread died because of network error: Timeout reading data</div><div><br></div><div>dnsdist:</div><div>setLocal('127.0.0.1:53')</div><div>addLocal(ROUTED_LOOPBACK_IPV4_ADDRESS)</div><div>addLocal(ROUTED_LOOPBACK_IPV6_ADDRESS)</div><div>setACL({'0.0.0.0/0', '::/0'})</div><div>recursive_ips = newNMG()</div><div>recursive_ips:addMask(RECURSIVE_CLIENTS_NETWORK)</div><div>newServer({address='DNS1_PUBLIC_IPV4_ADDRESS:5300', pool='auth'})</div><div>newServer({address='DNS2_PUBLIC_IPV4_ADDRESS:5300', pool='auth'})</div><div>newServer({address='DNS1_PUBLIC_IPV4_ADDRESS:5300', pool='recursor'})</div><div>newServer({address='DNS2_PUBLIC_IPV4_ADDRESS:5301', pool='recursor'})</div><div>addAction(NetmaskGroupRule(recursive_ips), PoolAction('recursor'))</div><div>addAction(AllRule(), PoolAction('auth'))</div><div><br></div><div>pdns authoritative:</div><div>config-dir=/etc/pdns</div><div>daemon=yes</div><div>distributor-threads=3</div><div>guardian=yes</div><div>include-dir=/etc/pdns/pdns.d</div><div>launch=</div><div>local-address=SERVER_PUBLIC_IPV4_ADDRESS</div><div>local-port=5300</div><div>module-dir=/usr/lib64/pdns</div><div>receiver-threads=2</div><div>reuseport=yes</div><div>server-id=SERVER_HOSTNAME</div><div>slave=no</div><div><br></div><div>pdns recursor:</div><div>local-address=SERVER_PUBLIC_IPV4_ADDRESS</div><div>local-port=5301</div><div>allow-from=ALLOWED_NETWORKS</div><div>reuseport=yes</div><div>pdns-distributes-queries=yes</div><div>forward-zones-file=/etc/pdns-recursor/zone-file</div><div><br></div><div>forward-zones-file contents:</div><div>DOMAIN=DNS1_PUBLIC_IPV4_ADDRESS:5300, DNS2_PUBLIC_IPV4_ADDRESS:5300</div></div></div>
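<div><br></div><div>To reproduce the intermittent SERVFAILs without relying on cachecheck.opendns.com, a small probe can send the same query repeatedly to one listener and tally the response codes. This is a minimal sketch, assuming Python 3 and plain UDP over IPv4. The helper names (build_query, parse_rcode, probe) and the optional random-label cache busting are hypothetical and are not part of dnsdist or PowerDNS.</div>

```python
import random
import socket
import struct
from collections import Counter

# Conventional names for the RCODE values defined in RFC 1035, section 4.1.1.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(qname, qtype=15, qid=None):
    """Build a minimal DNS query (one question, class IN, RD bit set).

    qtype 15 is MX, the record type that failed in the mail-flow test.
    Returns (query_id, packet_bytes).
    """
    if qid is None:
        qid = random.getrandbits(16)
    # Header: ID, flags 0x0100 (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME as length-prefixed labels, terminated by the zero-length root label.
    labels = b"".join(bytes([len(label)]) + label.encode("ascii")
                      for label in qname.rstrip(".").split("."))
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return qid, header + question


def parse_rcode(packet):
    """Return (query_id, rcode) from the first 4 bytes of a DNS response."""
    qid, flags = struct.unpack("!HH", packet[:4])
    return qid, flags % 16  # RCODE is the low 4 bits of the flags word


def probe(server, port, qname, count=200, timeout=2.0, random_label=False):
    """Send `count` UDP queries to server:port and tally the outcomes."""
    tally = Counter()
    for _ in range(count):
        name = qname
        if random_label:
            # Hypothetical cache busting: a fresh label per query, so each
            # lookup misses caches (normally answered with NXDOMAIN).
            name = "probe%08x.%s" % (random.getrandbits(32), qname)
        qid, packet = build_query(name)
        # A fresh socket per query, so a late reply is never read as the next answer.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(packet, (server, port))
            try:
                reply, _ = sock.recvfrom(4096)
            except socket.timeout:
                tally["TIMEOUT"] += 1
                continue
        try:
            rid, rcode = parse_rcode(reply)
        except struct.error:
            tally["MALFORMED"] += 1
            continue
        if rid != qid:
            tally["ID_MISMATCH"] += 1
        else:
            tally[RCODE_NAMES.get(rcode, "RCODE%d" % rcode)] += 1
    return tally
```

<div>For example, running probe("ROUTED_LOOPBACK_IPV4_ADDRESS", 53, "DOMAIN", random_label=True) against dnsdist, and the same call against SERVER_PUBLIC_IPV4_ADDRESS on port 5300, would show whether the SERVFAILs already come from the authoritative backend or only appear through dnsdist.</div>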

<br></div>
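<div>Because dnsdist sends each pool to a specific daemon, a quick consistency check of the pool definitions against the listening ports can rule out a routing mistake. The sketch below is hypothetical. It transcribes the dnsdist newServer() lines and the local-port values from the configs above into Python data, with the placeholders kept as-is, and lists every backend whose port differs from the port its pool's daemon listens on.</div>

```python
# Hypothetical transcription of the dnsdist newServer() lines above:
# (pool, backend address placeholder, port).
BACKENDS = [
    ("auth", "DNS1_PUBLIC_IPV4_ADDRESS", 5300),
    ("auth", "DNS2_PUBLIC_IPV4_ADDRESS", 5300),
    ("recursor", "DNS1_PUBLIC_IPV4_ADDRESS", 5300),
    ("recursor", "DNS2_PUBLIC_IPV4_ADDRESS", 5301),
]

# Port each daemon listens on, taken from local-port in the pdns
# authoritative and pdns recursor configs above.
EXPECTED_PORT = {"auth": 5300, "recursor": 5301}


def mismatched_backends(backends, expected):
    """Return the backends whose port differs from their pool's daemon port."""
    return [
        (pool, host, port)
        for pool, host, port in backends
        if expected.get(pool) != port
    ]


for pool, host, port in mismatched_backends(BACKENDS, EXPECTED_PORT):
    print(f"pool {pool!r}: {host}:{port} (expected port {EXPECTED_PORT[pool]})")
```

<div>Run against the configuration exactly as pasted above, it prints a single line, for the 'recursor' pool entry that points at DNS1_PUBLIC_IPV4_ADDRESS:5300.</div>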