[dnsdist] dnsdist/pdns random servfail responses

cl0sed EE dmt.keler at gmail.com
Tue Mar 6 10:23:25 UTC 2018


I am in desperate need of help debugging a dnsdist/pdns/pdns recursor
combo.

Versions:
dnsdist 1.2.0
pdns authoritative 0.0.2117gcf04fed (a couple of days old)
pdns recursor 0.0.2000gcf04fed (a couple of days old)
gcc 4.8.5
CentOS 7.4

PostgreSQL backend
No DNSSEC

Logic:
2 DNS servers with dnsdist/pdns/pdns_recursor services
both servers have their own IPv4/IPv6 and loopback IPv4/IPv6 addresses (all
public, no NAT)
loopback addresses are statically routed on the gateway via the server addresses
there is no firewall/iptables between the servers, and servers are in a
single broadcast domain, VMware distributed switch
Server hostnames and DNS names are not the same
dnsdist is listening on IPv4/IPv6 loopback addresses
authoritative and recursive backends are listening on server IPv4 addresses
internal clients are sent to the recursor backend
recursor backend has a forward-zones-file defined
other clients are sent to the auth backend

Problem:
While 99% of the queries succeed, some of the queries randomly return
SERVFAIL responses. For example, when checking any DNS name using
cachecheck.opendns.com for the first time, 20-50% of the servers return
SERVFAIL.
On the second check, even after refreshing the cache, all the responses are
successful.
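
A minimal sketch for reproducing the symptom without a third-party checker, assuming plain UDP over IPv4 and using only the Python standard library. The function names (build_query, parse_rcode, probe) are hypothetical helpers, not part of dnsdist or pdns. It sends a number of queries to one server and tallies the response codes.

```python
import socket
import struct
from collections import Counter

# RCODE values from the DNS header (RFC 1035); only the common ones are named.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A, 15 = MX)."""
    # Header: id, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def parse_rcode(response):
    """Return the RCODE name found in the header of a raw DNS response."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE%d" % code)


def probe(server, name, count=20, port=53, timeout=2.0):
    """Send `count` UDP queries (IPv4 only) and tally the response codes."""
    tally = Counter()
    for i in range(count):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_query(name, txid=i + 1), (server, port))
                data, _ = sock.recvfrom(4096)
                tally[parse_rcode(data)] += 1
            except socket.timeout:
                tally["TIMEOUT"] += 1
    return tally
```

Calling probe() with one of the dnsdist addresses and a test name returns a Counter of response codes. Since the failures described above show up on the first lookup of a name, a name that has not been queried before is probably needed for each run.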

What I tried:
I've set up a test domain with MX records and tested mail flow. Emails
frequently bounced because the MX record was not found or the MX server's
DNS name could not be resolved.
I could not catch any of those errors using loglevel 9, log-dns-queries=yes
and log-dns-details=yes
Using tcpdump port 53 I only see something along these lines:
10:47:30.717092 IP DNS1_NAME.domain > QUERIED_DNS_NAME.sscan: 8272 ServFail 0/0/0 (49)
I sometimes see timeouts in the pdns log:
Feb 26 13:04:18 SERVER_HOSTNAME pdns_server[3574]: TCP Connection Thread died because of network error: Timeout reading data
Feb 26 13:04:18 SERVER_HOSTNAME pdns[3574]: TCP Connection Thread died because of network error: Timeout reading data
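
A small sketch for pulling the ServFail lines out of saved tcpdump text output, assuming every packet is printed on one line in the same shape as the sample quoted above (IPv4 only, since the pattern expects "IP" and not "IP6"). The helper names are hypothetical.

```python
import re
from collections import Counter

# Matches lines shaped like the tcpdump sample quoted above, e.g.
# "10:47:30.717092 IP SRC.domain > DST.sscan: 8272 ServFail 0/0/0 (49)"
SERVFAIL_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<id>\d+) ServFail\S* "
)


def servfail_events(lines):
    """Yield (time, source, destination, query id) for each ServFail line."""
    for line in lines:
        match = SERVFAIL_LINE.match(line.strip())
        if match:
            yield (match["time"], match["src"], match["dst"], int(match["id"]))


def count_by_source(lines):
    """Count ServFail responses per sending host.port."""
    return Counter(src for _, src, _, _ in servfail_events(lines))
```

Run against a capture taken on the backend ports (5300 and 5301 in the configs below) rather than on port 53, the per-source count might show whether one particular backend produces the failures.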

dnsdist:
setLocal('127.0.0.1:53')
addLocal(ROUTED_LOOPBACK_IPV4_ADDRESS)
addLocal(ROUTED_LOOPBACK_IPV6_ADDRESS)
setACL({'0.0.0.0/0', '::/0'})
recursive_ips = newNMG()
recursive_ips:addMask(RECURSIVE_CLIENTS_NETWORK)
newServer({address='DNS1_PUBLIC_IPV4_ADDRESS:5300', pool='auth'})
newServer({address='DNS2_PUBLIC_IPV4_ADDRESS:5300', pool='auth'})
newServer({address='DNS1_PUBLIC_IPV4_ADDRESS:5300', pool='recursor'})
newServer({address='DNS2_PUBLIC_IPV4_ADDRESS:5301', pool='recursor'})
addAction(NetmaskGroupRule(recursive_ips), PoolAction('recursor'))
addAction(AllRule(), PoolAction('auth'))
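
The effect of the two addAction() rules can be sketched as follows. This is a simplified model, not dnsdist code: it assumes rules are evaluated in the order they were added and that the first matching PoolAction decides the pool. The networks are hypothetical stand-ins for RECURSIVE_CLIENTS_NETWORK, taken from the documentation address ranges.

```python
import ipaddress

# Hypothetical stand-ins for RECURSIVE_CLIENTS_NETWORK (documentation ranges).
RECURSIVE_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def in_recursive_networks(ip):
    """Model of NetmaskGroupRule(recursive_ips): match on the client address."""
    return any(
        ip.version == net.version and ip in net for net in RECURSIVE_NETWORKS
    )


# Rules in the order they are added; the first matching rule decides the pool.
RULES = [
    (in_recursive_networks, "recursor"),  # NetmaskGroupRule -> PoolAction('recursor')
    (lambda ip: True, "auth"),            # AllRule -> PoolAction('auth')
]


def select_pool(client):
    """Return the pool a query from `client` would be sent to in this model."""
    ip = ipaddress.ip_address(client)
    for matches, pool in RULES:
        if matches(ip):
            return pool
    return None
```

In this model select_pool() sends any client inside the listed networks to the recursor pool and everything else to the auth pool, which matches the intent stated in the Logic section.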

pdns authoritative:
config-dir=/etc/pdns
daemon=yes
distributor-threads=3
guardian=yes
include-dir=/etc/pdns/pdns.d
launch=
local-address=SERVER_PUBLIC_IPV4_ADDRESS
local-port=5300
module-dir=/usr/lib64/pdns
receiver-threads=2
reuseport=yes
server-id=SERVER_HOSTNAME
slave=no

pdns recursor:
local-address=SERVER_PUBLIC_IPV4_ADDRESS
local-port=5301
allow-from=ALLOWED_NETWORKS
reuseport=yes
pdns-distributes-queries=yes
forward-zones-file=/etc/pdns-recursor/zone-file

forward-zones-file:
DOMAIN=DNS1_PUBLIC_IPV4_ADDRESS:5300, DNS2_PUBLIC_IPV4_ADDRESS:5300
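
A short sketch for cross-checking the dnsdist pool definitions against the local-port settings listed above. It assumes both servers run the same pdns and recursor configuration, and it only compares the values as they appear in this post. The names LISTEN_PORTS, POOL_BACKENDS and port_mismatches are hypothetical.

```python
# Ports each daemon listens on, taken from local-port in the configs above.
# Assumption: both servers run the same configuration.
LISTEN_PORTS = {"auth": 5300, "recursor": 5301}

# (pool, backend) pairs transcribed from the newServer() lines as posted.
POOL_BACKENDS = [
    ("auth", "DNS1_PUBLIC_IPV4_ADDRESS:5300"),
    ("auth", "DNS2_PUBLIC_IPV4_ADDRESS:5300"),
    ("recursor", "DNS1_PUBLIC_IPV4_ADDRESS:5300"),
    ("recursor", "DNS2_PUBLIC_IPV4_ADDRESS:5301"),
]


def port_mismatches(pool_backends, listen_ports):
    """Return backends whose port differs from the port of their pool's daemon."""
    mismatches = []
    for pool, backend in pool_backends:
        port = int(backend.rsplit(":", 1)[1])
        if port != listen_ports[pool]:
            mismatches.append((pool, backend, listen_ports[pool]))
    return mismatches
```

With the values as posted, port_mismatches(POOL_BACKENDS, LISTEN_PORTS) returns the DNS1 entry of the recursor pool, which points at port 5300 while the recursor listens on 5301. Whether that reflects the running configuration or a slip made while anonymising the post cannot be told from the text.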