[Pdns-users] unusual response times for recursor with A and AAAA lookups

James Lamb jimmystewpot at gmail.com
Wed Aug 30 10:13:01 UTC 2017


Hello,

We have been migrating from a bind based DNS server to pdns-recursor over
the last few months. As we have added additional clients we have noticed
that the latency long tail on pdns is considerably higher (2 seconds+) than
what we saw previously on bind (9.10.3 Ubuntu 16.04).

I have enabled tracing to try and help identify the root cause, however it
has not been immediately clear to me what is actually happening to cause
the delays. In tracing we see the first entries in the log that look like
this.

Aug 30 05:03:14.529593 <hostname> pdns_recursor[5150]: Aug 30 05:03:14 2
[5714976/1] question for '<redacted-hostname>|A' from 10.64.2.20
Aug 30 05:03:14.531452 <hostname> pdns_recursor[5150]: Aug 30 05:03:14 2
[5714977/2] question for '<redacted-hostname>|AAAA' from 10.64.2.20

The A record completes very quickly and correctly. From the trace logs (via
journalctl with --output=short-iso )

Aug 30 05:03:14.542610 <hostname> pdns_recursor[5150]: Aug 30 05:03:14
[5714976] <redacted-hostname>: status=got results, this level of recursion
done

However we do not see anything relating to the AAAA lookup until over 1
second later.

Aug 30 05:03:16.079983 <hostname> pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Wants DNSSEC processing in query for AAAA
Aug 30 05:03:16.080159 <hostname> pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Looking for CNAME cache hit of
'<redacted-hostname>|CNAME'
Aug 30 05:03:16.080288 <hostname> pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: No CNAME cache hit of
'<redacted-hostname>|CNAME' found
Aug 30 05:03:16.080439 <hostname> pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Entire name '<redacted-hostname>' or type
was negatively cached, but entry expired
Aug 30 05:03:16.079983 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Wants DNSSEC processing in query for AAAA
Aug 30 05:03:16.080159 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Looking for CNAME cache hit of
'<redacted-hostname>|CNAME'
Aug 30 05:03:16.080288 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: No CNAME cache hit of
'<redacted-hostname>|CNAME' found
Aug 30 05:03:16.080439 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Entire name '<redacted-hostname>' or type
was negatively cached, but entry expired
Aug 30 05:03:16.080561 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: No cache hit for '<redacted-hostname>|AAAA',
trying to find an appropriate NS record
Aug 30 05:03:16.080683 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Checking if we have NS in cache for
'<redacted-hostname>'
Aug 30 05:03:16.080814 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: no valid/useful NS in cache for
'<redacted-hostname>'
Aug 30 05:03:16.080939 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Checking if we have NS in cache for '
us-west-2.staging.atl-paas.net'
Aug 30 05:03:16.081062 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: NS (with ip, or non-glue) in cache for '
us-west-2.staging.atl-paas.net' -> 'ns-1256.awsdns-29.org'
Aug 30 05:03:16.081186 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: within bailiwick: 0, not in cache / did not
look at cache
Aug 30 05:03:16.081312 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: NS (with ip, or non-glue) in cache for '
us-west-2.staging.atl-paas.net' -> 'ns-1693.awsdns-19.co.uk'
Aug 30 05:03:16.081434 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: within bailiwick: 0, not in cache / did not
look at cache
Aug 30 05:03:16.081557 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: NS (with ip, or non-glue) in cache for '
us-west-2.staging.atl-paas.net' -> 'ns-29.awsdns-03.com'
Aug 30 05:03:16.081683 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: within bailiwick: 0, not in cache / did not
look at cache
Aug 30 05:03:16.081804 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: NS (with ip, or non-glue) in cache for '
us-west-2.staging.atl-paas.net' -> 'ns-995.awsdns-60.net'
Aug 30 05:03:16.081947 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: within bailiwick: 0, not in cache / did not
look at cache
Aug 30 05:03:16.082068 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: We have NS in cache for '
us-west-2.staging.atl-paas.net' (flawedNSSet=0)
Aug 30 05:03:16.082193 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Cache consultations done, have 4 NS to
contact
Aug 30 05:03:16.082338 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>.: Nameservers: ns-995.awsdns-60.net.(8.34ms),
ns-1256.awsdns-29.org.(21.32ms), ns-29.awsdns-03.com.(23.94ms),
Aug 30 05:03:16.082459 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>.:
ns-1693.awsdns-19.co.uk.(29.00ms)
Aug 30 05:03:16.082581 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Trying to resolve NS 'ns-995.awsdns-60.net'
(1/4)
Aug 30 05:03:16.082708 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-995.awsdns-60.net: Wants DNSSEC processing in query for A
Aug 30 05:03:16.082829 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-995.awsdns-60.net: Looking for CNAME cache hit of '
ns-995.awsdns-60.net|CNAME'
Aug 30 05:03:16.082955 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-995.awsdns-60.net: No CNAME cache hit of '
ns-995.awsdns-60.net|CNAME' found
Aug 30 05:03:16.083074 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-995.awsdns-60.net: Found cache hit for A:
205.251.195.227[ttl=170146]
Aug 30 05:03:16.083194 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Resolved 'us-west-2.staging.atl-paas.net' NS
ns-995.awsdns-60.net to: 205.251.195.227
Aug 30 05:03:16.083314 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Trying IP 205.251.195.227:53, asking
'<redacted-hostname>|AAAA'
Aug 30 05:03:16.083434 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: timeout resolving after 1521.4msec
Aug 30 05:03:16.083554 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Trying to resolve NS 'ns-1256.awsdns-29.org'
(2/4)
Aug 30 05:03:16.083677 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-1256.awsdns-29.org: Wants DNSSEC processing in query for A
Aug 30 05:03:16.083821 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-1256.awsdns-29.org: Looking for CNAME cache hit of '
ns-1256.awsdns-29.org|CNAME'
Aug 30 05:03:16.083940 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-1256.awsdns-29.org: No CNAME cache hit of '
ns-1256.awsdns-29.org|CNAME' found
Aug 30 05:03:16.084058 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977]    ns-1256.awsdns-29.org: Found cache hit for A:
205.251.196.232[ttl=170157]
Aug 30 05:03:16.084179 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Resolved 'us-west-2.staging.atl-paas.net' NS
ns-1256.awsdns-29.org to: 205.251.196.232
Aug 30 05:03:16.084298 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Trying IP 205.251.196.232:53, asking
'<redacted-hostname>|AAAA'
Aug 30 05:03:16.084417 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: Got 2 answers from ns-1256.awsdns-29.org
(205.251.196.232), rcode=0 (No Error), aa=1, in 26ms
Aug 30 05:03:16.084548 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: accept answer 'us-west-2.staging.atl-paas.net
|SOA|ns-995.awsdns-60.net. awsdns-hostmaster.amazon.com. 1 7200 900 1209600
60' from 'us-west-2.staging.atl-paas.net' nameservers? 2 YES!
Aug 30 05:03:16.084669 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: OPT answer '.' from '
us-west-2.staging.atl-paas.net' nameservers
Aug 30 05:03:16.084789 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: determining status after receiving this
packet
Aug 30 05:03:16.084908 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: got negative caching indication for
'<redacted-hostname>|AAAA'
Aug 30 05:03:16.085027 dns-0.usw2 pdns_recursor[5150]: Aug 30 05:03:16
[5714977] <redacted-hostname>: status=noerror, other types may exist, but
we are done (have negative SOA) (have aa bit)

As far as we can tell the configuration is simple

config-dir=/etc/powerdns/
allow-from-file=/etc/powerdns/permit-queries-from.acl
forward-zones-file=/etc/powerdns/forwarders.conf

daemon=yes
setgid=pdns
setuid=pdns
socket-dir=/var/run

etc-hosts-file=/etc/hosts

local-address=0.0.0.0
query-local-address=0.0.0.0
local-port=53

quiet=yes
loglevel=4
log-common-errors=yes

max-cache-entries=500000
latency-statistic-size=30000

max-cache-ttl=172801
max-tcp-clients=128
network-timeout=1500

server-down-max-fails=32
server-down-throttle-time=120

serve-rfc1918=no
dont-query=


dnssec=process-no-validate

max-recursion-depth=40
max-qperq=50
client-tcp-timeout=5

reuseport=yes
threads=4
max-mthreads=2048
pdns-distributes-queries=true

We have seen this behavior on both 3.7.4 and 4.0.6. We are using the
official packages from the repo listed here: https://repo.powerdns.com/ we
run pdns-recursor across 7 regions and all exhibit the same behavior
whereas previously these same queries via bind did not show the same delays.

pdns_recursor --version
Aug 30 10:09:04 PowerDNS Recursor 4.0.6 (C) 2001-2016 PowerDNS.COM BV
Aug 30 10:09:04 Using 64-bits mode. Built using gcc 5.4.0 20160609 on Jul
4 2017 15:43:52 by root at 5ee67e1ed1a4.
Aug 30 10:09:04 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it according to the terms of
the GPL version 2.
Aug 30 10:09:04 Features: openssl lua
Aug 30 10:09:04 Configured with: " '--host=' '--build=' '--prefix=/usr'
'--sysconfdir=/etc/powerdns' '--mandir=${prefix}/share/man'
'--infodir=${prefix}/share/info' '--libdir=${prefix}/lib/'
'--libexecdir=${prefix}/lib' '--with-lua' '--with-protobuf=yes'
'--enable-systemd' '--with-systemd=/lib/systemd/system' 'build_alias='
'host_alias=' 'CFLAGS=-g -O2 -fPIE -fstack-protector-strong -Wformat
-Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -fPIE -pie
-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
'CXXFLAGS=-g -O2 -fPIE -fstack-protector-strong -Wformat
-Werror=format-security'"

Any assistance or advice on further diagnosing this issue would be greatly
appreciated.

Regards,

James
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.powerdns.com/pipermail/pdns-users/attachments/20170830/c8f263a4/attachment.html>


More information about the Pdns-users mailing list