[Pdns-users] CPU consumption of pdns_recursor
Otto Moerbeek
otto at drijf.net
Tue Apr 6 06:35:59 UTC 2021
On Mon, Apr 05, 2021 at 05:30:11PM +0000, Nejedlo, Mark via Pdns-users wrote:
Some thoughts:
2 distributior thread feels a bit overkill, 1 distributor thread
should be able to feed 8 workers. Did you do measurements to come to
this value?
Your maintenance function looks like it could run for a while if
timeouts happen. I don't know how bad that would be from the top of
my head, but it could have an impact.
Your packet cache hit ratio is extremenly low. Are you running dnsdist
with a packetcache in front of this recursor? Or can you explain it
from the type of queries you are getting?
no-packet-error is very high. That could have something to do the
first point, but I'm not certain about that; maybe somebody else on
this list can shed a light on what could cause that.
-Otto
If you are setting up a lab env, I certainly would try to use 4.4 or
4.5 (which is in beta) they both contain performance improvememnts.
> On Monday, April 5, 2021 10:05 AM, Otto Moerbeek Wrote:
> > On Mon, Apr 05, 2021 at 02:40:17PM +0000, Nejedlo, Mark via Pdns-users
> > wrote:
> >
> > > We recently replaced some Bind servers with PowerDNS recursor, and
> > > were rather surprised to see CPU usage essentially double for the same
> > > workload. My expectation was that the load would be more or less
> > > equivalent between the two resolvers. It looks like the load is
> > > centered in the actual pdns_recursor worker threads, not the distributor
> > > threads or dnsdist. Is it expected that CPU usage would be so much
> > > higher under PowerDNS? Is there a debug parameter that can be set to
> > > see where PowerDNS is spending its time?
> > >
> > > I've already gone through the performance guide and don't see any
> > > additional tuning parameters that might help. We're using LUA, but 1)
> > > it's pretty lightweight (a couple hash lookups and modify dq object),
> > > and 2) when I disabled the LUA scripting for testing, it didn't make a
> > > noticeable difference in the load.
> >
> > Impossible to say anything without any info about hardware, os, config
> > and metrics.
>
> Hardware is a VMware guest on ESXi, 12 cores, 4GB RAM (no apparent RAM pressure). OS is Red Hat Enterprise 7. Running recursor 4.3.5, via the official PowerDNS centos 7 RPM.
>
> Config Files
> ------------
>
> /etc/pdns-recursor/recursor.conf:
> --------------------
> setgid=pdns-recursor
> setuid=pdns-recursor
> version-string=anonymous
> threads=8
> pdns-distributes-queries=yes
> distributor-threads=2
> distribution-load-factor=1.25
> query-local-address=64.50.242.202
> query-local-address6=2600:3404:200:1:250:56ff:feae:df16
>
> use-incoming-edns-subnet=true
> xpf-rr-code=65280
> xpf-allow-from=127.0.0.1, ::1
> edns-subnet-whitelist=tds.net,tdstelecom.com
> local-port=5353
> local-address=127.0.0.1,::1
> forward-zones-file=/etc/pdns-recursor/forward.zones
> lua-dns-script=/etc/pdns-recursor/recursor-script.lua
> --------------------
>
>
> /etc/pdns-recursor/recursor-script.lua
> --------------------
> -- load business opt out list once on lua startup/reload
> nxd_business_opt_out = newNMG()
> nxd_boo_f = assert(io.open('/etc/pdns-recursor/NXD_BusinessOptOut.txt', 'r'))
> for line in nxd_boo_f:lines() do
> if not string.match(line, '[G-Zg-z<>#]') then -- throw away non-data lines associated with answerx formatting
> nxd_business_opt_out:addMask(line)
> end
> end
>
> nxdomainhostskip = {'wpad', 'mail', 'ntp', 'voip', 'ftp', 'irc', 'finger'}
> nxdomainhostmatch = {'web', 'my', 'ww', 'home', 'http'}
>
> nxd_data = {}
>
> -- counters for redirects and opt-out hits
> nxd_optoutcount = getMetric("nxdomain-opt-out")
> nxd_redirectcount = getMetric("nxdomain-redirect")
>
> -- maintenance processes, including:
> -- - refreshing the NXDomain hijacking record
> function maintenance()
> local socket = require("socket")
> local udp = assert(socket.udp6())
> local resp
> local nxd_tmp = {}
>
> udp:settimeout(1)
> assert(udp:setsockname("*",0))
> assert(udp:setpeername("::1",5354))
>
> for i = 0, 2, 1 do
> assert(udp:send("NXD ANY"))
> resp = udp:receive()
> if resp then
> break
> end
> end
>
> if resp == nil then
> pdnslog("NXdomain refresh timeout")
> else
> local splitdata = {}
> for splits in string.gmatch(resp, "%S+") do
> table.insert(splitdata, splits)
> end
> if splitdata[1] == 'NXD' then
> local rrtype = ''
> local ttl = ''
> local ip = ''
> for i = 2, #splitdata do
> if rrtype == '' then
> if splitdata[i] == 'A' or splitdata[i] == 'AAAA' then
> rrtype = splitdata[i]
> elseif splitdata[i] == 'NODATA' then
> -- pdnslog('No data available')
> break
> else
> pdnslog('Invalid rrtype provided')
> break
> end
> elseif ttl == '' then
> ttl = tonumber(splitdata[i])
> if ttl == nil then
> pdnslog('Invalid ttl provided')
> break
> end
> elseif ip == '' then
> ip = splitdata[i]
> local tmpdr = newDR(newDN('redirect'), rrtype, ttl, ip, pdns.place.ANSWER)
> if tmpdr == nil then
> pdnslog("Unable to create DR for "..rrtype..' '..ttl..' '..ip)
> break
> else
> table.insert(nxd_tmp, tmpdr)
> end
> rrtype = ''
> ttl = ''
> ip = ''
> else
> pdnslog('Should not be reachable')
> end
> end -- for i = 2, #splitdata
> else
> pdnslog("invalid response "..resp)
> end -- if splitdata['1'] == 'NXD'
> end -- if/else resp == nil
> nxd_data = nxd_tmp
> end
>
> -- make redirect processing its own function so we can call it from multiple places
> function nxd_handler(dq)
> for i = 1, #nxd_data do
> if dq.qtype == nxd_data[i].type or dq.qtype == pdns.ANY then
> dq.rcode = pdns.NOERROR -- make it a normal answer
> dq:addAnswer(nxd_data[i].type, nxd_data[i]:getContent(), nxd_data[i].ttl)
> end
> end
> if dq.rcode == pdns.NOERROR then
> nxd_redirectcount:inc()
> end
> return dq
> end
>
> -- If resolver returns nxdomain, redirect to searchguide service
> function nxdomain(dq)
>
> -- Only redirect for A/AAAA/ANY queries
> if dq.qtype == pdns.A or dq.qtype == pdns.AAAA or dq.qtype == pdns.ANY then
>
> -- skip redirect if host part matches the patterns in nxdomainhostskip
> for i, hpart in ipairs(nxdomainhostskip) do
> if string.match(dq.qname:toString(), "^"..hpart) then
> return false
> end
> end
>
> -- business customer opt-out
> if nxd_business_opt_out:match(dq.remoteaddr) then
> nxd_optoutcount:inc()
> return false
> end
>
> -- check if host part matches patterns. if so, redirect to search service
> for i, hpart in ipairs(nxdomainhostmatch) do
> if string.match(dq.qname:toString(), "^"..hpart) then
> dq = nxd_handler(dq)
> return true
> end
> end
>
> -- only redirect queries two or fewer levels deep
> if dq.qname:countLabels() < 3 then
> dq = nxd_handler(dq)
> return true
> end
>
> -- too long and didn't match host parts, so skip redirect
> return false
> end
> return false
> end
> --------------------
>
>
> Half-hour stats
> --------------------
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: 8559042686 questions, 824993 cache entries, 102305 negative entries, 81% cache hits
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: throttle map: 1480, ns speeds: 74541, failed ns: 3038, ednsmap: 198701
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: outpacket/query ratio 28%, 11% throttled, 0 no-delegation drops
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: 11031110 outgoing tcp connections, 42 queries running, 67363308 outgoing timeouts
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: 508363 packet cache entries, 2% packet cache hits
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 0 has been distributed 1127234640 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 1 has been distributed 1025150455 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 2 has been distributed 1086409376 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 3 has been distributed 956026740 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 4 has been distributed 1117212754 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 5 has been distributed 981577497 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 6 has been distributed 878916852 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: thread 7 has been distributed 1378105489 queries
> Apr 05 16:35:34 ny00vmp-rdns0.svc.tds.net pdns_recursor[1545]: stats: 4290 qps (average over 1800 seconds)
> --------------------
>
>
> rec_control get-all
> --------------------
> all-outqueries 2426135931
> answers-slow 32622243
> answers0-1 6963714100
> answers1-10 616485007
> answers10-100 589467848
> answers100-1000 111976956
> auth-zone-queries 3913588
> auth4-answers-slow 54645139
> auth4-answers0-1 502461956
> auth4-answers1-10 778877222
> auth4-answers10-100 615451605
> auth4-answers100-1000 109890924
> auth6-answers-slow 9558349
> auth6-answers0-1 42211276
> auth6-answers1-10 195413588
> auth6-answers10-100 100814046
> auth6-answers100-1000 13552926
> cache-entries 825029
> cache-hits 6781159775
> cache-misses 1533106380
> case-mismatches 0
> chain-resends 174881587
> client-parse-errors 0
> concurrent-queries 75
> cpu-msec-thread-0 91219643
> cpu-msec-thread-1 142746545
> cpu-msec-thread-2 278497218
> cpu-msec-thread-3 250309239
> cpu-msec-thread-4 250701322
> cpu-msec-thread-5 319416792
> cpu-msec-thread-6 328608768
> cpu-msec-thread-7 264910279
> cpu-msec-thread-8 322031125
> cpu-msec-thread-9 335541569
> dlg-only-drops 0
> dnssec-authentic-data-queries 18646074
> dnssec-check-disabled-queries 1113241
> dnssec-queries 53214851
> dnssec-result-bogus 0
> dnssec-result-indeterminate 0
> dnssec-result-insecure 0
> dnssec-result-nta 0
> dnssec-result-secure 10986
> dnssec-validations 10986
> dont-outqueries 45917207
> ecs-queries 1451717
> ecs-responses 32451
> edns-ping-matches 0
> edns-ping-mismatches 0
> empty-queries 0
> failed-host-entries 1356
> fd-usage 269
> ignored-packets 1478626
> ipv6-outqueries 359162038
> ipv6-questions 8551805269
> malloc-bytes 0
> max-cache-entries 1000000
> max-mthread-stack 87984
> max-packetcache-entries 500000
> negcache-entries 102533
> no-packet-error 1601797538
> noedns-outqueries 3772686
> noerror-answers 7669982655
> noping-outqueries 0
> nsset-invalidations 2704139
> nsspeeds-entries 70484
> nxdomain-answers 744158682
> nxdomain-opt-out 2407297
> nxdomain-redirect 93573041
> outgoing-timeouts 67362296
> outgoing4-timeouts 56885531
> outgoing6-timeouts 10476765
> over-capacity-drops 13007
> packetcache-entries 510580
> packetcache-hits 244194152
> packetcache-misses 8307487740
> policy-drops 0
> policy-result-custom 0
> policy-result-drop 0
> policy-result-noaction 8314084804
> policy-result-nodata 0
> policy-result-nxdomain 0
> policy-result-truncate 0
> qa-latency 9496
> qname-min-fallback-success 406051209
> query-pipe-full-drops 0
> questions 8558845589
> real-memory-usage 4266360832
> rebalanced-queries 1692943695
> resource-limits 0
> security-status 1
> server-parse-errors 91767
> servfail-answers 144318969
> spoof-prevents 1
> sys-msec 539748918
> tcp-client-overflow 0
> tcp-clients 4
> tcp-outqueries 11030988
> tcp-questions 7040320
> throttle-entries 1356
> throttled-out 306121689
> throttled-outqueries 306121689
> too-old-drops 107804
> truncated-drops 0
> udp-in-errors 80342
> udp-noport-errors 711240
> udp-recvbuf-errors 0
> udp-sndbuf-errors 0
> unauthorized-tcp 0
> unauthorized-udp 0
> unexpected-packets 46639
> unreachables 206095
> uptime 6103754
> user-msec 2044729297
> variable-responses 18555
> x-our-latency 59
> x-ourtime-slow 2003931
> x-ourtime0-1 8265444257
> x-ourtime1-2 37568531
> x-ourtime16-32 33205
> x-ourtime2-4 7265716
> x-ourtime4-8 1653903
> x-ourtime8-16 296611
> --------------------
> _______________________________________________
> Pdns-users mailing list
> Pdns-users at mailman.powerdns.com
> https://mailman.powerdns.com/mailman/listinfo/pdns-users
More information about the Pdns-users
mailing list