[Pdns-users] pdns-recursor metrics review and tuning advice request
Scott Crace
mscottcrace at gmail.com
Fri Apr 18 23:04:18 UTC 2025

Otto,
Thanks for your assistance. Since these were set up with private IPs, I wasn't
sure how useful the config would be; however, I have included it below.
```
# rec_control dump-throttlemap -
; throttle map dump follows
; remote IP     qname                 qtype  count  ttd                  reason
10.0.196.197    0.10.in-addr.arpa     A      2      2025-04-18T18:44:22  RCodeRefused
10.0.196.197    10.10.in-addr.arpa    A      3      2025-04-18T18:44:25  RCodeRefused
10.0.196.197    255.10.in-addr.arpa   A      1      2025-04-18T18:44:23  RCodeRefused
10.0.62.244     0.10.in-addr.arpa     A      2      2025-04-18T18:44:22  RCodeRefused
10.0.62.244     10.10.in-addr.arpa    A      3      2025-04-18T18:44:25  RCodeRefused
10.0.62.244     255.10.in-addr.arpa   A      2      2025-04-18T18:44:23  RCodeRefused
dump-throttlemap: dumped 6 records
```
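For a longer dump, it can help to total the rows per remote server. The following is a minimal Python sketch. It assumes the whitespace-separated column layout given in the header line (remote IP, qname, qtype, count, ttd, reason), and `parse_throttlemap` and `reasons_by_server` are hypothetical helpers, not part of rec_control:

```python
from collections import defaultdict

# Rows copied from the dump above, one record per line.
DUMP = """\
; throttle map dump follows
; remote IP     qname   qtype   count   ttd     reason
10.0.196.197    0.10.in-addr.arpa       A       2       2025-04-18T18:44:22     RCodeRefused
10.0.196.197    10.10.in-addr.arpa      A       3       2025-04-18T18:44:25     RCodeRefused
10.0.196.197    255.10.in-addr.arpa     A       1       2025-04-18T18:44:23     RCodeRefused
10.0.62.244     0.10.in-addr.arpa       A       2       2025-04-18T18:44:22     RCodeRefused
10.0.62.244     10.10.in-addr.arpa      A       3       2025-04-18T18:44:25     RCodeRefused
10.0.62.244     255.10.in-addr.arpa     A       2       2025-04-18T18:44:23     RCodeRefused
dump-throttlemap: dumped 6 records
"""

def parse_throttlemap(text):
    """Return one dict per throttled (server, qname, qtype) row."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines, ';' header comments and the trailing summary line.
        if not line.strip() or line.startswith((";", "dump-")):
            continue
        # maxsplit=5 keeps any spaces inside the last (reason) column intact.
        ip, qname, qtype, count, ttd, reason = line.split(None, 5)
        rows.append({"ip": ip, "qname": qname, "qtype": qtype,
                     "count": int(count), "ttd": ttd, "reason": reason})
    return rows

def reasons_by_server(rows):
    """Sum the throttle counts per remote server and reason."""
    summary = defaultdict(lambda: defaultdict(int))
    for r in rows:
        summary[r["ip"]][r["reason"]] += r["count"]
    return {ip: dict(reasons) for ip, reasons in summary.items()}

rows = parse_throttlemap(DUMP)
print(reasons_by_server(rows))
```

For this sample, the summed throttle count is 6 for 10.0.196.197 and 7 for 10.0.62.244. Every entry is RCodeRefused and involves names under 10.in-addr.arpa, which is the zone these two servers are configured to forward.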
I removed any entries with a count of 1 or 2 for brevity, since this email is
already a long read.
```
# rec_control dump-failedservers -
; failed servers dump follows
; remote IP     count   timestamp
203.119.25.5    8       2025-04-18T18:43:44
203.119.26.5    8       2025-04-18T18:43:42
203.119.27.5    8       2025-04-18T18:43:41
203.119.28.5    8       2025-04-18T18:43:39
203.119.29.5    8       2025-04-18T18:43:45
200.189.41.10   7       2025-04-18T18:42:46
200.219.148.10  6       2025-04-18T18:39:47
200.219.154.10  6       2025-04-18T18:42:43
200.219.159.10  7       2025-04-18T18:42:45
200.192.233.10  7       2025-04-18T18:42:40
200.229.248.10  4       2025-04-18T18:42:42
203.119.95.53   3       2025-04-18T18:39:30
203.119.86.101  1229    2025-04-18T18:40:03
35.173.255.124  4895    2025-04-18T18:36:21
dump-failedservers: dumped 43 records
```
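To rank these entries, parse the dump and sort by count. This sketch assumes the three whitespace-separated columns named in the header. `min_count` is a hypothetical, arbitrary threshold that plays the same role as the manual trimming described above:

```python
# Rows copied from the (already trimmed) dump above.
DUMP = """\
; failed servers dump follows
; remote IP     count   timestamp
203.119.25.5    8       2025-04-18T18:43:44
203.119.26.5    8       2025-04-18T18:43:42
203.119.27.5    8       2025-04-18T18:43:41
203.119.28.5    8       2025-04-18T18:43:39
203.119.29.5    8       2025-04-18T18:43:45
200.189.41.10   7       2025-04-18T18:42:46
200.219.148.10  6       2025-04-18T18:39:47
200.219.154.10  6       2025-04-18T18:42:43
200.219.159.10  7       2025-04-18T18:42:45
200.192.233.10  7       2025-04-18T18:42:40
200.229.248.10  4       2025-04-18T18:42:42
203.119.95.53   3       2025-04-18T18:39:30
203.119.86.101  1229    2025-04-18T18:40:03
35.173.255.124  4895    2025-04-18T18:36:21
dump-failedservers: dumped 43 records
"""

def parse_failedservers(text):
    """Return (ip, count, timestamp) tuples, skipping headers and the summary."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith((";", "dump-")):
            continue
        ip, count, timestamp = line.split()
        rows.append((ip, int(count), timestamp))
    return rows

def worst_offenders(rows, min_count=3):
    """Servers with at least min_count failures, highest count first."""
    return sorted((r for r in rows if r[1] >= min_count),
                  key=lambda r: r[1], reverse=True)

for ip, count, ts in worst_offenders(parse_failedservers(DUMP), min_count=100):
    print(f"{ip:16} {count:6}  last failure {ts}")
```

With `min_count=100`, only 35.173.255.124 (4895) and 203.119.86.101 (1229) remain. Those two counts are orders of magnitude above the rest of the list. The summary line still says 43 records because the low-count rows were removed from the email, not from the dump.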
Config(s)
Please note that one of the forwarded zones is 'split-brained' from a legacy
setup: it consists of a private Active Directory environment and a separately
maintained public zone. The configuration forwards to the private AD servers,
and I believe the Lua script drops queries that have no match in that zone.
The public zone is slowly being phased out.
While reviewing the previous server configs, I found a comment about this
value but no context for the specific reasoning. This may explain the values
you noted, but I would like to understand the implications of removing it. It
doesn't seem like something that should have been enabled.
```
# https://github.com/PowerDNS/pdns/issues/6186
max-negative-ttl=0
```
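The following is a rough illustration of what this cap does, using a simplified model rather than the recursor's actual code. RFC 2308 derives a negative answer's TTL from the SOA in the authority section, taking the smaller of the SOA record's own TTL and its MINIMUM field. The recursor then limits that value to max-negative-ttl, which defaults to 3600 seconds. Under this model, a cap of 0 means negative answers expire at once, so repeated queries for a nonexistent name would all go upstream. The SOA values and polling interval below are hypothetical, and settings such as minimum_ttl_override are deliberately left out:

```python
DEFAULT_MAX_NEGATIVE_TTL = 3600  # documented recursor default, in seconds

def negative_ttl(soa_ttl, soa_minimum, max_negative_ttl=DEFAULT_MAX_NEGATIVE_TTL):
    """Simplified model of how long a negative (NXDOMAIN/NODATA) answer is cached.

    RFC 2308: take the smaller of the SOA record's TTL and its MINIMUM field.
    The recursor then caps the result at max-negative-ttl.
    Other settings, such as minimum_ttl_override, are ignored in this model.
    """
    return min(soa_ttl, soa_minimum, max_negative_ttl)

def upstream_lookups_per_hour(query_interval_s, ttl_s):
    """Rough count of upstream lookups per hour for one nonexistent name
    that a client asks for every query_interval_s seconds (must be > 0)."""
    # With nothing cached every query goes upstream; otherwise roughly one
    # upstream lookup per cache lifetime.
    return 3600 // max(ttl_s, query_interval_s)

# Hypothetical SOA (TTL 3600 s, MINIMUM 900 s) and a client asking every 10 s.
for cap in (DEFAULT_MAX_NEGATIVE_TTL, 0):
    ttl = negative_ttl(3600, 900, cap)
    print(f"max-negative-ttl={cap}: cached {ttl} s, "
          f"~{upstream_lookups_per_hour(10, ttl)} upstream lookups/hour")
```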
/etc/pdns-recursor/recursor.conf
```yaml
---
dnssec:
  validation: validate
incoming:
  allow_from:
    - 127.0.0.1/8
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16
    - 'fd00::/8'
    - '2607:B600::/32'
  listen:
    - 0.0.0.0
  max_tcp_clients: 128
  max_tcp_per_client: 0
  max_tcp_queries_per_connection: 0
  port: 53
  tcp_timeout: 2
outgoing:
  dont_query: []
  max_qperq: 50
  network_timeout: 1500
packetcache:
  max_entries: 1000000
recordcache:
  max_entries: 1000000
  max_negative_ttl: 0
  max_ttl: 86400
recursor:
  daemon: false
  forward_zones:
    - zone: momentumbusiness.com
      recurse: false
      forwarders:
        - 10.255.255.76
        - 10.1.3.228
    - zone: 10.in-addr.arpa
      recurse: false
      forwarders:
        - 10.0.196.197
        - 10.0.62.244
    - zone: 168.192.in-addr.arpa
      recurse: false
      forwarders:
        - 10.0.196.197
        - 10.0.62.244
    - zone: 16.172.in-addr.arpa
      recurse: false
      forwarders:
        - 10.0.196.197
        - 10.0.62.244
  lua_dns_script: /etc/pdns-recursor/momentumbusiness_com.lua
  max_recursion_depth: 40
  max_total_msec: 7000
  minimum_ttl_override: 1
  server_id: nsres01.momentumtelecom.com
  setgid: pdns-recursor
  setuid: pdns-recursor
webservice:
  address: 0.0.0.0
  allow_from:
    - 192.168.9.164
    - 192.168.21.134
    - 192.168.20.0/24
  api_key: <sanitized>
  port: 8080
  webserver: true
logging:
  loglevel: 3
...
```
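Otto's reply below notes that rec only moves on to another auth or forwarder after the network timeout. Using the values in this config (outgoing.network_timeout: 1500 ms, recursor.max_total_msec: 7000 ms), a rough worst case can be sketched. The sketch assumes that servers are tried one after another, that each unresponsive server costs one full timeout, and that retries and throttling can be ignored. `answer_rtt_ms` is a hypothetical round-trip time for the server that finally answers:

```python
NETWORK_TIMEOUT_MS = 1500  # outgoing.network_timeout in the config above
MAX_TOTAL_MS = 7000        # recursor.max_total_msec in the config above

def worst_case_delay_ms(dead_servers, answer_rtt_ms=20):
    """Rough time to an answer when `dead_servers` time out before one replies.

    Assumes sequential attempts where each dead server costs a full network
    timeout; ignores retries, throttling and any other work in the resolution.
    """
    return dead_servers * NETWORK_TIMEOUT_MS + answer_rtt_ms

# How many back-to-back timeouts fit inside the per-query budget?
print("timeouts that fit in max_total_msec:", MAX_TOTAL_MS // NETWORK_TIMEOUT_MS)
for dead in range(4):
    print(f"{dead} unresponsive server(s) -> ~{worst_case_delay_ms(dead)} ms")
```

These figures can be compared with the timeout configured in the clients' own stub resolvers.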
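/etc/pdns-recursor/momentumbusiness_com.lua
```lua
-- Notice written to the recursor log when the script is loaded.
pdnslog("Lua NXDomain filter for momentumbusiness.com loading...", pdns.loglevels.Notice)

local nxdomainsuffix = newDN("momentumbusiness.com")

-- nxdomain() is called by the recursor when a resolution ends in NXDOMAIN.
-- For names at or below momentumbusiness.com, the applied policy is set to
-- Drop (no answer is sent); returning true signals that the hook changed the result.
function nxdomain(dq)
    if dq.qname:isPartOf(nxdomainsuffix) then
        dq.appliedPolicy.policyKind = pdns.policykinds.Drop
        return true
    end
    return false
end
```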
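The combined effect of forward_zones and this hook can be modeled in a few lines. This is a simplified sketch. It assumes that the recursor uses the most specific (longest) matching forward zone and that isPartOf() is a case-insensitive, label-by-label suffix match. It does not model caching, DNSSEC or throttling, and the helper names are hypothetical:

```python
# Forward zones and forwarders copied from the YAML config above.
FORWARD_ZONES = {
    "momentumbusiness.com": ["10.255.255.76", "10.1.3.228"],
    "10.in-addr.arpa": ["10.0.196.197", "10.0.62.244"],
    "168.192.in-addr.arpa": ["10.0.196.197", "10.0.62.244"],
    "16.172.in-addr.arpa": ["10.0.196.197", "10.0.62.244"],
}
NXDOMAIN_SUFFIX = "momentumbusiness.com"  # nxdomainsuffix in the Lua script

def labels(name):
    """Lower-cased labels of a DNS name, ignoring a trailing dot."""
    return name.rstrip(".").lower().split(".")

def is_part_of(name, zone):
    """True if name equals zone or lies below it, compared label by label."""
    n, z = labels(name), labels(zone)
    return len(n) >= len(z) and n[len(n) - len(z):] == z

def pick_forwarders(qname):
    """Most specific matching forward zone and its forwarders, or None."""
    matches = [z for z in FORWARD_ZONES if is_part_of(qname, z)]
    if not matches:
        return None  # no forward zone: normal recursion from the root
    best = max(matches, key=lambda z: len(labels(z)))
    return best, FORWARD_ZONES[best]

def nxdomain_action(qname):
    """Outcome of the nxdomain() hook once a lookup has ended in NXDOMAIN."""
    return "drop" if is_part_of(qname, NXDOMAIN_SUFFIX) else "send NXDOMAIN"

for q in ("dc1.momentumbusiness.com", "0.10.in-addr.arpa",
          "www.example.org", "notmomentumbusiness.com"):
    print(q, pick_forwarders(q), nxdomain_action(q))
```

In this model, 0.10.in-addr.arpa, one of the names in the throttle map above, is sent to 10.0.196.197 and 10.0.62.244. An NXDOMAIN under momentumbusiness.com is dropped, which means the client receives no reply rather than a negative answer.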
On Fri, Apr 18, 2025 at 9:39 AM Otto Moerbeek <otto at drijf.net> wrote:
> On Fri, Apr 18, 2025 at 08:28:48AM -0400, Scott Crace via Pdns-users wrote:
>
> Hi,
>
> Please include your config. That said:
>
> You seem to have a pretty low cache hit ratio and a high number of outgoing
> queries. How is your cache configured?
>
> Also some throttling is going on. I suspect rec has trouble contacting
> one or more auths or forwarders. The throttling tables can be viewed
> using
>
>         rec_control dump-throttlemap -
>         rec_control dump-failedservers -
>
> Also, what happens *during* the trace can be very relevant. If one
> auth (or forwarder) does not respond, rec will turn to another one,
> but only after the timeout of 1500ms by default.
>
>         -Otto
>
> > Hello all,
> > I'm a long-time lurker on the mailing list and would like some performance
> > and/or tuning advice.
> > We've been using pdns-recursor as our internal recursive nameservers for
> > quite some time now.
> > The original implementer of pdns departed, and I was recently tasked with
> > replacing or upgrading all of the servers with newer RHEL9 versions. I
> > opted to build fresh and migrate the configuration to the latest 5.2
> > release.
> >
> > I'm hearing occasional complaints about odd issues and/or clients cycling
> > through their DNS servers rapidly (timeouts?). Manual DNS testing works,
> > but I am reading through the metrics and performance documentation. I am
> > hoping someone with a more experienced eye could take a look at a sampling
> > of the periodic statistics report (below) and provide some insight or
> > prioritization on any urgent issues I should focus on studying first.
> >
> > My observations:
> > * I note that the performance documentation discusses the impact of
> > firewalld/stateful firewalls, but the legacy servers were using the same
> > basic setup. If the firewall is the problem, is there a way to validate
> > this (other than stopping firewalld and waiting)?
> > * To my novice eye, the "worker" threads seem evenly distributed, and our
> > QPS (queries per second) rate is low, as I would expect, since the name
> > servers are internal-only resources.
> > * I ran a few pcaps and rec_control trace-regex for specific domains being
> > reported as problematic. Everything seemed to be working, with the
> > trace-regex output always showing "Step3 Final resolve: No Error/6 or 8".
> >
> > Thank you in advance for your time and consideration.
> >
> > Sincerely,
> > Scotsie
> >
> > ```
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" cache-entries="23666" negcache-entries="497" questions="6831695" record-cache-acquired="286931329" record-cache-contended="64414" record-cache-contended-perc="0.02" record-cache-hitratio-perc="0.87"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" packetcache-acquired="16887684" packetcache-contended="1019" packetcache-contended-perc="0.01" packetcache-entries="7112" packetcache-hitratio-perc="37.75"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" edns-entries="38" failed-host-entries="50" non-resolving-nameserver-entries="0" nsspeed-entries="968" saved-parent-ns-sets-entries="65" throttle-entries="8"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" concurrent-queries="1" dot-outqueries="0" idle-tcpout-connections="0" outgoing-timeouts="36594" outqueries="14668546" outqueries-per-query-perc="214.71" tcp-outqueries="3131" throttled-queries-perc="1.90"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" taskqueue-expired="0" taskqueue-pushed="540" taskqueue-size="0"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" count="3470098" thread="0" tname="worker"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" count="3360836" thread="1" tname="worker"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.171" count="764" thread="2" tname="tcpworker"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic QPS report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.171" averagedOver="1800" qps="117"
> > ```
>
> > _______________________________________________
> > Pdns-users mailing list
> > Pdns-users at mailman.powerdns.com
> > https://mailman.powerdns.com/mailman/listinfo/pdns-users
>
>
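Some of the percentages in the periodic report can be recomputed from the raw counters, which makes it easier to compare reports over time. The following is a small sketch that assumes the journal lines keep the key="value" layout shown in the quoted log. Treating outqueries-per-query-perc as outqueries divided by questions is inferred from these numbers (14668546 / 6831695 is about 2.147, matching 214.71) and not taken from the recursor source. The worker counts are copied from the same report:

```python
import re

# Two report lines from the quoted log, each on a single line.
LOG = '''\
Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" cache-entries="23666" negcache-entries="497" questions="6831695" record-cache-acquired="286931329" record-cache-contended="64414" record-cache-contended-perc="0.02" record-cache-hitratio-perc="0.87"
Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" concurrent-queries="1" dot-outqueries="0" idle-tcpout-connections="0" outgoing-timeouts="36594" outqueries="14668546" outqueries-per-query-perc="214.71" tcp-outqueries="3131" throttled-queries-perc="1.90"
'''

# Matches key="value" pairs such as questions="6831695".
KV = re.compile(r'([\w-]+)="([^"]*)"')

def parse_fields(log_text):
    """Merge all key="value" pairs from the given lines into one dict."""
    fields = {}
    for line in log_text.splitlines():
        fields.update(KV.findall(line))
    return fields

f = parse_fields(LOG)
questions = int(f["questions"])
outqueries = int(f["outqueries"])
timeouts = int(f["outgoing-timeouts"])

print("outqueries per query: %.2f%%" % (100 * outqueries / questions))  # report says 214.71
print("timeouts per outquery: %.2f%%" % (100 * timeouts / outqueries))

# Share of UDP queries handled by worker thread 0 (counts from the same report).
workers = [3470098, 3360836]
print("worker 0 share: %.1f%%" % (100 * workers[0] / sum(workers)))
```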