[Pdns-users] pdns-recursor metrics review and tuning advice request

Scott Crace mscottcrace at gmail.com
Fri Apr 18 23:04:18 UTC 2025


Otto,
Thanks for your assistance. Since these were set up with private IPs, I wasn't
sure how useful the config would be; however, I have included it below.

# rec_control dump-throttlemap -
; throttle map dump follows
; remote IP     qname   qtype   count   ttd     reason
10.0.196.197    0.10.in-addr.arpa       A       2       2025-04-18T18:44:22     RCodeRefused
10.0.196.197    10.10.in-addr.arpa      A       3       2025-04-18T18:44:25     RCodeRefused
10.0.196.197    255.10.in-addr.arpa     A       1       2025-04-18T18:44:23     RCodeRefused
10.0.62.244     0.10.in-addr.arpa       A       2       2025-04-18T18:44:22     RCodeRefused
10.0.62.244     10.10.in-addr.arpa      A       3       2025-04-18T18:44:25     RCodeRefused
10.0.62.244     255.10.in-addr.arpa     A       2       2025-04-18T18:44:23     RCodeRefused
dump-throttlemap: dumped 6 records
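
To see at a glance which forwarder is being throttled and why, the dump can be
summarized with a few lines of Python. This is only a sketch: the column order
is taken from the header line of the dump above, the function names are my own,
and continuation lines (as folded by a mail client) are simply joined back onto
the previous record.

```python
from collections import Counter

# Sample copied from the dump above. The first record is left folded across
# two lines on purpose, to show that folded input is handled.
SAMPLE_DUMP = """\
; throttle map dump follows
; remote IP     qname   qtype   count   ttd     reason
10.0.196.197    0.10.in-addr.arpa       A       2       2025-04-18T18:44:22
    RCodeRefused
10.0.196.197    10.10.in-addr.arpa      A       3       2025-04-18T18:44:25     RCodeRefused
10.0.196.197    255.10.in-addr.arpa     A       1       2025-04-18T18:44:23     RCodeRefused
10.0.62.244     0.10.in-addr.arpa       A       2       2025-04-18T18:44:22     RCodeRefused
10.0.62.244     10.10.in-addr.arpa      A       3       2025-04-18T18:44:25     RCodeRefused
10.0.62.244     255.10.in-addr.arpa     A       2       2025-04-18T18:44:23     RCodeRefused
dump-throttlemap: dumped 6 records
"""

FIELDS = ("remote", "qname", "qtype", "count", "ttd", "reason")


def parse_throttle_dump(text):
    """Turn throttle map dump text into a list of dicts, one per record."""
    logical_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blanks, ';' comment lines and the trailing summary line.
        if not stripped or stripped.startswith(";"):
            continue
        if stripped.startswith("dump-throttlemap:"):
            continue
        if line[0].isspace() and logical_lines:
            # Indented line: continuation of the previous (folded) record.
            logical_lines[-1] += " " + stripped
        else:
            logical_lines.append(stripped)

    records = []
    for entry in logical_lines:
        parts = entry.split()
        if len(parts) != len(FIELDS):
            continue  # not a record in the assumed six-column layout
        record = dict(zip(FIELDS, parts))
        record["count"] = int(record["count"])
        records.append(record)
    return records


def throttled_per_forwarder(records):
    """Sum the count column per (remote IP, reason) pair."""
    totals = Counter()
    for record in records:
        totals[(record["remote"], record["reason"])] += record["count"]
    return totals


if __name__ == "__main__":
    for (remote, reason), total in throttled_per_forwarder(
        parse_throttle_dump(SAMPLE_DUMP)
    ).items():
        print(remote, reason, total)
```

For the six records above this sums to 6 for 10.0.196.197 and 7 for
10.0.62.244, all with reason RCodeRefused and all under 10.in-addr.arpa.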

# rec_control dump-failedservers -
(I removed entries with a count of 1 or 2 for brevity, since this email is
already a long read.)
; failed servers dump follows
; remote IP     count   timestamp
203.119.25.5    8       2025-04-18T18:43:44
203.119.26.5    8       2025-04-18T18:43:42
203.119.27.5    8       2025-04-18T18:43:41
203.119.28.5    8       2025-04-18T18:43:39
203.119.29.5    8       2025-04-18T18:43:45
200.189.41.10   7       2025-04-18T18:42:46
200.219.148.10  6       2025-04-18T18:39:47
200.219.154.10  6       2025-04-18T18:42:43
200.219.159.10  7       2025-04-18T18:42:45
200.192.233.10  7       2025-04-18T18:42:40
200.229.248.10  4       2025-04-18T18:42:42
203.119.95.53   3       2025-04-18T18:39:30
203.119.86.101  1229    2025-04-18T18:40:03
35.173.255.124  4895    2025-04-18T18:36:21
dump-failedservers: dumped 43 records
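
The two outliers are easier to spot when the list is sorted by count. A small
sketch, assuming the three-column layout shown in the header line above; the
function names are made up for illustration and the sample is a subset of the
rows I pasted.

```python
# Subset of the failed servers dump above, copied verbatim.
SAMPLE_FAILED = """\
; failed servers dump follows
; remote IP     count   timestamp
203.119.25.5    8       2025-04-18T18:43:44
200.189.41.10   7       2025-04-18T18:42:46
203.119.95.53   3       2025-04-18T18:39:30
203.119.86.101  1229    2025-04-18T18:40:03
35.173.255.124  4895    2025-04-18T18:36:21
dump-failedservers: dumped 43 records
"""


def parse_failed_servers(text):
    """Return (remote_ip, count, timestamp) tuples from the dump text."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blanks, ';' comment lines and the trailing summary line.
        if not stripped or stripped.startswith(";"):
            continue
        if stripped.startswith("dump-failedservers:"):
            continue
        parts = stripped.split()
        if len(parts) != 3:
            continue  # not a record in the assumed three-column layout
        remote, count, timestamp = parts
        rows.append((remote, int(count), timestamp))
    return rows


def worst_offenders(rows, limit=2):
    """Return the rows with the highest failure count, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


if __name__ == "__main__":
    for remote, count, timestamp in worst_offenders(
        parse_failed_servers(SAMPLE_FAILED)
    ):
        print(remote, count, timestamp)
```

On the rows above the top two are 35.173.255.124 (4895) and 203.119.86.101
(1229); everything else is in the single digits.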


Config(s)

Please note that one of the forwarded zones is 'split-brain' from a
legacy setup. The zone consists of a private Active Directory environment
and a separately maintained public zone. The configuration forwards to the
private AD servers, and I believe the Lua script drops queries that have no
match in that zone. The public zone is slowly being phased out.

While reviewing the previous server configs, I found a comment about this
value but no context for the specific reasoning. This may explain the
values you noted, but I would like to understand the implications of
removing it. It doesn't seem like something that should have been set.
# https://github.com/PowerDNS/pdns/issues/6186
max-negative-ttl=0
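
My rough mental model of what a cap of 0 on negative answers does is sketched
below. This is a toy model only, not how the recursor is implemented: it tracks
a single nonexistent name, ignores minimum_ttl_override and every other
setting, and the 300 second upstream negative TTL and the 3600 second
comparison cap are arbitrary illustration values.

```python
def upstream_lookups(query_times, upstream_negative_ttl, max_negative_ttl):
    """Count upstream lookups for one nonexistent name in a toy cache.

    query_times           -- client query timestamps in seconds, ascending
    upstream_negative_ttl -- negative TTL offered by the upstream (assumed)
    max_negative_ttl      -- local cap applied to that TTL
    """
    effective_ttl = min(upstream_negative_ttl, max_negative_ttl)
    lookups = 0
    expires_at = None
    for now in query_times:
        # Go upstream when nothing is cached or the cached denial expired.
        if expires_at is None or now >= expires_at:
            lookups += 1
            expires_at = now + effective_ttl
    return lookups


if __name__ == "__main__":
    one_per_second = range(60)  # the same name asked once a second for a minute
    print("cap 3600:", upstream_lookups(one_per_second, 300, 3600))
    print("cap 0:   ", upstream_lookups(one_per_second, 300, 0))
```

In this model a cap of 3600 costs one upstream lookup for the whole minute,
while a cap of 0 costs one per client query (60). If the real behaviour is
anything like that, it would fit the high outgoing query count you pointed out.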

/etc/pdns-recursor/recursor.conf
---
dnssec:
  validation: validate
incoming:
  allow_from:
    - 127.0.0.1/8
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16
    - 'fd00::/8'
    - '2607:B600::/32'
  listen:
    - 0.0.0.0
  max_tcp_clients: 128
  max_tcp_per_client: 0
  max_tcp_queries_per_connection: 0
  port: 53
  tcp_timeout: 2
outgoing:
  dont_query: []
  max_qperq: 50
  network_timeout: 1500
packetcache:
  max_entries: 1000000
recordcache:
  max_entries: 1000000
  max_negative_ttl: 0
  max_ttl: 86400
recursor:
  daemon: false
  forward_zones:
    - zone: momentumbusiness.com
      recurse: false
      forwarders:
        - 10.255.255.76
        - 10.1.3.228
    - zone: 10.in-addr.arpa
      recurse: false
      forwarders:
        - 10.0.196.197
        - 10.0.62.244
    - zone: 168.192.in-addr.arpa
      recurse: false
      forwarders:
        - 10.0.196.197
        - 10.0.62.244
    - zone: 16.172.in-addr.arpa
      recurse: false
      forwarders:
        - 10.0.196.197
        - 10.0.62.244
  lua_dns_script: /etc/pdns-recursor/momentumbusiness_com.lua
  max_recursion_depth: 40
  max_total_msec: 7000
  minimum_ttl_override: 1
  server_id: nsres01.momentumtelecom.com
  setgid: pdns-recursor
  setuid: pdns-recursor
webservice:
  address: 0.0.0.0
  allow_from:
    - 192.168.9.164
    - 192.168.21.134
    - 192.168.20.0/24
  api_key: <sanitized>
  port: 8080
  webserver: true
logging:
  loglevel: 3
...
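
Otto's point (quoted below) that rec only moves on to another server after the
timeout made me do some back-of-the-envelope arithmetic with the two timing
values from this config. The sketch assumes each unresponsive server costs one
full network_timeout before the next is tried and ignores everything else; the
2000 ms client timeout is a hypothetical value, I have not checked what our
stub resolvers actually use.

```python
NETWORK_TIMEOUT_MS = 1500  # outgoing.network_timeout from the config above
MAX_TOTAL_MS = 7000        # recursor.max_total_msec from the config above
CLIENT_TIMEOUT_MS = 2000   # HYPOTHETICAL stub resolver timeout, not measured


def full_timeouts_within_budget(network_timeout_ms, max_total_ms):
    """How many back-to-back full timeouts fit in the total time budget."""
    return max_total_ms // network_timeout_ms


def wait_before_next_server(unresponsive_servers, network_timeout_ms):
    """Time spent waiting on silent servers before the next one is tried."""
    return unresponsive_servers * network_timeout_ms


def client_gives_up_first(unresponsive_servers, network_timeout_ms,
                          client_timeout_ms):
    """True if the assumed client timeout expires during that wait."""
    waited = wait_before_next_server(unresponsive_servers, network_timeout_ms)
    return waited > client_timeout_ms


if __name__ == "__main__":
    print(full_timeouts_within_budget(NETWORK_TIMEOUT_MS, MAX_TOTAL_MS))
    for dead in (1, 2, 3):
        print(dead,
              wait_before_next_server(dead, NETWORK_TIMEOUT_MS),
              client_gives_up_first(dead, NETWORK_TIMEOUT_MS,
                                    CLIENT_TIMEOUT_MS))
```

Under those assumptions at most four full timeouts fit in the 7000 ms budget,
and two silent servers in a row (3000 ms) would already outlast a 2000 ms
client timeout, which could look like clients cycling through their DNS
servers.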

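/etc/pdns-recursor/momentumbusiness_com.lua
-- Log at load time so it is visible that the filter is active.
pdnslog("Lua NXDomain filter for momentumbusiness.com loading...", pdns.loglevels.Notice)
nxdomainsuffix = newDN("momentumbusiness.com")
-- nxdomain() hook: called when resolution ended in NXDOMAIN.
function nxdomain(dq)
  -- Drop the query if the name is inside momentumbusiness.com.
  if dq.qname:isPartOf(nxdomainsuffix) then
    dq.appliedPolicy.policyKind = pdns.policykinds.Drop
    return true
  end
  return false
end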
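
To double-check which names that filter would apply to, here is a rough Python
analogue of the suffix test. It assumes isPartOf compares whole labels, case
insensitively, and that a name is part of itself; the function is my own
stand-in and does not call PowerDNS code.

```python
def is_part_of(qname, suffix):
    """Label-wise suffix match, ignoring case and one trailing dot."""
    name_labels = qname.rstrip(".").lower().split(".")
    suffix_labels = suffix.rstrip(".").lower().split(".")
    if len(name_labels) < len(suffix_labels):
        return False
    # Compare the rightmost labels of the name with the suffix labels.
    return name_labels[-len(suffix_labels):] == suffix_labels


if __name__ == "__main__":
    zone = "momentumbusiness.com"
    for name in (
        "momentumbusiness.com.",
        "dc01.ad.momentumbusiness.com.",
        "notmomentumbusiness.com.",
        "momentumbusiness.com.example.net.",
    ):
        print(name, is_part_of(name, zone))
```

The hostnames in the loop are invented examples. With whole-label matching the
first two match and the last two do not, so only NXDOMAIN answers for names at
or below momentumbusiness.com would be dropped.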

On Fri, Apr 18, 2025 at 9:39 AM Otto Moerbeek <otto at drijf.net> wrote:

> On Fri, Apr 18, 2025 at 08:28:48AM -0400, Scott Crace via Pdns-users wrote:
>
> Hi,
>
> Please include your config. That said:
>
> You seem to have pretty low cache hit ratio, a high number of outgoing
> queries. How is your cache configged?
>
> Also some throttling is going on. I suspect rec has trouble contacting
> one or more auths or forwarders. The throttling tables can be viewed
> using
>
>         rec_control dump-throttlemap -
>         rec_control dump-failedservers -
>
> Also, what happens *during* the trace can be very relevant. If one
> auth (or forwarder) does not respond, rec will turn to another one,
> but only after the timeout of 1500ms by default.
>
>         -Otto
>
> >  Hello all,
> >  Long time lurker on the message list and would like some performance
> > and/or tuning advice.
> > We've been using pdns-recursor as internal recursive nameservers for
> > quite some time now.
> > The original implementer of pdns departed and I was recently tasked with
> > replacing or upgrading all of the servers with newer RHEL9 versions. I
> > opted to build fresh and migrate the configuration to the latest 5.2
> > release.
> >
> > I'm hearing occasional complaints about odd issues and/or clients cycling
> > through their DNS servers rapidly (timeouts?). Manual testing DNS works
> > but I am reading through the metrics and performance documentation. I am
> > hoping someone with a more experienced eye could take a look at a
> > sampling of the periodic statistics report (below) and provide some
> > insight or prioritization on any urgent issues I should focus on
> > studying first.
> >
> > My observations:
> > * I do note that the performance documentation talks about
> > firewalld/stateful firewalls impact but the legacy servers were using the
> > same basic setup. If the firewall is the problem is there a way to
> > validate this (other than stopping firewalld and waiting)?
> > * The "worker" threads seem evenly distributed to my novice eye and our
> > qps (queries per second) rate is low as I would expect since the name
> > servers are internal only resources.
> > * I ran a few pcaps and rec_control trace-regex for specific domain items
> > being reported as problematic. Everything seemed to be working with the
> > trace-regex always showing "Step3 Final resolve: No Error/6 or 8".
> >
> > Thank you in advance for your time and consideration.
> >
> > Sincerely,
> > Scotsie
> >
> > ```
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" cache-entries="23666" negcache-entries="497" questions="6831695" record-cache-acquired="286931329" record-cache-contended="64414" record-cache-contended-perc="0.02" record-cache-hitratio-perc="0.87"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" packetcache-acquired="16887684" packetcache-contended="1019" packetcache-contended-perc="0.01" packetcache-entries="7112" packetcache-hitratio-perc="37.75"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" edns-entries="38" failed-host-entries="50" non-resolving-nameserver-entries="0" nsspeed-entries="968" saved-parent-ns-sets-entries="65" throttle-entries="8"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" concurrent-queries="1" dot-outqueries="0" idle-tcpout-connections="0" outgoing-timeouts="36594" outqueries="14668546" outqueries-per-query-perc="214.71" tcp-outqueries="3131" throttled-queries-perc="1.90"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" taskqueue-expired="0" taskqueue-pushed="540" taskqueue-size="0"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" count="3470098" thread="0" tname="worker"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" count="3360836" thread="1" tname="worker"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.171" count="764" thread="2" tname="tcpworker"
> > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic QPS report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.171" averagedOver="1800" qps="117"
> > ```
>
> > _______________________________________________
> > Pdns-users mailing list
> > Pdns-users at mailman.powerdns.com
> > https://mailman.powerdns.com/mailman/listinfo/pdns-users
>
>
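
For reference, the ratios in the periodic statistics report quoted above can be
recomputed from the raw counters. A minimal sketch: the key="value" regex is my
own, the sample string is an excerpt of the quoted log, and the timeout ratio
is a figure I derived, not one the recursor reports.

```python
import re

# Excerpt of the key="value" pairs from the quoted statistics report.
SAMPLE_STATS = (
    'msg="Periodic statistics report" subsystem="stats" '
    'cache-entries="23666" negcache-entries="497" questions="6831695" '
    'outgoing-timeouts="36594" outqueries="14668546" '
    'outqueries-per-query-perc="214.71" throttled-queries-perc="1.90"'
)

PAIR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_stats(text):
    """Collect every key="value" pair in the text into a dict of strings."""
    return dict(PAIR.findall(text))


def percentage(numerator, denominator):
    """Return numerator / denominator as a percentage, rounded to 2 places."""
    return round(100.0 * numerator / denominator, 2)


if __name__ == "__main__":
    stats = parse_stats(SAMPLE_STATS)
    questions = int(stats["questions"])
    outqueries = int(stats["outqueries"])
    timeouts = int(stats["outgoing-timeouts"])
    print("outqueries per question %:", percentage(outqueries, questions))
    print("timeouts per outquery %:  ", percentage(timeouts, outqueries))
```

The first figure comes out at 214.71, matching the logged
outqueries-per-query-perc, so that metric appears to be outqueries divided by
questions. The second comes out at 0.25, so roughly one outgoing query in 400
timed out over the life of the process.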