[Pdns-users] PowerDNS issues

Andrey Sedletsky asedletsky at spd-mgts.ru
Wed Sep 22 13:03:34 UTC 2021


Good afternoon!
After restarting the pdns-recursor process, the "outgoing-timeouts" and
"over-capacity-drops" counters rise sharply, which seriously degrades
the service.
This happens only when the server is under high load (more than 400
thousand requests per second). Under lower load, restarting the process
does not have these consequences.
Below are some examples:

Before the restart (data collected with Telegraf + InfluxDB):
 > select "host","outgoing-timeouts" from powerdns_recursor where 
"host"='a975-icache01' and time > '2021-09-03 07:45:00' and time < 
'2021-09-03 08:00:00'
name: powerdns_recursor
time                 host          outgoing-timeouts
----                 ----          -----------------
2021-09-03T07:50:30Z a975-icache01 1463346871
2021-09-03T07:51:02Z a975-icache01 1463354005
2021-09-03T07:51:31Z a975-icache01 1463360230
2021-09-03T07:52:00Z a975-icache01 1463366325
2021-09-03T07:52:30Z a975-icache01 1463372284

 > select "host","over-capacity-drops" from powerdns_recursor where 
"host"='a975-icache01' and time > '2021-09-03 07:45:00' and time < 
'2021-09-03 08:00:00'
name: powerdns_recursor
time                 host          over-capacity-drops
----                 ----          -------------------
2021-09-03T07:50:30Z a975-icache01 5281536
2021-09-03T07:51:02Z a975-icache01 5281536
2021-09-03T07:51:31Z a975-icache01 5281536
2021-09-03T07:52:00Z a975-icache01 5281536
2021-09-03T07:52:30Z a975-icache01 5281536
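Both metrics are cumulative counters, so the raw values are easier to
compare as per-second rates. A minimal Python sketch, assuming the sample
timestamps above are exact (the helpers `parse_ts` and `per_second_rates`
are hypothetical names used only for illustration):

```python
from datetime import datetime

# Cumulative counter samples copied from the query output above (before restart).
OUTGOING_TIMEOUTS = [
    ("2021-09-03T07:50:30Z", 1463346871),
    ("2021-09-03T07:51:02Z", 1463354005),
    ("2021-09-03T07:51:31Z", 1463360230),
    ("2021-09-03T07:52:00Z", 1463366325),
    ("2021-09-03T07:52:30Z", 1463372284),
]
OVER_CAPACITY_DROPS = [
    ("2021-09-03T07:50:30Z", 5281536),
    ("2021-09-03T07:51:02Z", 5281536),
    ("2021-09-03T07:51:31Z", 5281536),
    ("2021-09-03T07:52:00Z", 5281536),
    ("2021-09-03T07:52:30Z", 5281536),
]


def parse_ts(ts):
    # Parse the UTC timestamps explicitly; datetime.fromisoformat() only
    # accepts a trailing "Z" from Python 3.11 onward.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def per_second_rates(samples):
    """Convert consecutive cumulative samples into increase-per-second rates."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        seconds = (parse_ts(t1) - parse_ts(t0)).total_seconds()
        rates.append((v1 - v0) / seconds)
    return rates


for name, samples in (("outgoing-timeouts", OUTGOING_TIMEOUTS),
                      ("over-capacity-drops", OVER_CAPACITY_DROPS)):
    print(name, [round(r, 1) for r in per_second_rates(samples)])
```

Read this way, before the restart outgoing-timeouts grew by roughly 200
per second and over-capacity-drops did not grow at all.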


And after the restart:

 > select "host","outgoing-timeouts" from powerdns_recursor where 
"host"='a975-icache01' and time > '2021-09-03 07:45:00' and time < 
'2021-09-03 08:00:00'
name: powerdns_recursor
time                 host          outgoing-timeouts
----                 ----          -----------------
2021-09-03T07:53:30Z a975-icache01 114684
2021-09-03T07:54:01Z a975-icache01 437493
2021-09-03T07:54:31Z a975-icache01 738150
2021-09-03T07:55:03Z a975-icache01 1060959
2021-09-03T07:55:30Z a975-icache01 1327177
...


 > select "host","over-capacity-drops" from powerdns_recursor where 
"host"='a975-icache01' and time > '2021-09-03 07:45:00' and time < 
'2021-09-03 08:00:00'
name: powerdns_recursor
time                 host          over-capacity-drops
----                 ----          -------------------
2021-09-03T07:53:30Z a975-icache01 100934
2021-09-03T07:54:01Z a975-icache01 457612
2021-09-03T07:54:31Z a975-icache01 572332
2021-09-03T07:55:03Z a975-icache01 742152
2021-09-03T07:55:30Z a975-icache01 803205
...
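Because the counters start again from zero when the process restarts, the
post-restart values look smaller even though they grow much faster. A
minimal Python sketch that joins both outgoing-timeouts outputs, treats
any decrease as a counter reset, and skips the interval spanning the
restart (similar in spirit to InfluxQL's non_negative_derivative; the
function name `rates_with_reset_detection` is hypothetical):

```python
from datetime import datetime

# outgoing-timeouts samples from both query outputs above, in time order.
SAMPLES = [
    ("2021-09-03T07:50:30Z", 1463346871),
    ("2021-09-03T07:51:02Z", 1463354005),
    ("2021-09-03T07:51:31Z", 1463360230),
    ("2021-09-03T07:52:00Z", 1463366325),
    ("2021-09-03T07:52:30Z", 1463372284),
    ("2021-09-03T07:53:30Z", 114684),   # first sample after the restart
    ("2021-09-03T07:54:01Z", 437493),
    ("2021-09-03T07:54:31Z", 738150),
    ("2021-09-03T07:55:03Z", 1060959),
    ("2021-09-03T07:55:30Z", 1327177),
]


def parse_ts(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def rates_with_reset_detection(samples):
    """Return (rates_before, rates_after) around the first counter decrease.

    A decrease means the process restarted and its counter was reset, so
    the interval spanning the restart is skipped instead of yielding a
    negative rate. All intervals after the first decrease go to the
    second list.
    """
    before, after = [], []
    current = before
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 < v0:
            current = after  # counter reset: skip this interval, switch lists
            continue
        seconds = (parse_ts(t1) - parse_ts(t0)).total_seconds()
        current.append((v1 - v0) / seconds)
    return before, after


before, after = rates_with_reset_detection(SAMPLES)
print("before restart:", [round(r) for r in before])
print("after restart: ", [round(r) for r in after])
print("slowest post-restart / fastest pre-restart: %.0fx"
      % (min(after) / max(before)))
```

On these samples the timeout rate goes from roughly 200 per second to
roughly 10,000 per second, an increase of more than 40 times. Applying the
same function to the over-capacity-drops samples gives zero before the
restart and between about 2,000 and 11,500 per second afterwards.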

What could be causing this behavior?

Thank you in advance


Additional information:
 >rec_control version
4.3.6
 > less /etc/oracle-release
Oracle Linux Server release 8.4
2 CPUs (28 cores, 56 threads)
128 GB RAM
PDNS was installed from the EPEL repository.
 > grep -i process recursor.conf
# dnssec        DNSSEC mode: off/process-no-validate (default)/process/log-fail/validate
# dnssec=process-no-validate


Best Regards,
Andrey

