[Pdns-users] Reg. PDNS recursor Ver 4.1.16
Brian Candler
b.candler at pobox.com
Wed Dec 9 08:48:11 UTC 2020
On 09/12/2020 07:30, Kiran Kumar via Pdns-users wrote:
> How do we minimize answers-slow, We are running on CentOS Linux
> release 7.9.2009 (Core)
> on VM with 4VCPUs and 16GB RAM.
>
> rec_control get-all | grep answer
> *answers-slow 80903*
> answers0-1 598471
> answers1-10 1057756
> answers10-100 2342082
> answers100-1000 1341675
For explanation see:
https://docs.powerdns.com/recursor/metrics.html#gathered-information
answers-slow counts queries answered after more than 1 second, and in
your case represents 1.5% of answers. However, you've not shown
packetcache-hits, so the fraction of client queries affected will likely
be far lower than that.
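As a rough illustration of where the 1.5% comes from, here is a small
Python sketch that parses the counters you pasted and divides
answers-slow by the sum of all the answers* buckets. The function names
are mine, and it assumes the plain "name value" layout shown in your
output (it also strips the asterisks that the mail client added).

```python
# Sketch: share of answers that took longer than one second.
# Assumes "name<whitespace>value" lines as printed by rec_control get-all.

def parse_stats(text):
    """Parse 'name value' lines into a dict, skipping malformed lines."""
    stats = {}
    for line in text.splitlines():
        # Drop '*' emphasis markers that a mail client may have added.
        parts = line.replace("*", "").split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def slow_fraction(stats):
    """Return answers-slow divided by the sum of all answers* counters."""
    total = sum(v for k, v in stats.items() if k.startswith("answers"))
    return stats.get("answers-slow", 0) / total if total else 0.0


SAMPLE = """\
*answers-slow 80903*
answers0-1 598471
answers1-10 1057756
answers10-100 2342082
answers100-1000 1341675
"""

stats = parse_stats(SAMPLE)
print(f"{slow_fraction(stats):.2%} of answers were slow")  # 1.49%
```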
In resolving a given query, the recursor is going to have to contact one
or more authoritative nameservers on the Internet. These are some
reasons why it might take more than 1 second to get the final answer:
- the answer is not already in cache (obviously) - this happens more
frequently if there is low TTL in the authoritative server for that
domain; AND
- the first authoritative server tried is down (or transient network
problem to that server), so pdns times out and tries another one; OR
- multiple authoritative servers need to be contacted, with a large
round-trip time to each; OR
- the client is querying for a domain which is completely lame / broken,
so pdns cannot find any answer.
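To make the arithmetic behind those cases concrete, here is a toy
Python model. It is not how the recursor is implemented: it just adds up
the time spent on a sequence of upstream queries, charging a fixed
timeout for each one that gets no reply. The 1500 ms timeout is an
assumption for illustration (check the network-timeout setting on your
own server), and the function names are hypothetical.

```python
from typing import List, Optional

SLOW_THRESHOLD_MS = 1000   # answers-slow counts answers taking over 1 second
ASSUMED_TIMEOUT_MS = 1500  # assumption: wait per server before trying another


def resolution_time_ms(attempts: List[Optional[float]],
                       timeout_ms: float = ASSUMED_TIMEOUT_MS) -> float:
    """Total time for a sequence of upstream queries made one after another.

    Each entry is a round-trip time in ms, or None if no reply arrived.
    """
    total = 0.0
    for rtt in attempts:
        if rtt is None or rtt >= timeout_ms:
            total += timeout_ms  # gave up on this server
        else:
            total += rtt
    return total


def is_slow(attempts: List[Optional[float]]) -> bool:
    """True if the modelled resolution would land in answers-slow."""
    return resolution_time_ms(attempts) > SLOW_THRESHOLD_MS


print(is_slow([40]))                  # one nearby server: False
print(is_slow([None, 40]))            # first server down, second replies: True
print(is_slow([300, 300, 300, 300]))  # four distant servers in a row: True
print(is_slow([None, None, None]))    # lame domain, nothing replies: True
```

Even one timeout is enough to push a query past the 1-second mark in
this model, which is why a handful of broken domains can account for
most of the answers-slow count.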
This doesn't necessarily indicate a problem with your own pdns server at
all. It could just as well be problems with some authoritative domains
on the Internet. Heaven knows there are plenty of broken domains out
there :-)
It could however be made worse by packet loss or congestion on your
network or your network's upstream link. If your recursor is on a
private IP address behind a NAT, it would be better to put it on a
public IP address, so that it doesn't have to generate NAT state for
every outbound query it makes. If your uplink is congested, which will
cause latency and packet loss, then there's not much you can do short of
buying more bandwidth.
It could also be made worse by excessive load on your server causing it
to fall behind or drop queries, or by insufficient RAM causing it to
evict cache entries prematurely. So you should also use a suitable tool
to monitor your server's resource utilisation (netdata
<https://github.com/netdata/netdata> is very good for this: it monitors
at 1-second resolution by default, which lets you see short bursts of
activity). However, your server may be completely fine.
For comparison, here's the tiny cache on my home network:
root@cache1:~# rec_control get-all | egrep '^(answers|packetcache-hits|over-capacity-drops|policy-drops)'
answers-slow 348
answers0-1 6118
answers1-10 7149
answers10-100 9074
answers100-1000 4695
over-capacity-drops 0
packetcache-hits 1983665
policy-drops 0
and here's a production DNS cache in a data centre:
root@wrn-dns1:~# rec_control get-all | egrep '^(answers|packetcache-hits|over-capacity-drops|policy-drops)'
answers-slow 1710185
answers0-1 40045388
answers1-10 132638392
answers10-100 101328465
answers100-1000 11033827
over-capacity-drops 0
packetcache-hits 8907014600
policy-drops 0
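The fraction of answers-slow out of all the answers* buckets is not
hugely different from what you see: about 1.3% on the home cache and
0.6% in the data centre, against your 1.5%. Also notice that
packetcache-hits is far larger than all the answers* counters combined,
so only a tiny fraction of client queries is actually slow.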
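For what it's worth, here is a short Python sketch of that comparison
using the two sets of counters above. It reports the slow share twice:
once against the answers* buckets only, and once against answers plus
packetcache-hits. The second figure assumes that packet cache hits are
not already counted in the answers* buckets; the function name is mine.

```python
def slow_shares(stats):
    """Return (slow / answers, slow / (answers + packetcache-hits)).

    Assumption: packet cache hits are counted separately from answers*.
    """
    answered = sum(v for k, v in stats.items() if k.startswith("answers"))
    slow = stats["answers-slow"]
    hits = stats.get("packetcache-hits", 0)
    return slow / answered, slow / (answered + hits)


HOME = {
    "answers-slow": 348,
    "answers0-1": 6118,
    "answers1-10": 7149,
    "answers10-100": 9074,
    "answers100-1000": 4695,
    "packetcache-hits": 1983665,
}

DATACENTRE = {
    "answers-slow": 1710185,
    "answers0-1": 40045388,
    "answers1-10": 132638392,
    "answers10-100": 101328465,
    "answers100-1000": 11033827,
    "packetcache-hits": 8907014600,
}

for name, stats in (("home", HOME), ("datacentre", DATACENTRE)):
    of_answers, of_all = slow_shares(stats)
    print(f"{name}: {of_answers:.2%} of answers, {of_all:.3%} of all queries")
```

Running the same function on your own counters, with packetcache-hits
included, should give you a better idea of how many clients actually
see a slow response.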
Regards,
Brian.