[dnsdist] dnsdist Drops, revisited
Michael Van Der Beek
michael.van at antlabs.com
Fri Mar 6 04:42:16 UTC 2020
Hi Fredrik,
Have you noticed this setting in dnsdist?
setUDPTimeout(num)
Set the maximum time dnsdist will wait for a response from a backend over UDP, in seconds. Defaults to 2
I'm not sure whether timeouts are counted as drops. My guess is that they probably are, because dnsdist didn't get a response in time.
Since your backend is a recursor, there are times when the recursor cannot reach an authoritative server, or encounters one that is non-responsive. Unbound applies an exponential backoff when querying such servers; I think it starts at 10s.
https://nlnetlabs.nl/documentation/unbound/info-timeout/
I would suggest you set setUDPTimeout(10) in dnsdist. Frankly, if Unbound cannot respond to you in < 10 seconds, the target authoritative server is most likely not responding.
As to why one server has more drops than the others:
Assuming both servers handle approximately the same number of queries/s, and have the same Unbound config and the same hardware, I would suggest checking whether the settings below are the same on both.
Note that if the two servers go out via different ISPs, their relative network speeds can cause differences in response times.
Note also that these are CentOS 7 settings. I'm not sure what the Debian equivalents are.
net.core.rmem_default
net.core.wmem_default
net.core.rmem_max
net.core.wmem_max
net.netfilter.nf_conntrack_udp_timeout
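One way to compare these values between the two servers is to dump them on each host and diff the results. The following is a minimal Python sketch, assuming a Linux host where each sysctl key maps to a file under /proc/sys (dots replaced by slashes). The helper names are hypothetical, and the conntrack key only exists when the nf_conntrack module is loaded, so a missing key is reported as None.

```python
from pathlib import Path

# The settings listed above. The conntrack key is absent unless
# the nf_conntrack kernel module is loaded.
KEYS = (
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.netfilter.nf_conntrack_udp_timeout",
)


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root) / key.replace(".", "/")


def read_settings(keys=KEYS, root="/proc/sys"):
    """Return {key: value-as-string}, or None for keys that cannot be read."""
    settings = {}
    for key in keys:
        try:
            settings[key] = sysctl_path(key, root).read_text().strip()
        except OSError:
            settings[key] = None
    return settings


def diff_settings(a, b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Run on each server and compare the printed output by hand,
    # or collect both dictionaries and pass them to diff_settings().
    for key, value in read_settings().items():
        print(f"{key} = {value}")
```

Running this on both hosts and comparing the output should show quickly whether the socket buffer sizes or the conntrack UDP timeout differ.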
Also, in general, turn off connection tracking for UDP/TCP packets in your firewall rules.
https://kb.isc.org/docs/aa-01183
Regards,
Michael
-----Original Message-----
From: dnsdist <dnsdist-bounces at mailman.powerdns.com> On Behalf Of Fredrik Pettai via dnsdist
Sent: Thursday, March 5, 2020 6:14 PM
To: dnsdist at mailman.powerdns.com
Subject: [dnsdist] dnsdist Drops, revisited
Hi list,
I’m curious about the “high” number of Drops I see on one dnsdist 1.4.0 (Debian-derived packages) frontend compared to the other(s). I’m guessing the main reason is workload, which is different (different services/servers use the resolver that Drops more).
I don’t find the “high” Drops numbers satisfying, but perhaps they are about average?
Anyway, I'd like to improve those numbers if possible. Here are some stats from two dnsdist frontends:
> showServers()
#   Name     Address       State  Qps   Qlim  Ord  Wt  Queries  Drops  Drate  Lat  Outstanding  Pools
0   worker1  127.0.0.1:53  up     73.7  0     1    1   565950   278    0.0    0.5  0
1   worker2  [::1]:53      up     55.7  0     1    1   584273   294    0.0    1.1  0
While one of our bigger servers doesn’t perform as well (in terms of Drops ratio):
> showServers()
#   Name     Address       State  Qps   Qlim  Ord  Wt  Queries  Drops  Drate  Lat   Outstanding  Pools
0   worker1  127.0.0.1:53  up     43.8  0     1    1   1054047  12728  0.0    31.1  4
1   worker2  127.0.0.1:53  up     43.8  0     1    1   1064823  12823  0.0    17.5  4
2   worker3  [::1]:53      up     20.9  0     1    1   1054548  12773  0.0    38.5  2
3   worker4  [::1]:53      up     35.8  0     1    1   1081502  12854  0.0    48.9  3
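To put a number on the "Drops ratio", Drops can be divided by Queries for each backend. The sketch below is a minimal Python illustration using the figures copied from the two showServers() outputs above; the function names are hypothetical. With these figures the first frontend comes out at roughly 0.05% and the second at roughly 1.2%.

```python
# (name, queries, drops) copied from the two showServers() outputs above.
FAST = [
    ("worker1", 565950, 278),
    ("worker2", 584273, 294),
]
SLOW = [
    ("worker1", 1054047, 12728),
    ("worker2", 1064823, 12823),
    ("worker3", 1054548, 12773),
    ("worker4", 1081502, 12854),
]


def drop_ratio(queries, drops):
    """Fraction of queries that were counted as drops."""
    return drops / queries if queries else 0.0


def overall_ratio(rows):
    """Drop ratio over all backends of one frontend."""
    total_queries = sum(queries for _, queries, _ in rows)
    total_drops = sum(drops for _, _, drops in rows)
    return drop_ratio(total_queries, total_drops)


if __name__ == "__main__":
    for label, rows in (("fast", FAST), ("slow", SLOW)):
        for name, queries, drops in rows:
            print(f"{label} {name}: {drop_ratio(queries, drops):.3%}")
        print(f"{label} overall: {overall_ratio(rows):.3%}")
```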
There are almost no firewall or dnsdist rules, and the configuration is the same on both of the above systems (there are actually more active rules, and even Lua code, on the “fast” dnsdist system).
I just found one earlier thread on the topic, and it didn’t describe a way to improve the situation, just how to possibly look to see what the underlying issues might be...
http://powerdns.13854.n7.nabble.com/dnsdist-drops-packet-td11974.html
(https://mailman.powerdns.com/pipermail/dnsdist/2016-January/000052.html)
dumpStats from the above server
> dumpStats()
acl-drops                        0   latency0-1                 3620405
cache-hits                       0   latency1-10                  59808
cache-misses                     0   latency10-50                132513
cpu-sys-msec                749565   latency100-1000             386909
cpu-user-msec               470696   latency50-100               101861
downstream-send-errors           0   no-policy                        0
downstream-timeouts          52571   noncompliant-queries             0
dyn-block-nmg-size               0   noncompliant-responses           0
dyn-blocked                      0   queries                    4382032
empty-queries                    0   rdqueries                  4382007
fd-usage                        42   real-memory-usage        315129856
frontend-noerror           3254422   responses                  4329454
frontend-nxdomain           902996   rule-drop                        0
frontend-servfail           172012   rule-nxdomain                    0
latency-avg100             41936.3   rule-refused                     0
latency-avg1000            44165.7   rule-servfail                    0
latency-avg10000           43366.6   security-status                  0
latency-avg1000000         41994.4   self-answered                    1
latency-count              4329455   servfail-responses          172012
latency-slow                 27681   special-memory-usage      95940608
latency-sum              172860695   trunc-failures                   0
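The counters above can be cross-checked against each other. The following is a minimal Python sketch, assuming the dumpStats() output is laid out as "name value name value" on each line; the parser and its name are hypothetical, and only three of the lines above are embedded as sample data. With these figures, downstream-timeouts is about 1.2% of queries, and queries minus responses (52578) is almost the same as downstream-timeouts (52571). That would be consistent with most of the unanswered queries being backend timeouts, though this is an interpretation and not something the output itself states.

```python
def parse_dumpstats(text):
    """Parse dumpStats() output laid out as 'name value name value' per line."""
    stats = {}
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            continue  # skip the console prompt line, e.g. "> dumpStats()"
        tokens = line.split()
        # Pair the tokens up: (name, value), (name, value), ...
        for name, value in zip(tokens[0::2], tokens[1::2]):
            stats[name] = float(value)
    return stats


# Three lines copied from the dumpStats() output above.
SAMPLE = """\
downstream-timeouts 52571 noncompliant-queries 0
dyn-blocked 0 queries 4382032
frontend-noerror 3254422 responses 4329454
"""

if __name__ == "__main__":
    stats = parse_dumpstats(SAMPLE)
    timeout_ratio = stats["downstream-timeouts"] / stats["queries"]
    unanswered = stats["queries"] - stats["responses"]
    print(f"downstream-timeouts / queries: {timeout_ratio:.2%}")
    print(f"queries - responses: {unanswered:.0f}")
```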
> topSlow(10, 1000)
1 uyrg.com. 69 46.9%
2 115.61.96.156.in-addr.arpa. 19 12.9%
3 nhu.edu.tw. 9 6.1%
4 nbkailan.com. 8 5.4%
5 aikesi.com. 8 5.4%
6 168.122.238.45.in-addr.arpa. 6 4.1%
7 45-179-252-62-dynamic.proxyar.com. 4 2.7%
8 callforarticle.com. 3 2.0%
9 default._domainkey.nhu.edu.tw. 3 2.0%
10 205.78.127.180.in-addr.arpa. 3 2.0%
11 Rest 15 10.2%
(Many are probably spammy relay IPs, sending domains, etc)
Is there a way to optimise the dnsdist configuration, for instance by creating a slow path, either for the slow queries or possibly for the clients that send those queries?
(Also, Unbound is the backend for all the dnsdist frontends, and it caches heavily, including expired answers.)
Re,
/P