[dnsdist] Tune DNSDIST for proper traffic diversion and caching for lower latency
Chandra
me at tgrthi.me
Fri Dec 10 13:27:56 UTC 2021
Hi,
I am trying to set up an edge DNS for all my DNS queries, with a setup
similar to the one described in this picture:
https://drive.google.com/file/d/1s95aWn2g5X4AkWOESmxFBP-p8fa9FImc/view?usp=sharing
dnsconfig.conf
=== cut ====
setLocal("0.0.0.0:53",{reusePort=true})
setWeightedBalancingFactor(1.1)
setMaxCachedTCPConnectionsPerDownstream(50)
setMaxTCPClientThreads(50)
setMaxTCPQueuedConnections(0)
-- Downstream server config
pc = newPacketCache(10000, {maxTTL=172800, maxNegativeTTL=6000,minTTL=0,
temporaryFailureTTL=6000, keepStaleData=true,staleTTL=86400,
dontAge=true})
getPool(""):setCache(pc)
setServerPolicy(wrandom)
-- Primary server
newServer({weight=100000,retries=2,address="192.168.178.100",
name="pi0ipv4",checkTCP=true,reconnectOnUp=true})
-- Failover servers
newServer({weight=1,address="1.1.1.1:853",name="cloud-flair-dot1",
tls="openssl", subjectName="cloudflare-dns.com",
validateCertificates=true})
name="cloud-flair-doh2", tls="openssl", subjectName="cloudflare-dns.com",
validateCertificates=true})
newServer({weight=1,address="1.0.0.1:853",name="cloud-flair-dot2",
tls="openssl", subjectName="cloudflare-dns.com",
validateCertificates=true})
=== cut ====
There are two issues I am trying to resolve:
1 - If the primary is down and there is no stale cache entry, direct the
DNS traffic to the fallback servers without caching their responses, and
direct the traffic back to the primary once the primary is up again
2 - Tune the cache to reduce latency
For #1: I didn't find a server policy that fits my needs, although this
doesn't seem like an unusual thing to want. Currently the weighted
random policy works to some extent, but some queries still go to the
fallback servers even while the primary is up: out of 30k queries, at
least 50 go to the fallback servers, and I do not want this. Is there a
way to achieve what I described in #1?
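For reference, one direction that might fit #1 is an order-based policy instead of a weighted one. The fragment below is only a sketch, assuming dnsdist's built-in firstAvailable policy and the "order" parameter of newServer(); the order values are illustrative, and the addresses and names are taken from the config above. It does not address the "do not cache fallback responses" part, since the packet cache is attached to the pool rather than to individual servers.

```lua
-- Sketch only: prefer the primary strictly, fall back by order.
-- firstAvailable picks the first server (lowest "order") that is up and
-- has not exceeded its QPS limit, so no share of the traffic is sent to
-- the fallbacks while the primary is healthy.
setServerPolicy(firstAvailable)

-- Primary server: lowest order, tried first
newServer({address="192.168.178.100", name="pi0ipv4", order=1,
           retries=2, checkTCP=true, reconnectOnUp=true})

-- Failover servers: only used when every lower-order server is unavailable
newServer({address="1.1.1.1:853", name="cloud-flair-dot1", order=2,
           tls="openssl", subjectName="cloudflare-dns.com",
           validateCertificates=true})
newServer({address="1.0.0.1:853", name="cloud-flair-dot2", order=3,
           tls="openssl", subjectName="cloudflare-dns.com",
           validateCertificates=true})
```

With this layout, traffic returns to the primary as soon as its health check marks it up again, because it always sorts first.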
For #2: This is the most concerning issue for me at the moment. The
average latency is about 80 ms (10k packet average), whereas my primary
server's latency is much lower (~50 ms). The most confusing part is the
packet cache stats:
Entries: 86/10000
Hits: 4894
Misses: 21543
Deferred inserts: 0
Deferred lookups: 0
Lookup Collisions: 0
Insert Collisions: 0
TTL Too Shorts: 0
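For anyone reproducing this, the counters above can be read from the dnsdist console, and the cached entries themselves can be written to a file to see which names and TTLs are actually stored. This is a sketch assuming the cache is attached to the default pool as in the config above; the output path is a hypothetical example.

```lua
-- Run in the dnsdist console (dnsdist -c), one statement per line.
-- Print the counters of the packet cache attached to the default pool.
getPool(""):getCache():printStats()

-- Dump the current cache entries to a file for inspection.
-- "/tmp/dnsdist-cache.txt" is an illustrative path, not from my setup.
getPool(""):getCache():dump("/tmp/dnsdist-cache.txt")
```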
I was under the impression that on a cache miss the downstream response
gets cached. After testing my setup for a couple of days, I have never
seen the cache hold more than 100 entries. Why are responses not being
cached when there is a miss? Here are the current extended stats:
acl-drops               0          noncompliant-responses                 0
cache-hits              4898       outgoing-doh-query-pipe-full           0
cache-misses            21620      proxy-protocol-invalid                 0
cpu-iowait              9878       queries                                26941
cpu-steal               0          rdqueries                              26941
cpu-sys-msec            589145     real-memory-usage                      100343808
cpu-user-msec           2644549    responses                              21589
doh-query-pipe-full     0          rule-drop                              0
doh-response-pipe-full  0          rule-nxdomain                          0
downstream-send-errors  0          rule-refused                           0
downstream-timeouts     31         rule-servfail                          0
dyn-block-nmg-size      0          rule-truncated                         0
dyn-blocked             423        security-status                        1
empty-queries           0          self-answered                          0
fd-usage                369        servfail-responses                     63
frontend-noerror        25941      special-memory-usage                   87216128
frontend-nxdomain       336        tcp-cross-protocol-query-pipe-full     0
frontend-servfail       210        tcp-cross-protocol-response-pipe-full  0
latency-avg100          53222.4    tcp-listen-overflows                   31
latency-avg1000         59723.1    tcp-query-pipe-full                    0
latency-avg10000        77151.4    trunc-failures                         0
latency-avg1000000      2226.7     udp-in-csum-errors                     0
latency-count           26487      udp-in-errors                          188
latency-slow            290        udp-noport-errors                      9946
latency-sum             2245700    udp-recvbuf-errors                     0
latency0-1              4898       udp-sndbuf-errors                      0
latency1-10             28         udp6-in-csum-errors                    0
latency10-50            9836       udp6-in-errors                         342
latency100-1000         3900       udp6-noport-errors                     7
latency50-100           7532       udp6-recvbuf-errors                    342
no-policy               0          udp6-sndbuf-errors                     1
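To put the figures above in perspective, here is a small worked calculation in plain Lua. It uses only numbers quoted in this message and assumes the latency-avg* counters are reported in microseconds, as the dnsdist statistics documentation describes them.

```lua
-- Packet cache counters quoted earlier in this message.
local hits, misses = 4894, 21543

-- Hit ratio = hits / total lookups.
local ratio = hits / (hits + misses)
print(string.format("cache hit ratio: %.1f%%", ratio * 100))
-- prints "cache hit ratio: 18.5%"

-- latency-avg10000 from the extended stats, assumed to be in microseconds.
local avg10000_us = 77151.4
print(string.format("latency-avg10000: %.1f ms", avg10000_us / 1000))
-- prints "latency-avg10000: 77.2 ms"
```

So fewer than one in five queries is answered from the cache, which is consistent with the average latency staying close to the latency of the downstream servers.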
From what I see, there are a lot of UDP errors. How can I fix this? I
should add that all my traffic is UDP based; I am not accepting TCP
traffic yet.
Any help would be much appreciated!
Thanks,
Chandra