[dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Klaus Darilion klaus.darilion at nic.at
Thu Mar 24 11:13:23 UTC 2022


Indeed that might be a problem. We use (ferm syntax):
table raw {
    # We want NOTRACK for incoming DNS queries and the corresponding
    # outgoing responses. Outgoing DNS queries should still be tracked
    # so that the corresponding response is allowed in.
    chain PREROUTING {
        proto (udp tcp) dport 53 NOTRACK;
    }
    chain OUTPUT {
        proto (udp tcp) sport 53 NOTRACK;
    }
}
Same for IPv4 and IPv6 in our case.

regards
Klaus





From: dnsdist <dnsdist-bounces at mailman.powerdns.com> On Behalf Of Rasto Rickardt via dnsdist
Sent: Thursday, 24 March 2022 11:36
To: dnsdist at mailman.powerdns.com
Subject: Re: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Hello Rais,
I noticed that you are increasing nf_conntrack_max. I am not sure how the backend servers are connected,
but I suggest not using connection tracking/NAT at all. You can use, for example, a dedicated interface for backend
management and another one to connect to dnsdist.
r.
On 24/03/2022 11:11, Rais Ahmed via dnsdist wrote:
Hi,

Thanks for the guidance...!

We are testing multiple scenarios, with and without kernel tuning. We observed UDP packet errors on both backend servers (not a single UDP error on the dnsdist LB server).

Tested with resperf at 15K QPS:
resperf -s 192.168.0.1 -R -d queryfile-example-10million-201202 -C 100 -c 300 -r 0 -m 15000 -q 200000

Backend 1: 192.168.1.1 (without Kernel tuning):
netstat -su
IcmpMsg:
    InType3: 2229
    InType8: 6
    InType11: 194
    OutType0: 6
    OutType3: 762
Udp:
    1634847 packets received
    843 packets to unknown port received.
    193891 packet receive errors
    1859642 packets sent
    193891 receive buffer errors
    0 send buffer errors
UdpLite:
IpExt:
    InOctets: 580762744
    OutOctets: 237368675
    InNoECTPkts: 1995692
    InECT0Pkts: 27

Backend 2: 192.168.1.2 (with Kernel Tuning):
netstat -su
IcmpMsg:
    InType3: 19177
    InType8: 5802
    InType11: 2645
    OutType0: 5802
    OutType3: 5122
Udp:
    10798358 packets received
    6846 packets to unknown port received.
    4815377 packet receive errors
    11949871 packets sent
    4815377 receive buffer errors
    0 send buffer errors
UdpLite:
IpExt:
    InNoRoutes: 11
    InOctets: 3312682950
    OutOctets: 1741771756
    InNoECTPkts: 16355120
    InECT1Pkts: 72
    InECT0Pkts: 92
    InCEPkts: 4
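
The two netstat outputs above can be reduced to a single loss figure per backend. The following is a minimal Python sketch and not part of the original message. The helper name drop_ratio is made up for illustration, and it assumes that "packets received" counts only datagrams delivered to a socket, so that delivered plus "packet receive errors" approximates the datagrams that arrived for an open port. The counters are cumulative since boot, so the comparison between the two backends is only indicative.

```python
def drop_ratio(delivered: int, receive_errors: int) -> float:
    """Fraction of arriving UDP datagrams that were dropped.

    Assumption: 'packets received' excludes the datagrams counted under
    'packet receive errors', so their sum approximates total arrivals.
    """
    total = delivered + receive_errors
    return receive_errors / total if total else 0.0


# Values copied from the netstat -su output of the two backends.
backends = {
    "Backend 1 (no tuning)": (1634847, 193891),
    "Backend 2 (tuned)": (10798358, 4815377),
}

for name, (delivered, errors) in backends.items():
    print(f"{name}: {drop_ratio(delivered, errors):.1%} dropped")
```

Under that assumption this gives roughly 10.6% for Backend 1 and 30.8% for Backend 2. In both outputs the "receive buffer errors" count equals the "packet receive errors" count, so every counted drop was caused by a full socket receive buffer.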

Kernel Tuning configured in /etc/rc.local

ethtool -L eth0 combined 16
echo 52428800 > /proc/sys/net/netfilter/nf_conntrack_max
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
sysctl -w net.core.rmem_default=16777216
sysctl -w net.core.wmem_default=16777216
sysctl -w net.core.netdev_max_backlog=65536
sysctl -w net.core.somaxconn=1024
ulimit -n 16000
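
As background to the rmem settings above: net.core.rmem_default is the size a socket gets when the application does not request one, and net.core.rmem_max caps what an application may request with SO_RCVBUF. The following minimal Python sketch is not part of the original message, and effective_rcvbuf is a hypothetical helper name. It shows one way to check what a fresh UDP socket actually gets on a given host.

```python
import socket


def effective_rcvbuf(requested=None):
    """Return the receive buffer size the kernel reports for a new UDP socket.

    If 'requested' is given, ask for that size with SO_RCVBUF first.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if requested is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    print("default size:", effective_rcvbuf())
    print("after requesting 32 MiB:", effective_rcvbuf(33554432))
```

On Linux the value reported by getsockopt is normally double the requested size, and requests above rmem_max are silently capped. Whether the DNS server on the backend requests a larger buffer itself depends on its own configuration.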

The network configuration and specs are the same on all three servers. Are we doing something wrong?


Regards,
Rais 

-----Original Message-----
From: Klaus Darilion <klaus.darilion at nic.at>
Sent: Thursday, March 24, 2022 12:38 PM
To: Rais Ahmed <rais.ahmed at tes.com.pk>; dnsdist at mailman.powerdns.com
Subject: Re: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Have you tested how many qps your backend is capable of handling? First test your backend performance to find out how many qps a single backend can handle. I guess 500k qps might be difficult to achieve with BIND. If you need more performance, switch the backend to NSD or Knot.

regards
Klaus

-----Original Message-----
From: dnsdist <dnsdist-bounces at mailman.powerdns.com> On Behalf Of Rais Ahmed via dnsdist
Sent: Wednesday, 23 March 2022 22:02
To: dnsdist at mailman.powerdns.com
Subject: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Hi,
Thanks for the reply...!

We have configured setMaxUDPOutstanding(65535) and we are still seeing the
backends marked down. The logs frequently show the following:

Timeout while waiting for the health check response from backend 192.168.1.1:53
Timeout while waiting for the health check response from backend 192.168.1.2:53

Please have a look at the dnsdist configuration below and help us find the
misconfiguration (16 listeners and 8+8 backends added, matching the
available vCPUs: 2 sockets x 8 cores):

controlSocket('127.0.0.1:5199')
setKey("")

---- Listen addresses
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true }) 
addLocal('192.168.0.1:53', { reusePort=true })

---- Back-end server
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=1})
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=2})
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=3})
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=4})
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=5})
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=6})
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=7})
newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=8})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=9})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=10})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=11})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=12})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=13})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=14})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=15})
newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=16})

setMaxUDPOutstanding(65535)

---- Server Load Balancing Policy
setServerPolicy(leastOutstanding)

---- Web-server
webserver('192.168.0.1:8083')
setWebserverConfig({acl='192.168.0.0/24', password='Secret'})

---- Customers Policy
customerACLs={'192.168.1.0/24'}
setACL(customerACLs)

pc = newPacketCache(300000, {maxTTL=86400, minTTL=0, 
temporaryFailureTTL=60, staleTTL=60, dontAge=false})
getPool(""):setCache(pc)

setVerboseHealthChecks(true)

Server specs are as below:
dnsdist LB server: 16 vCPUs, 16 GB RAM, Virtio NIC (10G) with 16 multiqueues.
Backend bind9 servers: 16 vCPUs, 16 GB RAM, Virtio NIC (10G) with 16 multiqueues.

We are trying to handle 500K qps (we will increase the hardware specs if
required), or at least 100K qps with the above specs.


Regards,
Rais

-----Original Message-----
From: dnsdist <dnsdist-bounces at mailman.powerdns.com> On Behalf Of dnsdist-request at mailman.powerdns.com
Sent: Wednesday, March 23, 2022 5:00 PM
To: dnsdist at mailman.powerdns.com
Subject: dnsdist Digest, Vol 79, Issue 3

Today's Topics:

   1. dnsdist[29321]: Marking downstream IP:53 as 'down' (Rais Ahmed)
   2. Re: dnsdist[29321]: Marking downstream IP:53 as 'down'
      (Remi Gacogne)


----------------------------------------------------------------------

Message: 1
Date: Tue, 22 Mar 2022 23:00:25 +0000
From: Rais Ahmed <rais.ahmed at tes.com.pk>
To: dnsdist at mailman.powerdns.com
Subject: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'
Message-ID: <PAXPR08MB70737E4E1CCEFC4A7F61E1E6A0179 at PAXPR08MB7073.eurprd08.prod.outlook.com>

Content-Type: text/plain; charset="us-ascii"

Hi,

We have configured a dnsdist instance to handle around 500k QPS, but we
are seeing the downstream servers marked down frequently once QPS rises
above 25k. Below are the logs we found relating to the issue.

dnsdist[29321]: Marking downstream server1 IP:53 as 'down'
dnsdist[29321]: Marking downstream server2 IP:53 as 'down'

------------------------------

Message: 2
Date: Wed, 23 Mar 2022 10:32:22 +0100
From: Remi Gacogne <remi.gacogne at powerdns.com>
To: Rais Ahmed <rais.ahmed at tes.com.pk>, dnsdist at mailman.powerdns.com
Subject: Re: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'
Message-ID: <5a95cbeb-7c82-9bc1-0b4c-8726f814432e at powerdns.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi,

 > We have configured a dnsdist instance to handle around 500k QPS, but we
 > are seeing the downstream servers marked down frequently once QPS rises
 > above 25k. Below are the logs we found relating to the issue.
 >
 > dnsdist[29321]: Marking downstream server1 IP:53 as 'down'
 >
 > dnsdist[29321]: Marking downstream server2 IP:53 as 'down'

You might be able to get more information about why the health-checks 
are failing by adding setVerboseHealthChecks(true) to your configuration.

It usually happens because the backend is overwhelmed and needs to be 
tuned to handle the load, but it might also be caused by a network 
issue, like a link reaching its maximum capacity, or by dnsdist itself 
being overwhelmed and needing tuning (like increasing the number of
newServer() directives, see [1]).

[1]: https://dnsdist.org/advanced/tuning.html#udp-and-incoming-dns-over-https
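
To help tell the possible causes apart, a backend can also be probed directly, bypassing dnsdist. The following is a minimal Python sketch of such a probe and not dnsdist's own health-check code. The function names build_query and probe, the query name example.com, and the one-second timeout are all assumptions chosen for illustration.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Encode a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def probe(server, name="example.com", timeout=1.0, port=53):
    """Return True if a reply with the matching query ID arrives in time."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:
            # Covers timeouts as well as ICMP-triggered socket errors.
            return False
    return reply[:2] == query[:2]


# Example (hypothetical): probe("192.168.1.1") while the load test is running.
```

If such a probe also times out under load when run from the dnsdist host, the backend or the network path is the more likely bottleneck. If it succeeds while dnsdist still reports timeouts, dnsdist itself deserves a closer look.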

Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/


------------------------------

Subject: Digest Footer

_______________________________________________
dnsdist mailing list
mailto:dnsdist at mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


------------------------------

End of dnsdist Digest, Vol 79, Issue 3
**************************************