[dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'

Rasto Rickardt rasto.rickardt at gmail.com
Thu Mar 24 10:35:35 UTC 2022


Hello Rais,

I noticed that you are increasing nf_conntrack_max. I am not sure how
the backend servers are connected, but I suggest not using connection
tracking/NAT at all. You could, for example, use a dedicated interface
for backend management and a separate one to connect to dnsdist.
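
If you want to keep everything on one interface but stop tracking DNS
flows, a minimal sketch with iptables raw-table rules could look like
this (assuming iptables is in use and that nothing else on these hosts
relies on conntrack/NAT for port 53 traffic; newer rulesets would use
'-j CT --notrack' instead of the NOTRACK target):

# skip connection tracking for DNS traffic in both directions
iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
iptables -t raw -A OUTPUT -p udp --sport 53 -j NOTRACK
iptables -t raw -A PREROUTING -p tcp --dport 53 -j NOTRACK
iptables -t raw -A OUTPUT -p tcp --sport 53 -j NOTRACK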

r.

On 24/03/2022 11:11, Rais Ahmed via dnsdist wrote:
> Hi,
>
> Thanks for the guidance...!
>
> We are testing with multiple scenarios, with and without kernel tuning. We observed UDP packet errors on both backend servers (not a single UDP error on the dnsdist LB server).
>
> Tested with resperf at 15K QPS:
> resperf -s 192.168.0.1 -R -d queryfile-example-10million-201202 -C 100 -c 300 -r 0 -m 15000 -q 200000
>
> Backend 1: 192.168.1.1 (without Kernel tuning):
> netstat -su
> IcmpMsg:
>      InType3: 2229
>      InType8: 6
>      InType11: 194
>      OutType0: 6
>      OutType3: 762
> Udp:
>      1634847 packets received
>      843 packets to unknown port received.
>      193891 packet receive errors
>      1859642 packets sent
>      193891 receive buffer errors
>      0 send buffer errors
> UdpLite:
> IpExt:
>      InOctets: 580762744
>      OutOctets: 237368675
>      InNoECTPkts: 1995692
>      InECT0Pkts: 27
>
> Backend 2: 192.168.1.2 (with Kernel Tuning):
> netstat -su
> IcmpMsg:
>      InType3: 19177
>      InType8: 5802
>      InType11: 2645
>      OutType0: 5802
>      OutType3: 5122
> Udp:
>      10798358 packets received
>      6846 packets to unknown port received.
>      4815377 packet receive errors
>      11949871 packets sent
>      4815377 receive buffer errors
>      0 send buffer errors
> UdpLite:
> IpExt:
>      InNoRoutes: 11
>      InOctets: 3312682950
>      OutOctets: 1741771756
>      InNoECTPkts: 16355120
>      InECT1Pkts: 72
>      InECT0Pkts: 92
>      InCEPkts: 4
>
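> A side note on the counters above: 'packet receive errors' equals
> 'receive buffer errors' on both backends, which points at the listening
> socket's receive buffer overflowing rather than at malformed packets. A
> small sketch to watch just those counters while a test runs, assuming
> nstat from iproute2 is available on the backends:
>
> nstat -az UdpInErrors UdpRcvbufErrors
>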
> Kernel tuning configured in /etc/rc.local:
>
> ethtool -L eth0 combined 16
> echo 52428800 > /proc/sys/net/netfilter/nf_conntrack_max
> sysctl -w net.core.rmem_max=33554432
> sysctl -w net.core.wmem_max=33554432
> sysctl -w net.core.rmem_default=16777216
> sysctl -w net.core.wmem_default=16777216
> sysctl -w net.core.netdev_max_backlog=65536
> sysctl -w net.core.somaxconn=1024
> ulimit -n 16000
>
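> If connection tracking stays enabled, it may also be worth checking
> whether the conntrack table itself is a factor; a quick sketch using
> standard /proc and sysctl paths (note that raising nf_conntrack_max
> alone does not grow the hash table, which is controlled separately by
> net.netfilter.nf_conntrack_buckets):
>
> cat /proc/sys/net/netfilter/nf_conntrack_count
> cat /proc/sys/net/netfilter/nf_conntrack_max
> sysctl net.netfilter.nf_conntrack_buckets
>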
> The network config and specs are the same on all three servers; are we doing something wrong?
>
>
> Regards,
> Rais
>
> -----Original Message-----
> From: Klaus Darilion <klaus.darilion at nic.at>
> Sent: Thursday, March 24, 2022 12:38 PM
> To: Rais Ahmed <rais.ahmed at tes.com.pk>; dnsdist at mailman.powerdns.com
> Subject: AW: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'
>
> Have you tested how many QPS your backend is capable of handling? First test the backend performance on its own, so you know how many QPS a single backend can handle. I guess 500k QPS might be difficult to achieve with BIND. If you need more performance, switch the backend to NSD or Knot.
>
> regards
> Klaus
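>
> One way to get that baseline is to run the same resperf command used
> elsewhere in this thread, but pointed directly at a backend instead of
> at dnsdist, for example (a sketch; adjust the target IP and the -m rate
> limit as needed):
>
> resperf -s 192.168.1.1 -R -d queryfile-example-10million-201202 -C 100 -c 300 -r 0 -m 15000 -q 200000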
>
>> -----Original Message-----
>> From: dnsdist <dnsdist-bounces at mailman.powerdns.com> On Behalf Of
>> Rais Ahmed via dnsdist
>> Sent: Wednesday, 23 March 2022 22:02
>> To: dnsdist at mailman.powerdns.com
>> Subject: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'
>>
>> Hi,
>> Thanks for the reply!
>>
>> We have configured setMaxUDPOutstanding(65535), but we still see the
>> backends being marked down; the logs frequently show the following:
>>
>> Timeout while waiting for the health check response from backend 192.168.1.1:53
>> Timeout while waiting for the health check response from backend 192.168.1.2:53
>>
>> Please have a look at the dnsdist configuration below and help us find any
>> misconfiguration. 16 listeners and 8+8 backends were added to match the
>> available vCPUs (2 sockets x 8 cores):
>>
>> controlSocket('127.0.0.1:5199')
>> setKey("")
>>
>> ---- Listen addresses
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>> addLocal('192.168.0.1:53', { reusePort=true })
>>
>> ---- Back-end server
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=1})
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=2})
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=3})
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=4})
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=5})
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=6})
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=7})
>> newServer({address='192.168.1.1', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=8})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=9})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=10})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=11})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=12})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=13})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=14})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=15})
>> newServer({address='192.168.1.2', maxCheckFailures=3, checkInterval=5, weight=4, qps=40000, order=16})
>>
>> setMaxUDPOutstanding(65535)
>>
>> ---- Server Load Balancing Policy
>> setServerPolicy(leastOutstanding)
>>
>> ---- Web-server
>> webserver('192.168.0.1:8083')
>> setWebserverConfig({acl='192.168.0.0/24', password='Secret'})
>>
>> ---- Customers Policy
>> customerACLs={'192.168.1.0/24'}
>> setACL(customerACLs)
>>
>> pc = newPacketCache(300000, {maxTTL=86400, minTTL=0,
>> temporaryFailureTTL=60, staleTTL=60, dontAge=false})
>> getPool(""):setCache(pc)
>>
>> setVerboseHealthChecks(true)
>>
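>>
>> A hedged side note: instead of repeating newServer() for the same
>> address, recent dnsdist releases can open several UDP sockets towards a
>> backend with the 'sockets' option, and the health checks can be relaxed
>> with 'checkTimeout' (in milliseconds) and 'rise'; whether these options
>> exist depends on the dnsdist version in use. A minimal sketch (the
>> per-server qps limit is omitted here):
>>
>> -- one definition per backend, eight UDP sockets towards each
>> newServer({address='192.168.1.1', sockets=8, checkInterval=5, checkTimeout=2000, maxCheckFailures=3, rise=2, weight=4, order=1})
>> newServer({address='192.168.1.2', sockets=8, checkInterval=5, checkTimeout=2000, maxCheckFailures=3, rise=2, weight=4, order=2})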
>> Server specs are as below:
>> dnsdist LB server: 16 vCPUs, 16 GB RAM, Virtio NIC (10G) with 16 multiqueues.
>> Backend BIND 9 servers: 16 vCPUs, 16 GB RAM, Virtio NIC (10G) with 16 multiqueues.
>>
>> We are trying to handle 500K QPS (we will increase the hardware specs if
>> required), or at least 100K QPS with the above specs.
>>
>>
>> Regards,
>> Rais
>>
>> -----Original Message-----
>> From: dnsdist <dnsdist-bounces at mailman.powerdns.com> On Behalf Of
>> dnsdist-request at mailman.powerdns.com
>> Sent: Wednesday, March 23, 2022 5:00 PM
>> To: dnsdist at mailman.powerdns.com
>> Subject: dnsdist Digest, Vol 79, Issue 3
>>
>> Today's Topics:
>>
>>     1. dnsdist[29321]: Marking downstream IP:53 as 'down' (Rais Ahmed)
>>     2. Re: dnsdist[29321]: Marking downstream IP:53 as 'down'
>>        (Remi Gacogne)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 22 Mar 2022 23:00:25 +0000
>> From: Rais Ahmed <rais.ahmed at tes.com.pk>
>> To: "dnsdist at mailman.powerdns.com" <dnsdist at mailman.powerdns.com>
>> Subject: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'
>> Message-ID: <PAXPR08MB70737E4E1CCEFC4A7F61E1E6A0179 at PAXPR08MB7073.eurprd08.prod.outlook.com>
>>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Hi,
>>
>> We have configured a dnsdist instance to handle around 500k QPS, but we
>> are seeing the downstream servers marked down frequently once QPS goes
>> above 25k. Below are the relevant log entries:
>>
>> dnsdist[29321]: Marking downstream server1 IP:53 as 'down'
>> dnsdist[29321]: Marking downstream server2 IP:53 as 'down'
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 23 Mar 2022 10:32:22 +0100
>> From: Remi Gacogne <remi.gacogne at powerdns.com>
>> To: Rais Ahmed <rais.ahmed at tes.com.pk>, "dnsdist at mailman.powerdns.com" <dnsdist at mailman.powerdns.com>
>> Subject: Re: [dnsdist] dnsdist[29321]: Marking downstream IP:53 as 'down'
>> Message-ID: <5a95cbeb-7c82-9bc1-0b4c-8726f814432e at powerdns.com>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>
>> Hi,
>>
>>   > We have configured a dnsdist instance to handle around 500k QPS, but we
>>   > are seeing the downstream servers marked down frequently once QPS goes
>>   > above 25k. Below are the relevant log entries:
>>   >
>>   > dnsdist[29321]: Marking downstream server1 IP:53 as 'down'
>>   >
>>   > dnsdist[29321]: Marking downstream server2 IP:53 as 'down'
>>
>> You might be able to get more information about why the health-checks
>> are failing by adding setVerboseHealthChecks(true) to your configuration.
>>
>> It usually happens because the backend is overwhelmed and needs to be
>> tuned to handle the load, but it might also be caused by a network
>> issue, like a link reaching its maximum capacity, or by dnsdist itself
>> being overwhelmed and needing tuning (like increasing the number of
>> newServer() directives, see [1]).
>>
>> [1]: https://dnsdist.org/advanced/tuning.html#udp-and-incoming-dns-over-https
>>
>> Best regards,
>> --
>> Remi Gacogne
>> PowerDNS.COM BV - https://www.powerdns.com/
>>
>>