<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
Hi all</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
I run a reasonably sized PowerDNS setup (high millions of domains across a few instances). So far the way I have been scaling it is working fine, but I would like to get some additional suggestions in case I have missed something. When we need extra capacity, currently
it is a matter of adding a dnsdist server for the front end or a PowerDNS server with MariaDB for the backend.</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
dnsdist answers a large number of queries from cache, which reduces the load nicely, but every now and then we get an attack that punches through the cache with random subdomains and causes a high load on the PowerDNS auth servers. When that occurs,
our strategy has been to add the domain to a predefined suffix match group on dnsdist that applies stricter rate limiting, which works well enough. We use other rules to limit QPS from prefixes of certain sizes, which does help sometimes, but in the latest
attacks the sources all seem to be spoofed IPs that do not fall in any prefix that is easy to limit.</div>
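<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
To make the suffix match group idea concrete, here is a rough Python sketch of the logic only. It is not our dnsdist configuration; the class names, the rate and the burst size are made up for illustration, and it assumes one token bucket per listed domain.</div>
<pre>
```python
import time


class TokenBucket:
    """Allow roughly `rate` queries per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, now=None):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def matched_suffix(qname, suffixes):
    """Return the longest entry of `suffixes` that qname falls under, else None."""
    labels = qname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return None


class StrictGroup:
    """Hypothetical stand-in for the predefined strict rate limiting group."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def add(self, domain, now=None):
        key = domain.lower().rstrip(".")
        self.buckets[key] = TokenBucket(self.rate, self.burst, now)

    def allow(self, qname, now=None):
        suffix = matched_suffix(qname, self.buckets)
        if suffix is None:
            return True  # domain is not in the strict group
        return self.buckets[suffix].allow(now)
```
</pre>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
Because the bucket is keyed on the matched suffix and not on the source address, it still limits random subdomain floods from spoofed IPs, at the cost of also throttling legitimate queries for that domain.</div>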
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
The setup we use is:</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
*<span style="font-size: 12pt;"> 2 sets of MariaDB "master" VMs (2 clusters in 2 geographically separated locations), which are active/active and replicate to and from each other. All write queries are directed to these.</span></div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
* 3 PowerDNS "delayed slave" auth VMs, geographically distributed, each of which has its own MariaDB install that acts as a read-only slave to the master servers. These servers are configured with a replication delay for DR purposes; they do not normally get
any traffic.</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
* Multiple PowerDNS auth VMs, geographically distributed (at least in pairs), with the same setup as the delayed slave servers. They do not have any replication delay configured, and they are the servers that normally receive traffic from dnsdist.</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
* Multiple dnsdist servers in geographically distributed areas. Queries are sent to the local auth servers if they are available; if not, to the remote auth servers if they are available, and finally to the delayed DR servers. For stability, the IPs dnsdist
listens on for queries are bound to a loopback adapter and advertised to the rest of the network with BGP.</div>
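<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
The preference order above amounts to something like the following Python sketch. The tier names, addresses and the "up" flag are placeholders for illustration, not our actual configuration.</div>
<pre>
```python
def pick_tier(tiers):
    """Return (tier name, live servers) for the first tier with a live server.

    `tiers` is an ordered list of (name, servers) pairs, most preferred first.
    Each server is a dict with an "addr" string and an "up" boolean.
    """
    for name, servers in tiers:
        live = [s for s in servers if s["up"]]
        if live:
            return name, live
    return None, []


# Placeholder topology: local auth servers down, remote ones healthy.
example_tiers = [
    ("local", [{"addr": "192.0.2.10", "up": False},
               {"addr": "192.0.2.11", "up": False}]),
    ("remote", [{"addr": "198.51.100.10", "up": True},
                {"addr": "198.51.100.11", "up": False}]),
    ("delayed-dr", [{"addr": "203.0.113.10", "up": True}]),
]
```
</pre>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
With the placeholder data, pick_tier(example_tiers) selects the "remote" tier, and the delayed DR servers are only reached once both of the other tiers have no live server.</div>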
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
The servers are all on SSDs except 2 (waiting for a hardware refresh...), with a reasonable amount of RAM and CPU resources. During the attacks the biggest bottleneck seems to be the DB. I plan on running some simulated benchmarks directly against the DB to see what
numbers I get without the overhead of PowerDNS parsing the request, generating the query, waiting for the answer, etc.</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
I would be curious whether there is already a tool that could perform the test I mentioned above, or whether I will end up having to write one. If I do write one, my goal would be to run a test, change a setting (in MariaDB or PowerDNS) and repeat.</div>
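<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
For reference, the run/change/repeat loop I have in mind is roughly the Python sketch below. All function names are hypothetical. The query function is a stand-in: in a real run it would issue the lookup against MariaDB, and make_query_fn would apply the setting under test before returning it.</div>
<pre>
```python
import random
import statistics
import string
import time


def random_names(domain, count, seed=0):
    """Generate random subdomains of `domain`, similar to the attack traffic."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(12)) + "." + domain
        for _ in range(count)
    ]


def run_benchmark(query_fn, names, repeat=1):
    """Call query_fn(name) for every name and return throughput and latency stats."""
    if not names or repeat == 0:
        raise ValueError("need at least one query to benchmark")
    latencies = []
    start = time.perf_counter()
    for _ in range(repeat):
        for name in names:
            t0 = time.perf_counter()
            query_fn(name)
            latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    latencies.sort()
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "queries": len(latencies),
        "qps": len(latencies) / total if total > 0 else float("inf"),
        "median_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[p99_index] * 1000,
    }


def compare(settings, make_query_fn, names, repeat=1):
    """Run the same workload once per setting label and collect the results."""
    return {
        label: run_benchmark(make_query_fn(label), names, repeat)
        for label in settings
    }
```
</pre>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
This version is single threaded, so it measures per-query latency more than peak throughput; a realistic load test would need concurrent connections.</div>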
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
<br>
</div>
<div style="color: rgb(33, 33, 33); background-color: rgb(255, 255, 255);" dir="auto">
Also, if you know of any other relevant OS-related or MariaDB-related tuning that would help, please let me know. I would be happy to run additional benchmarks to see what the impact is and publish the results later.</div>
</body>
</html>