[Pdns-users] MySQL/MariaDb Scaling
lists at gbe0.com
Wed Jun 2 16:49:15 UTC 2021
I run a reasonably sized PowerDNS setup (high millions of domains across a few instances). So far the way I have been scaling it is working fine, but I would like some additional suggestions in case I missed something. When we need extra capacity, it is currently a matter of adding a dnsdist server for the front end or a PowerDNS server with MariaDB for the backend.
dnsdist answers a large number of queries from cache, which reduces the load nicely, but every now and then we get an attack that punches through the cache with random subdomains and causes high load on the PowerDNS auth servers. When that occurs, our strategy has been to add the domain to a predefined suffix match group on dnsdist that applies stricter rate limiting, which works well enough. We use other rules to limit QPS from prefixes of certain sizes, which sometimes helps, but the latest attacks seem to come entirely from spoofed IPs that do not fall in any prefix that is easy to limit.
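To make the suffix match group idea concrete, here is a minimal Python model of it. This is not dnsdist configuration (dnsdist rules are written in Lua); it is only a sketch of the logic, assuming a token bucket per listed suffix. The names TokenBucket and StrictGroupLimiter and the rate/burst values are made up for illustration.

```python
import time


class TokenBucket:
    """Simple token bucket: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate, burst, now=None):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class StrictGroupLimiter:
    """Applies a stricter QPS limit to names under any listed suffix.

    Hypothetical helper: names outside the group are always allowed here,
    on the assumption that other, looser rules handle them.
    """

    def __init__(self, suffixes, rate, burst):
        self.suffixes = {s.rstrip(".").lower() for s in suffixes}
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def find_suffix(self, qname):
        # Walk from the full name towards the root and return the first
        # listed suffix that matches on a label boundary.
        labels = qname.rstrip(".").lower().split(".")
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            if candidate in self.suffixes:
                return candidate
        return None

    def allow(self, qname, now=None):
        suffix = self.find_suffix(qname)
        if suffix is None:
            return True
        if suffix not in self.buckets:
            self.buckets[suffix] = TokenBucket(self.rate, self.burst, now=now)
        return self.buckets[suffix].allow(now=now)
```

Because the bucket is keyed on the suffix rather than the full query name, random subdomains under an attacked domain all draw from the same budget, which is the property that matters for this kind of attack.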
The setup we use is:
* 2 sets of MariaDB "master" VMs (2 clusters in 2 geographically separated locations) which are active/active and replicate from/to each other. All write queries are directed to these.
* 3 PowerDNS "delayed slave" auth VMs, geographically distributed, each of which has its own MariaDB install that acts as a read-only slave to the master servers. These servers are configured with a replication delay for DR purposes; they do not normally get any traffic.
* Multiple PowerDNS auth VMs, geographically distributed (in at least pairs), with the same setup as the delayed slave servers. They do not have any replication delay configured and they are the servers that normally receive traffic from dnsdist.
* Multiple dnsdist servers in geographically distributed areas. Queries are sent to the local auth servers if they are available, otherwise to the remote auth servers if they are available, and finally to the delayed DR servers. For stability, the IPs dnsdist listens on for queries are bound to a loopback adapter and advertised to the rest of the network with BGP.
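The backend preference order in the last item can be sketched roughly as follows. This is a Python model of the selection logic only, not dnsdist configuration; the Backend class, the tier numbers and the server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    tier: int  # assumed ordering: 0 = local auth, 1 = remote auth, 2 = delayed DR
    up: bool = True


def pick_backends(backends):
    """Return the available backends from the most preferred tier.

    Lower tiers are only used when every backend in the tiers above
    them is down. Returns an empty list if nothing is available.
    """
    available = [b for b in backends if b.up]
    if not available:
        return []
    best = min(b.tier for b in available)
    return [b for b in available if b.tier == best]
```

For example, with one local, one remote and one DR backend all up, only the local one is returned; marking it down makes the remote one the choice, and the delayed DR server is only reached once both are down.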
The servers are all on SSDs except 2 (waiting for hardware refresh...), with a reasonable amount of RAM and CPU resources. During the attacks the biggest bottleneck seems to be the DB. I plan on doing some simulated benchmarks directly on the DB to see what numbers I am getting without the overhead of PowerDNS parsing the request, generating the query, waiting for the answer, etc.
I would be curious if there is already a tool which could perform the test I mentioned above, or if I will have to end up writing one. If I do write one, my goal would be to run the test, change a setting (in MariaDB or PowerDNS) and repeat.
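As a starting point for such a tool, something along these lines might work. It is only a sketch: it uses Python's built-in sqlite3 as a stand-in so that it runs anywhere, the table and the lookup query are simplified assumptions loosely modeled on a records table rather than the exact statements PowerDNS sends, and the function names are made up. For real numbers the connection would be replaced by a MariaDB client connection to one of the slave servers, with the driver's own placeholder syntax.

```python
import random
import sqlite3
import string
import time

# Simplified lookup, assumed for illustration only.
LOOKUP_SQL = "SELECT content, ttl, type FROM records WHERE disabled = 0 AND name = ?"


def make_test_db(zones):
    """Build an in-memory stand-in database with one A record per zone."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, type TEXT, "
        "content TEXT, ttl INTEGER, disabled INTEGER DEFAULT 0)"
    )
    conn.execute("CREATE INDEX records_name_idx ON records (name)")
    conn.executemany(
        "INSERT INTO records (name, type, content, ttl) "
        "VALUES (?, 'A', '192.0.2.1', 3600)",
        [(zone,) for zone in zones],
    )
    conn.commit()
    return conn


def random_subdomains(zones, count, seed=0):
    """Generate random-label names under existing zones (cache-busting pattern)."""
    rng = random.Random(seed)
    names = []
    for _ in range(count):
        label = "".join(rng.choices(string.ascii_lowercase, k=12))
        names.append(f"{label}.{rng.choice(zones)}")
    return names


def run_benchmark(conn, names):
    """Run one lookup per name and report throughput and latency percentiles."""
    if not names:
        raise ValueError("names must not be empty")
    latencies = []
    start = time.perf_counter()
    for name in names:
        t0 = time.perf_counter()
        conn.execute(LOOKUP_SQL, (name,)).fetchall()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "queries": len(names),
        "qps": len(names) / elapsed if elapsed > 0 else float("inf"),
        "p50_ms": latencies[len(latencies) // 2] * 1000.0,
        "p99_ms": latencies[p99_index] * 1000.0,
    }
```

The loop would then be: call run_benchmark with a fixed seed, change one MariaDB or PowerDNS setting, run it again with the same names, and compare the returned dictionaries. A single-threaded loop like this measures per-query latency more than peak throughput, so a realistic test would likely need several concurrent connections.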
Also, if you know of any other relevant OS-related or MariaDB-related tuning that would help, I would be happy to run additional benchmarks to see what the impact would be and publish them later.