[Pdns-users] MySQL/MariaDb Scaling
frank+pdns at tembo.be
Mon Jun 7 07:54:51 UTC 2021
Most setups I've worked on didn't move the thousands of domains to LMDB, but only activated LMDB on demand for domains under attack. The main reason not to use the LMDB backend for that many domains is that LMDB doesn't have any replication mechanism of its own. Also, the PowerDNS LMDB handling more or less assumes it's the only application touching the DB, so all interactions with the DB would need to go through the PowerDNS tools or API. It also doesn't support the SuperReplica features; I'm not sure whether you're using them.
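
To illustrate what "going through the API" can look like, here is a minimal sketch that builds (but does not send) a zone-creation request for the PowerDNS Authoritative HTTP API. The base URL, API key, zone name and nameservers are placeholders, and the choice of "Native" as the zone kind is an assumption made for a backend without replication, not something stated in this thread.

```python
import json
import urllib.request


def build_zone_create_request(base_url, api_key, zone_name, nameservers):
    """Build (but do not send) a request that creates a zone via the HTTP API.

    All argument values are supplied by the caller; nothing here is specific
    to the setups discussed in the thread.
    """
    body = {
        "name": zone_name,           # absolute name, with trailing dot
        "kind": "Native",            # assumption: no DNS-level replication
        "nameservers": nameservers,  # absolute names, with trailing dots
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/servers/localhost/zones",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and key; replace with your own before sending.
    req = build_zone_create_request(
        "http://127.0.0.1:8081", "changeme", "example.org.",
        ["ns1.example.org.", "ns2.example.org."],
    )
    print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing the request to urllib.request.urlopen() against a server that has the API enabled.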
> On Jun 7, 2021, at 9:20 AM, Thomas Mieslinger <miesi at mail.com> wrote:
> Hi Frank,
> thanks for pointing out the possible speedup from using the lmdb backend
> instead of the g*sql backends.
> Is anyone doing lmdb with millions of zones? How do you keep them in
> sync with the master? Is there also a simple way like mysql replication?
> Is it feasible to have slave servers which check the SOA of millions
> of zones on a master DNS server?
> Cheers Thomas
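
A rough way to reason about the SOA-check question is to estimate the average query rate and to compare serials correctly. The sketch below is only back-of-the-envelope arithmetic: the zone count and refresh interval in the usage example are illustrative assumptions (the thread only says "millions"), and it assumes checks are spread evenly over the refresh interval. The serial comparison follows RFC 1982 serial number arithmetic.

```python
def soa_checks_per_second(zone_count, refresh_seconds):
    """Average SOA queries per second if every zone is checked once per refresh."""
    return zone_count / refresh_seconds


def serial_is_newer(candidate, current):
    """True if `candidate` is newer than `current` under RFC 1982 (32-bit serials)."""
    diff = (candidate - current) % 2**32
    # A difference of exactly 2**31 is undefined in RFC 1982; treat it as "not newer".
    return 0 < diff < 2**31


if __name__ == "__main__":
    # Hypothetical numbers: 5 million zones, one check per hour.
    print(soa_checks_per_second(5_000_000, 3600))
    print(serial_is_newer(2021060701, 2021060700))
```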
> On 04.06.21 at 10:32, Frank Louwers wrote:
>> As Thomas said: your setup looks sane, and if it currently works for you, there's no need to change anything.
>> If you do have zones that are getting hit by a random-subdomain-lookup attack, I would recommend having a separate NS with a BIND or LMDB backend ready to serve only those domains. You'll be able to withstand the attacks for longer (LMDB/BIND lookups for random records are much quicker than on MySQL), and you also separate the domain under attack from your other domains, so the blast radius is greatly reduced.
>> Kind Regards,
>>> On Jun 4, 2021, at 8:48 AM, Thomas Mieslinger via Pdns-users <pdns-users at mailman.powerdns.com> wrote:
>>> as it seems to work for you, why change? Sounds like you are using all
>>> the modern technologies which are available.
>>> I run a setup for millions of zones which was designed when dnsdist was
>>> not yet written.
>>> Instead of separating dnsdist, pdns authoritative and mariadb on
>>> separate vms, I run mariadb, pdns and quagga on hardware.
>>> The mariadbs are slaves to one failover master pair. Everything is
>>> replicated with mysql replication.
>>> Whenever I want to scale, I buy servers, send them to the locations
>>> (typically racks in CoLos), get bgp to the routers up and running, and
>>> serve requests.
>>> These servers test themselves whether pdns is able to answer or not and
>>> then announce (or not) the pdns instance IPs via bgp.
>>> Just my two cents.
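
The self-test described above could be sketched roughly as follows: send an SOA query to the local pdns instance, check that a sane answer comes back, and derive an announce/withdraw decision from it. This is an illustration only, not the poster's actual setup; the probe zone, the fixed query ID and the idea of returning a plain "announce"/"withdraw" string are assumptions, and how the decision is handed to the BGP daemon is left out.

```python
import socket
import struct


def build_soa_query(qname, query_id):
    """Encode a minimal DNS query for the SOA record of qname."""
    # Header: ID, flags (standard query), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
        if label
    )
    # Root label terminator, then QTYPE 6 (SOA) and QCLASS 1 (IN)
    return header + labels + b"\x00" + struct.pack("!HH", 6, 1)


def is_healthy_answer(response, query_id):
    """True if response is a NOERROR reply to our query with at least one answer."""
    if len(response) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)  # QR bit
    rcode = flags & 0x000F
    return rid == query_id and is_response and rcode == 0 and ancount >= 1


def probe(server_ip, qname, timeout=1.0, query_id=0x1234):
    """Send one SOA query over UDP; any timeout or socket error counts as unhealthy."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_soa_query(qname, query_id), (server_ip, 53))
            response, _addr = sock.recvfrom(4096)
        except OSError:
            return False
    return is_healthy_answer(response, query_id)


def bgp_action(healthy):
    """Map the health result to what the BGP side should do with the service IP."""
    return "announce" if healthy else "withdraw"


if __name__ == "__main__":
    # Hypothetical probe target: a zone this server is known to be authoritative for.
    print(bgp_action(probe("127.0.0.1", "example.org.")))
```

In practice one would probably require several consecutive failures before withdrawing, to avoid flapping on a single lost packet.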
>>> On 02.06.21 at 18:49, Mailing Lists via Pdns-users wrote:
>>>> Hi all
>>>> I run a reasonably sized PowerDNS setup (high millions of domains across
>>>> a few instances). So far the way I have been scaling it is working fine
>>>> but I would like to get some additional suggestions in case I missed
>>>> something. When we need extra capacity, currently it's a matter of adding
>>>> a dnsdist server for the front end or PowerDNS with MariaDB for the backend.
>>>> Dnsdist answers a large number of queries from cache which reduces the
>>>> load nicely but every now and then we will get an attack which will
>>>> punch through the caching with random subdomains and then cause a high
>>>> load on the PowerDNS auth servers. If that occurs our strategy has been
>>>> to add the domain to a predefined suffix match group on dnsdist which
>>>> applies stricter rate limiting, which works well enough. We use other
>>>> rules to limit QPS from prefixes of certain sizes, which does help
>>>> sometimes, but the latest attacks seem to come entirely from spoofed IPs
>>>> that are not in any particularly easy-to-limit prefix.
>>>> The setup we use is:
>>>> * 2 sets of MariaDB "master" VMs (2 clusters in 2 geographically
>>>> separated locations) which are active/active and replicate from/to each
>>>> other. All write queries are directed to these.
>>>> * 3 PowerDNS "delayed slave" auth VMs geographically distributed, each
>>>> of which has its own MariaDB install which acts as a read-only slave to
>>>> the master servers. These servers are configured with a replication
>>>> delay for DR purposes, they do not normally get any traffic.
>>>> * Multiple PowerDNS auth VMs geographically distributed (in at least
>>>> pairs) with the same setup as the delayed slave servers. They do not
>>>> have any replication delay configured and they are the servers that
>>>> receive traffic from dnsdist normally.
>>>> * Multiple dnsdist servers in geographically distributed areas. Queries
>>>> prefer to be sent to the local auth servers if they are available, if
>>>> not then remote auth servers if they are available followed by the
>>>> delayed DR servers. For stability, the IPs dnsdist listens on for
>>>> queries are bound to a loopback adapter and advertised to the rest of
>>>> the network with bgp.
>>>> The servers are all on SSDs except 2 (waiting for hardware refresh...),
>>>> with a reasonable amount of RAM and CPU resources. During the attacks
>>>> the biggest bottleneck seems to be the DB. I plan on doing some
>>>> simulated benchmarks directly on the DB to see what numbers I am getting
>>>> without the overhead of PowerDNS parsing the request, generating the query,
>>>> waiting for the answer etc.
>>>> I would be curious if there is already a tool which could perform the
>>>> test I mentioned above or if I will have to end up writing one. If I do
>>>> write one, my goal would be to run a test, change a setting (in MariaDB or
>>>> PowerDNS) and repeat.
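
The run/change/repeat loop described above could start from something like the sketch below: generate random-subdomain names for a zone (to defeat caches the way the attacks do) and time a lookup callable against them. Everything here is an assumption for illustration: the in-memory SQLite table is only a stand-in so the example runs on its own, its three-column `records` table is a simplified, hypothetical schema, and a real test would swap in a lookup that queries the MariaDB replica with whatever SQL your PowerDNS version actually issues.

```python
import random
import sqlite3
import string
import time


def random_subdomains(zone, count, label_len=12, seed=None):
    """Return `count` names with a random first label under `zone`."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    return [
        "".join(rng.choice(alphabet) for _ in range(label_len)) + "." + zone
        for _ in range(count)
    ]


def benchmark(lookup, names):
    """Call lookup(name) for every name and report counts and throughput."""
    start = time.perf_counter()
    hits = sum(1 for name in names if lookup(name))
    elapsed = time.perf_counter() - start
    qps = len(names) / elapsed if elapsed > 0 else float("inf")
    return {"queries": len(names), "hits": hits, "seconds": elapsed, "qps": qps}


def make_sqlite_lookup():
    """Stand-in backend: in-memory SQLite with a simplified records table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (name TEXT, type TEXT, content TEXT)")
    conn.execute("CREATE INDEX name_idx ON records (name)")
    conn.execute("INSERT INTO records VALUES ('www.example.org', 'A', '192.0.2.1')")

    def lookup(name):
        # Returns a (possibly empty) list of rows for the exact name.
        return conn.execute(
            "SELECT content FROM records WHERE name = ?", (name,)
        ).fetchall()

    return lookup


if __name__ == "__main__":
    names = random_subdomains("example.org", 10000, seed=1)
    print(benchmark(make_sqlite_lookup(), names))
```

Keeping the lookup as a plain callable means the same harness can be rerun unchanged after each MariaDB or PowerDNS setting change, so only the numbers differ between runs.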
>>>> Also, if you know of any other relevant OS-related or MariaDB-related
>>>> tuning that would help, I would be happy to run additional
>>>> benchmarks to see what the impact would be and publish them later.
>>>> Pdns-users mailing list
>>>> Pdns-users at mailman.powerdns.com