[Pdns-users] Odd master/slave behavior for large domains

bert hubert bert.hubert at netherlabs.nl
Fri Sep 11 18:56:27 UTC 2009


On Fri, Sep 11, 2009 at 7:14 AM, thomas morgan <tm at zerigo.com> wrote:
> I created a single zone on the server and added 2 million host records. I
> know that's a bunch, but it is a specific use case, not just an attempt to
> break things.

Thomas,

Many thanks for your interesting and detailed bug report! I've done
some initial investigation, and my guess is that this is a deficiency
in our PostgreSQL support, or perhaps in the PostgreSQL client library
(unlikely, I think).

Sadly I am very busy right now, but you could help debug this by
trying to reproduce the issue (or perhaps failing to) with the MySQL
backend or the BIND backend.

If the problem goes away with another backend, I'll know where to look.

>
> Oddity #1:
> The master seems to send 3-4 NOTIFYs when the zone is updated -- at least
> the slave is reporting in the logs to have received multiple NOTIFYs.
> They're pretty consistently spaced: in several instances, 4 NOTIFYs with
> intervals 3sec, 5sec, 9sec.

This is just laziness: we keep resending NOTIFY packets until they are
acknowledged, and PowerDNS only acknowledges them after the AXFR has succeeded.
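To illustrate why a big zone produces several NOTIFYs, here is a toy model in Python (not PowerDNS code). The master sends one NOTIFY immediately and retransmits after each delay in a list until the acknowledgement arrives. The retry delays are simply the 3, 5 and 9 second intervals Thomas observed, and ack_time stands in for however long the AXFR takes before the slave acknowledges. Both are assumptions of the model, not PowerDNS settings.

```python
from itertools import accumulate


def notify_send_times(retry_delays, ack_time):
    """Return the times (seconds after the zone change) at which NOTIFYs go out.

    Toy model: the first NOTIFY is sent at t=0; each entry in retry_delays is
    the wait before the next retransmission. Retransmission stops once the
    acknowledgement has arrived (at ack_time) or the delays are exhausted.
    """
    times = [0.0]
    for t in accumulate(retry_delays):
        if t >= ack_time:
            break  # slave has acknowledged, so no further retransmission
        times.append(float(t))
    return times


# Large zone: the AXFR (and hence the ack) takes ~30 s, so four NOTIFYs are sent.
print(notify_send_times([3, 5, 9], ack_time=30))  # [0.0, 3.0, 8.0, 17.0]
# Small zone: the ack arrives within a second, so only one NOTIFY is sent.
print(notify_send_times([3, 5, 9], ack_time=1))   # [0.0]
```

Because the ack is tied to AXFR completion rather than to receipt of the NOTIFY, the number of retransmissions grows with zone size.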

> Oddity #2:
> The slave, upon receiving multiple NOTIFYs, initiates multiple AXFRs for the
> same zone. For a small zone this wouldn't be a big deal, but for a large
> zone it's fatal.

This one is a stupid bug on our side: the slave should not start a new
AXFR for a zone it is already transferring.
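A minimal sketch of the kind of guard that avoids this, assuming a hypothetical slave-side TransferTracker class (the name and methods are illustrative only, not PowerDNS internals): a NOTIFY triggers an AXFR only if no transfer for that zone is already in flight.

```python
class TransferTracker:
    """Hypothetical slave-side guard: at most one AXFR in flight per zone."""

    def __init__(self):
        self._in_flight = set()  # zones currently being transferred
        self.started = []        # log of zones for which an AXFR was started

    def on_notify(self, zone):
        """Handle a NOTIFY; return True if a new AXFR should be started."""
        if zone in self._in_flight:
            return False  # a transfer is already running: ignore this NOTIFY
        self._in_flight.add(zone)
        self.started.append(zone)
        return True

    def on_axfr_done(self, zone):
        """Mark the transfer finished, successfully or not."""
        self._in_flight.discard(zone)


tracker = TransferTracker()
# Four NOTIFYs for the same zone arrive while one transfer is running.
print([tracker.on_notify("example.com") for _ in range(4)])  # [True, False, False, False]
tracker.on_axfr_done("example.com")
print(tracker.on_notify("example.com"))  # True: a later change is transferred again
```

The in-flight mark is cleared on failure as well as on success; otherwise a single aborted transfer would block all future NOTIFYs for that zone.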

> Oddity #3:
> The master consumes a *huge* amount of RAM to do each AXFR: 283 MB worth per
> AXFR. Memory usage on the slave seems tolerable though.
>
> Oddity #4:
> If an AXFR doesn't finish properly, the memory is never released. I've
> managed to reproduce this using dig to perform the AXFR as well.

Oddities 3 and 4 should go away once we debug the (probable) problem
in PowerDNS's PostgreSQL support.

> Anything I'm missing or that I can do to help figure out what's wrong? Some
> logs are included below; they are not identical in every case, but this is
> representative.

Reproducing this with the BIND backend should help pin down the cause.
You can simply have it load a zone file saved with dig -t AXFR > zone.

Please let me know! From there we'll look at the other issues you found.

    Bert


