[Pdns-users] Provisions for DB failures?

Sun Mar 16 20:51:48 UTC 2003

On Sun, Mar 16, 2003 at 09:11:01PM +0100, Gabriel Ambuehl wrote:

> > In this case, if any of the slaves has a database problem, it goes down.
> > The other slaves continue to work and the internet does not notice that
> > there is a problem.
> 
> More or less. Except for the people that need to make two queries, of
> course.

What do you mean? Making a second query 500ms later shouldn't hurt that
much, and it happens only once per TTL per recursor.

> I suggest this: no algorithm at all. From what I've gathered,
> everything needed is in place already as you can have multiple
> backends. So what's needed is a mechanism that you can supply
> additional DBs which are queried in case the first query would result
> in a servfailed reply (I believe that's already happening if you have
> different backends right?).

Wrong. If you have multiple database backends, they are considered to
complement each other, as described in
http://doc.powerdns.com/pipebackend-dynamic-resolution.html#PIPE-AND-BIND

If any of them errors before a definite answer ('does exist', 'does not
exist') is in, the whole query gets a SERVFAIL. Partial data is an error!
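
To make the rule concrete, here is a rough sketch in Python. It is an
illustrative model only, not PowerDNS's actual code: the names `resolve` and
`BackendError` are made up, each backend is assumed to be a plain callable
returning a list of records, and 'does not exist' is simplified to NXDOMAIN.

```python
class BackendError(Exception):
    """Raised by a (hypothetical) backend that could not complete its lookup."""


def resolve(backends, qname):
    """Combine complementary backends; any failure means SERVFAIL.

    Simplification: every backend is consulted, so an error anywhere
    prevents a definite answer, even if another backend had data.
    """
    records = []
    for lookup in backends:
        try:
            records.extend(lookup(qname))
        except BackendError:
            # Partial data is an error: never guess from what we have so far.
            return "SERVFAIL", []
    if records:
        return "NOERROR", records   # definite 'does exist'
    return "NXDOMAIN", []           # definite 'does not exist'
```

Note that a failing backend is not skipped in favour of the next one. Skipping
it is what would turn a database outage into a false 'no such host'.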

> Now personally, what I'd love to see more than anything else (and more
> than the above, of course) is a dumb flat file based module that can be
> used as second backend in case the DB fails. Say the flatfile gets
> produced every X hours from the DB. This is relatively cheap
> to generate (it could be done with a one line crontab entry,
> actually) with any but very big installations where there are
> usually other provisions taken that RDBMS don't go down.

And how would you detect that your 'dump database to disk' script does not
make things worse by performing a partial dump, because you get 'error
3434: index key duplicate' halfway through the dump? You've only moved the
problem!
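
For the truncated-dump case specifically, the usual defence is to write to a
temporary file and rename it into place only if the whole dump succeeded. A
minimal sketch in Python, assuming the rows arrive as an iterable of strings
(`dump_atomically` and the file layout are hypothetical, not part of any
PowerDNS tool):

```python
import os
import tempfile


def dump_atomically(rows, target_path):
    """Write rows to a temp file; replace target_path only on full success."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dump-")
    try:
        with os.fdopen(fd, "w") as out:
            for row in rows:          # may raise halfway, e.g. on a DB error
                out.write(row + "\n")
            out.flush()
            os.fsync(out.fileno())    # make sure the data is on disk first
        os.replace(tmp_path, target_path)
    except BaseException:
        os.unlink(tmp_path)           # discard the partial dump, keep the old file
        raise
```

This only covers a dump that aborts with an error. A dump that completes but
contains wrong or incomplete data is still renamed into place, so the
detection problem above remains.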

Remember, failures are unexpected and mostly have a real cause. They are
rarely, if ever, neatly delineated from normal operation. PowerDNS tries
very hard to make sure it does not report a definite result in case of *any*
indication of failure, so that it never wrongly reports 'no such host'.

The best thing to do is have a tool like 'nagios' to determine if all your
slaves are doing the right thing and investigate if one of them isn't. I'd
hesitate to rely on a broken system to do the right thing in case of
failure.
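
As a sketch of what such a check could decide, assume some other piece of code
has already asked every slave for the SOA serial of a test zone and stored
`None` for slaves that did not answer. The function name and the input format
are assumptions for illustration; only the exit codes follow the usual Nagios
plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # conventional Nagios exit codes


def check_slaves(serials):
    """serials maps slave name -> SOA serial it returned, or None on failure."""
    if not serials:
        return UNKNOWN, "UNKNOWN: no slaves configured"
    dead = sorted(name for name, serial in serials.items() if serial is None)
    if dead:
        return CRITICAL, "CRITICAL: no answer from " + ", ".join(dead)
    distinct = sorted(set(serials.values()))
    if len(distinct) > 1:
        # Could just be replication lag, so warn instead of paging anyone.
        return WARNING, ("WARNING: slaves disagree on SOA serial: "
                         + ", ".join(map(str, distinct)))
    return OK, "OK: {} slaves agree on SOA serial {}".format(
        len(serials), distinct[0])
```

A wrapper script would print the message and exit with the returned code, e.g.
`code, msg = check_slaves(results); print(msg); sys.exit(code)`.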

Rely on your slaves, and make sure each has its own independent database.

Alternatively, if anybody comes up with a very simple idea, I'm willing to
implement it. However, I'm not going to throw "good code after a bad
problem", raising complexity while probably not helping anyhow.

Regards,

bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO
http://netherlabs.nl                         Consulting