[Pdns-users] RE: Recursion failing on certain records? - solved (PEBKAC)

Kirk Friggstad friggstadk at ironsolutions.com
Wed Aug 23 23:45:25 UTC 2006

<sigh> It turns out I made a couple of stupid mistakes that worked together
to make me look like a fool. :-) When I first tested this problem, I was
using dig @[my-server-ip], but after updating my recursor to 3.1.2, I
switched to dig @localhost. After the 3.1.2 update I only tested the
problem domains, and never re-tested any "known good" domains (google.com,
etc.). Had I tested the "known good" domains @localhost, I would have
discovered (earlier than this afternoon, and probably before sending out my
first description of the "problem") that I had neglected to include
localhost in the "allow-recursion" setting of pdns.conf, so ALL recursive
queries sent to localhost would fail. Once I fixed this oversight, things
began to work as expected.
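For anyone hitting the same thing: pdns_server only recurses for clients
whose source address matches its allow-recursion list, and dig @localhost
makes the query arrive from 127.0.0.1. The Python sketch below is only a
simplified model of that check, assuming the setting is a list of netmasks
matched against the client address. The 192.0.2.0/24 network and the
recursion_allowed() helper are hypothetical stand-ins, not PowerDNS code.

```python
import ipaddress

# Hypothetical allow-recursion list: only the server's LAN subnet,
# with the loopback address forgotten (the mistake described above).
ALLOW_RECURSION = ["192.0.2.0/24"]


def recursion_allowed(client_ip: str, acl: list) -> bool:
    """Return True if client_ip falls inside any netmask in acl (simplified model)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in acl)


# dig @[my-server-ip] run on the server: source is the server's own
# (hypothetical) LAN address, which is listed
print(recursion_allowed("192.0.2.10", ALLOW_RECURSION))  # True
# dig @localhost: the query comes from 127.0.0.1, which is not listed
print(recursion_allowed("127.0.0.1", ALLOW_RECURSION))  # False
# Adding the loopback network to the list makes localhost queries recurse
print(recursion_allowed("127.0.0.1", ALLOW_RECURSION + ["127.0.0.0/8"]))  # True
```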

My apologies for wasting everyone's time here, and thanks to Bert and Darren
for their help. My only defense is that I am recently back from a vacation,
and can only assume that my brain was left behind somewhere. :-)


-----Original Message-----
From: Darren Gamble [mailto:darren.gamble at sjrb.ca] 
Sent: Wednesday, August 23, 2006 9:55 AM
To: Kirk Friggstad; pdns-users at mailman.powerdns.com
Subject: RE: [Pdns-users] RE: Recursion failing on certain records?

Hi Kirk,

> I think I maybe went a little too complicated on my explanation here,
> and missed a couple of semi-crucial details. We are *not* the
> authoritative servers for the two domains that are giving problems
> (acegroup.cc and hivelocity.net).

No, I understood this.  I had simply pointed out that when you said you
queried the "authoritative server", you were actually querying your
cache.  Note the "@localhost" in your query.  You should have queried
for the NS records for the domain, and directed your queries there.

> Darren - you said that there was something in the configuration of
> these two domains that would fail in pre-3.1.2 versions of the recursor.
> Can you give a bit more detail on that?

The whole explanation is a bit complicated, but basically: because the
SLD servers have different NS records, with different TTLs, than the
authoritative servers, older pdns servers will mash the two record sets
together under a single name, with different TTLs. This is forbidden by
RFC 2181 (all records in one RRset must share the same TTL), and it
causes a problem: the record set changes as the names with lower TTLs
expire. If this leaves you with only NS record(s) that don't respond,
which is the issue here, then the recursor is no longer able to look up
names in that domain. This is also why the problem is intermittent: when
all of the records expire, the cache goes back up to the SLD servers and
the process starts over again.
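To make the failure mode above concrete, here is a toy Python model. It is
a sketch under stated assumptions, not the recursor's actual cache code:
the host names, TTLs and the RESPONDING set are hypothetical, and
merge_old_style() only mimics the "mash the two record sets together"
behavior described above, including the assumption that the
authoritative copy of a duplicate name wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NSRecord:
    target: str  # name server host name
    ttl: int     # TTL in seconds, as received


# Hypothetical data: the parent ("SLD") servers and the zone's own
# authoritative servers publish different NS sets with different TTLs.
DELEGATION = [NSRecord("ns-a.example.net", 172800),
              NSRecord("ns-old.example.net", 172800)]
AUTHORITATIVE = [NSRecord("ns-a.example.net", 3600),
                 NSRecord("ns-b.example.net", 3600)]

# Hypothetical: ns-old is still registered at the parent but never answers.
RESPONDING = {"ns-a.example.net", "ns-b.example.net"}


def merge_old_style(*rrsets):
    """Model of the older behavior: union of all sets under one name, each
    record keeping the TTL it arrived with (a later set wins on duplicates)."""
    merged = {}
    for rrset in rrsets:
        for rr in rrset:
            merged[rr.target] = rr
    return list(merged.values())


def lookup_status(cache, elapsed):
    """What the resolver can do with the cached NS set `elapsed` seconds later."""
    live = [rr for rr in cache if rr.ttl > elapsed]
    if not live:
        return "expired: ask the parent again"
    if any(rr.target in RESPONDING for rr in live):
        return "ok"
    return "SERVFAIL"


cache = merge_old_style(DELEGATION, AUTHORITATIVE)
for elapsed in (0, 3600, 172800):
    print(elapsed, lookup_status(cache, elapsed))
# 0 ok
# 3600 SERVFAIL
# 172800 expired: ask the parent again
```

With these numbers the merged set works at first, turns into SERVFAIL once
the 3600-second records expire and only the dead server is left, and
recovers only when everything has expired and the parent is asked again,
which matches the intermittent pattern.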

The zone is definitely not configured as it should be, but it should
still work.  They shouldn't have registered servers that aren't

In 3.1.2+, pdns will simply replace the record set with the authoritative
server's. This fixes the problem, and is consistent with other caching
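A companion sketch of the 3.1.2+ behavior, again a hypothetical Python
model with made-up host names and TTLs rather than PowerDNS code:
replacing the cached set with the authoritative answer leaves one RRset
with a single TTL, which is what RFC 2181 section 5.2 requires, whereas
merging the two sets does not.

```python
# Each NS record is (target, ttl_seconds). The data below is hypothetical.
DELEGATION = [("ns-a.example.net", 172800), ("ns-old.example.net", 172800)]
AUTHORITATIVE = [("ns-a.example.net", 3600), ("ns-b.example.net", 3600)]


def replace_with_authoritative(cached, authoritative):
    """Model of the 3.1.2+ behavior: once the zone's own servers answer,
    their NS set replaces whatever was cached from the parent."""
    return list(authoritative) if authoritative else list(cached)


def merged(cached, authoritative):
    """Model of the older merging behavior, for comparison."""
    out = dict(cached)
    out.update(authoritative)
    return list(out.items())


def ttls_consistent(rrset):
    """RFC 2181 section 5.2: all records in one RRset must share a TTL."""
    return len({ttl for _, ttl in rrset}) <= 1


old = merged(DELEGATION, AUTHORITATIVE)
new = replace_with_authoritative(DELEGATION, AUTHORITATIVE)
print(ttls_consistent(old), sorted(t for t, _ in old))
# False ['ns-a.example.net', 'ns-b.example.net', 'ns-old.example.net']
print(ttls_consistent(new), sorted(t for t, _ in new))
# True ['ns-a.example.net', 'ns-b.example.net']
```

Because the replaced set expires as a unit, the cache is never left holding
only the unresponsive name server.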

Again, if you post the NS records for the domain as returned by each
server while you are seeing this problem, that will tell you for certain
whether this is the case. This should be Step 1 of troubleshooting any DNS issue like

> My question is - is there something in the code for pdns_server and
> 2.9.21 snapshot at least),

This zone, and many others like it, should resolve properly if you
upgrade all of your caches to at least 3.1.2. That is what you need to
do. 3.1.3 should also be released soon; it contains some important
crash fixes.

Darren Gamble
Planner, Regional Services
Shaw Cablesystems GP
630 - 3rd Avenue SW
Calgary, Alberta, Canada
T2P 4L4
(403) 781-4948

> -----Original Message-----
> From: Kirk Friggstad [mailto:friggstadk at ironsolutions.com]
> Sent: Tuesday, August 22, 2006 11:52 AM
> To: 'pdns-users at mailman.powerdns.com'
> Subject: Recursion failing on certain records?
> Greetings all:
> I've been puzzling through some strangeness with PowerDNS here.
> Recursive queries for certain records/domains have been failing
> consistently for a number of weeks - two examples are:
> 	mail.acegroup.cc
> 	mail.hivelocity.net
> If I query the authoritative server, I get a SERVFAIL:
>   $ dig @localhost mail.hivelocity.net
>   ; <<>> DiG 9.2.4 <<>> @localhost mail.hivelocity.net
>   ; (1 server found)
>   ;; global options:  printcmd
>   ;; Got answer:
>   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 49833
>   ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>   ;mail.hivelocity.net.           IN      A
>   ;; Query time: 1 msec
>   ;; SERVER:
>   ;; WHEN: Tue Aug 22 11:10:21 2006
>   ;; MSG SIZE  rcvd: 37
> but if I query the 3.1.2 recursor directly, I get the correct answer:
>   $ dig @localhost -p 4754 mail.hivelocity.net
>   ; <<>> DiG 9.2.4 <<>> @localhost -p 4754 mail.hivelocity.net
>   ; (1 server found)
>   ;; global options:  printcmd
>   ;; Got answer:
>   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1932
>   ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
>   ;mail.hivelocity.net.           IN      A
>   mail.hivelocity.net.    300     IN      A
>   ;; Query time: 206 msec
>   ;; SERVER:
>   ;; WHEN: Tue Aug 22 11:10:04 2006
>   ;; MSG SIZE  rcvd: 53
> Querying a 2.9.20 recursor directly returns a SERVFAIL.
> Recursive queries for most other domains return correct answers; these
> two domains (acegroup.cc and hivelocity.net) are the only ones I've come
> across that exhibit this behavior. Lookups for those two domains from
> http://dnsstuff.com/ appear normal as well.
> I can reproduce this on the following systems:
>   System 1 - RHEL 3, pdns_server 2.9.20 (static RPM from powerdns.com)
> recursing to pdns_recursor 3.1.2 (generic RPM from powerdns.com)
>   System 2 - RHEL 3, pdns_server 2.9.20 (static RPM from powerdns.com)
> recursing to pdns_recursor 2.9.20 (build from source, gcc 4.0.2)
> Both systems have identical configuration files (except for IP address
> binding), use the bind backend, and do not appear to have any problems
> with authoritative queries, only with recursive ones.
> Does anyone have any suggestions as to what is happening here? Could
> there be a bug somewhere in the recursion routines of pdns_server? Am I
> making a completely stupid mistake somewhere? I'm out of answers - any
> help would be greatly appreciated.
> Thanks
> Kirk

More information about the Pdns-users mailing list