[Pdns-users] RE: Recursion failing on certain records?

Darren Gamble darren.gamble at sjrb.ca
Wed Aug 23 15:55:06 UTC 2006


Hi Kirk,

> I think I maybe went a little too complicated on my explanation here, and
> missed a couple semi-crucial details. We are *not* the authoritative servers
> for the two domains that are giving problems (acegroup.cc and
> hivelocity.net).

No, I understood this.  I had simply pointed out that when you said you
queried the "authoritative server", you were actually querying your
cache.  Note the "@localhost" in your query.  You should have queried
for the NS records for the domain, and directed your queries there.

> Darren - you said that there was something in the configuration of these two
> domains that would fail in pre-3.2.1 versions of the recursor. Can you give
> a bit more detail on that?

The whole explanation is a bit complicated, but basically, because the
SLD servers have different NS records with different TTLs than the
authoritative servers, older pdns servers will mash the two record sets
together into a single set under one name, with differing TTLs.  This is
verboten by RFC (see RFC 2181), and it causes a problem: the record set
changes as records with lower TTLs expire.  If this leaves you with only
NS record(s) that don't respond, which is the issue here, then the
recursor is not able to look up names on that domain anymore.  This is
also why the problem is intermittent: when all of the records expire, the
cache goes back up to the SLD servers and the process starts over again.
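
As a rough illustration of the above, here is a toy Python model of one
cached NS set.  It is only a sketch, not how pdns is actually written:
the server names, TTLs, and the rule that the authoritative TTL wins for a
name present in both sets are all assumptions made up for the example.

```python
# Toy model of a resolver cache entry for one NS record set.
# All names and TTLs below are hypothetical, chosen only for illustration.

def merge(parent_set, auth_set):
    """Old behaviour as described above: mash both sets together,
    each record keeping its own TTL (assumption: auth TTL wins on overlap)."""
    merged = dict(parent_set)
    merged.update(auth_set)
    return merged

def replace(parent_set, auth_set):
    """Newer behaviour: the authoritative set replaces the parent's set."""
    return dict(auth_set)

def live_records(rrset, age):
    """Names whose TTL has not yet run out `age` seconds after caching."""
    return sorted(name for name, ttl in rrset.items() if ttl > age)

# The parent-side servers still list a dead server, with a long TTL.
parent = {"ns1.example.net.": 172800, "dead.example.net.": 172800}
auth = {"ns1.example.net.": 3600, "ns2.example.net.": 3600}
responding = {"ns1.example.net.", "ns2.example.net."}

def usable(rrset, age):
    """Live records that point at a server which actually answers."""
    return [name for name in live_records(rrset, age) if name in responding]

for label, rrset in (("merge", merge(parent, auth)),
                     ("replace", replace(parent, auth))):
    for age in (100, 4000):
        print(label, age, live_records(rrset, age), usable(rrset, age))
```

In this model the merged set is fine at first, but once the short TTLs run
out only the dead server is left, and lookups fail until that record
expires too.  With replacement, the whole set expires together and the
cache simply starts over from the parent.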

The zone is definitely not configured as it should be, but it should
still work.  They shouldn't have registered servers that aren't
responding.

In 3.1.2+, pdns will just replace the record set with the authoritative
server's.  This fixes the problem, and is consistent with other caching
software.

Again, if you were to post the NS records for the domain on each server
when you have this problem, this will tell you for certain if this is
the case.  This should be Step 1 of troubleshooting any DNS issue like
this.
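
One way to do that comparison, sketched in Python: the helper below only
builds the dig command lines to run, one per server, with recursion turned
off so each server answers from its own data.  The function name is
hypothetical, a.gtld-servers.net is just one of the .net parent servers,
and ns1.example.net is a placeholder for whatever NS names the parent
actually returns for the domain.

```python
import shlex

def ns_check_commands(domain, parent_server, auth_servers):
    """Return dig command lines asking each server for the domain's NS set
    without recursion, so the answers can be compared side by side."""
    servers = [parent_server] + list(auth_servers)
    return [
        " ".join(shlex.quote(part)
                 for part in ["dig", "+norecurse", "@" + server, domain, "NS"])
        for server in servers
    ]

# ns1.example.net is a placeholder, not the domain's real name server.
for line in ns_check_commands("hivelocity.net", "a.gtld-servers.net",
                              ["ns1.example.net"]):
    print(line)
```

If the NS names or TTLs differ between the parent's answer and the
domain's own servers, or one of the listed servers never answers, that
points at the situation described above.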

> My question is - is there something in the code for pdns_server (2.9.20 and
> 2.9.21 snapshot at least),

This zone, and many others like it, should resolve properly if you
upgrade all of your caches to at least 3.1.2.  That is what you need to
do.  3.1.3 should be released soon as well, and contains some important
crash fixes.

============================
Darren Gamble
Planner, Regional Services
Shaw Cablesystems GP
630 - 3rd Avenue SW
Calgary, Alberta, Canada
T2P 4L4
(403) 781-4948


> 
> -----Original Message-----
> From: Kirk Friggstad [mailto:friggstadk at ironsolutions.com]
> Sent: Tuesday, August 22, 2006 11:52 AM
> To: 'pdns-users at mailman.powerdns.com'
> Subject: Recursion failing on certain records?
> 
> Greetings all:
> 
> I've been puzzling through some strangeness in our PowerDNS installations
> here. Recursive queries for certain records/domains have been failing
> consistently for a number of weeks - two examples are:
> 	mail.acegroup.cc
> 	mail.hivelocity.net
> 
> If I query the authoritative server, I get a SERVFAIL:
>   $ dig @localhost mail.hivelocity.net
>   ; <<>> DiG 9.2.4 <<>> @localhost mail.hivelocity.net
>   ; (1 server found)
>   ;; global options:  printcmd
>   ;; Got answer:
>   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 49833
>   ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> 
>   ;; QUESTION SECTION:
>   ;mail.hivelocity.net.           IN      A
> 
>   ;; Query time: 1 msec
>   ;; SERVER: 127.0.0.1#53(127.0.0.1)
>   ;; WHEN: Tue Aug 22 11:10:21 2006
>   ;; MSG SIZE  rcvd: 37
> 
> but if I query the 3.1.2 recursor directly, I get the correct answer:
>   $ dig @localhost -p 4754 mail.hivelocity.net
>   ; <<>> DiG 9.2.4 <<>> @localhost -p 4754 mail.hivelocity.net
>   ; (1 server found)
>   ;; global options:  printcmd
>   ;; Got answer:
>   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1932
>   ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
> 
>   ;; QUESTION SECTION:
>   ;mail.hivelocity.net.           IN      A
> 
>   ;; ANSWER SECTION:
>   mail.hivelocity.net.    300     IN      A       66.96.80.16
> 
>   ;; Query time: 206 msec
>   ;; SERVER: 127.0.0.1#4754(127.0.0.1)
>   ;; WHEN: Tue Aug 22 11:10:04 2006
>   ;; MSG SIZE  rcvd: 53
> 
> Querying a 2.9.20 recursor directly returns a SERVFAIL.
> 
> Recursive queries for most other domains return correct answers - these two
> domains (acegroup.cc and hivelocity.net) are the only ones that I've come
> across that exhibit this behavior. Lookups for those two domains from
> http://dnsstuff.com/ appear normal as well.
> 
> I can reproduce this on the following systems:
>   System 1 - RHEL 3, pdns_server 2.9.20 (static RPM from powerdns.com)
> recursing to pdns_recursor 3.1.2 (generic RPM from powerdns.com)
>   System 2 - RHEL 3, pdns_server 2.9.20 (static RPM from powerdns.com)
> recursing to pdns_recursor 2.9.20 (build from source, gcc 4.0.2)
> 
> Both systems have identical configuration files (except for IP address
> binding), using the bind backend, and do not appear to exhibit any problems
> with authoritative queries, only recursive.
> 
> Anyone have any suggestions as to what is happening here? Could there be a
> bug somewhere in the recursion routines of pdns_server? Am I making some
> completely stupid mistake somewhere? I'm out of answers - any help would be
> greatly appreciated.
> 
> Thanks
> 
> Kirk
> 
> _______________________________________________
> Pdns-users mailing list
> Pdns-users at mailman.powerdns.com
> http://mailman.powerdns.com/mailman/listinfo/pdns-users

