[Pdns-users] RE: Recursion failing on certain records?

Kirk Friggstad friggstadk at ironsolutions.com
Wed Aug 23 00:31:42 UTC 2006


I think I made my explanation here a little too complicated, and missed a
couple of semi-crucial details. We are *not* the authoritative servers for
the two domains that are giving problems (acegroup.cc and hivelocity.net).
The domains appear to resolve normally when pdns_recursor 3.1.2 is queried
directly. The problems come in when sending recursive queries to a
pdns_server 2.9.20 that is configured to forward them to the 3.1.2 recursor,
or when querying the 2.9.20 recursor directly. I've also tried out the 2.9.21
snapshot of pdns_server, with similar results.

Darren - you said that there was something in the configuration of these two
domains that would fail in pre-3.1.2 versions of the recursor. Can you give
a bit more detail on that?

My question is - is there something in the code for pdns_server (2.9.20 and
the 2.9.21 snapshot at least), in its handling of forwarded recursive
queries, that is failing in a similar way to the older versions of the
recursor? If so, is there a quick'n'easy (or quick'n'dirty) workaround for
it, or is this a more involved issue? If it's not an easy fix, I need to
revisit the way we currently have our DNS services set up, and implement one
or more standalone 3.1.2 pdns_recursor installations, instead of our current
implementation (all requests, recursive or authoritative, passed to 2.9.20
pdns_server installations with pdns_recursors running on alternate ports).

Bert, do you still need that tcpdump output that you asked for earlier, or
something else to track down the problem? I don't believe this is anything
inherent to our setup; I think it comes from the way these two domains
(again, domains that we are not authoritative for and have no control over)
are set up.

Thanks to Bert and Darren for their help and feedback on this issue, and I
hope that I'm making better sense this time through. Please let me know if
there's any further clarification or explanation needed.

Kirk


-----Original Message-----
From: Kirk Friggstad [mailto:friggstadk at ironsolutions.com] 
Sent: Tuesday, August 22, 2006 11:52 AM
To: 'pdns-users at mailman.powerdns.com'
Subject: Recursion failing on certain records?

Greetings all:

I've been puzzling through some strangeness in our PowerDNS installations
here. Recursive queries for certain records/domains have been failing
consistently for a number of weeks - two examples are:
	mail.acegroup.cc
	mail.hivelocity.net

If I query our authoritative server (pdns_server 2.9.20), I get a SERVFAIL:
  $ dig @localhost mail.hivelocity.net
  ; <<>> DiG 9.2.4 <<>> @localhost mail.hivelocity.net
  ; (1 server found)
  ;; global options:  printcmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 49833
  ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;mail.hivelocity.net.           IN      A

  ;; Query time: 1 msec
  ;; SERVER: 127.0.0.1#53(127.0.0.1)
  ;; WHEN: Tue Aug 22 11:10:21 2006
  ;; MSG SIZE  rcvd: 37

but if I query the 3.1.2 recursor directly, I get the correct answer:
  $ dig @localhost -p 4754 mail.hivelocity.net
  ; <<>> DiG 9.2.4 <<>> @localhost -p 4754 mail.hivelocity.net
  ; (1 server found)
  ;; global options:  printcmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1932
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

  ;; QUESTION SECTION:
  ;mail.hivelocity.net.           IN      A

  ;; ANSWER SECTION:
  mail.hivelocity.net.    300     IN      A       66.96.80.16

  ;; Query time: 206 msec
  ;; SERVER: 127.0.0.1#4754(127.0.0.1)
  ;; WHEN: Tue Aug 22 11:10:04 2006
  ;; MSG SIZE  rcvd: 53

Querying a 2.9.20 recursor directly returns a SERVFAIL.
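
For anyone who wants to repeat the comparison above without dig, here is a
rough sketch in Python that sends the same A query over UDP and reports the
status, header flags and answer count. It is only an illustration, not
anything taken from PowerDNS: the function names (build_query, parse_header,
query) are made up for this example, and the ports 53 and 4754 are simply
the ones from the dig runs above. It does no retries, no TCP fallback and no
parsing of the answer section.

```python
import socket
import struct

# Response codes from the DNS header (RFC 1035, section 4.1.1).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

# Header flag bits that dig prints in its "flags:" line.
FLAG_BITS = [("qr", 0x8000), ("aa", 0x0400), ("tc", 0x0200),
             ("rd", 0x0100), ("ra", 0x0080)]


def build_query(name, qid=0x1234, qtype=1):
    """Build a one-question DNS query (qtype 1 = A, class IN) with RD set."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = [part.encode("ascii") for part in name.rstrip(".").split(".")]
    qname = b"".join(bytes([len(part)]) + part for part in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(packet):
    """Return (status, list of set flags, answer count) from a DNS message."""
    if len(packet) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    set_flags = [label for label, bit in FLAG_BITS if flags & bit]
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), set_flags, ancount


def query(server, port, name, timeout=3.0):
    """Send the query over UDP and summarise the response header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        data, _addr = sock.recvfrom(4096)
    return parse_header(data)


if __name__ == "__main__":
    # Assumed layout: pdns_server on port 53, recursor on port 4754.
    for port in (53, 4754):
        try:
            print(port, query("127.0.0.1", port, "mail.hivelocity.net"))
        except OSError as exc:  # timeouts and refused connections
            print(port, "no answer:", exc)
```

The flags are worth comparing as well as the status: in the dig output
above, the SERVFAIL response carries only "qr rd", while the good answer
from the recursor also has "ra" set.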

Recursive queries for most other domains return correct answers - these two
domains (acegroup.cc and hivelocity.net) are the only ones that I've come
across that exhibit this behavior. Lookups for those two domains from
http://dnsstuff.com/ appear normal as well.

I can reproduce this on the following systems:
  System 1 - RHEL 3, pdns_server 2.9.20 (static RPM from powerdns.com)
             recursing to pdns_recursor 3.1.2 (generic RPM from powerdns.com)
  System 2 - RHEL 3, pdns_server 2.9.20 (static RPM from powerdns.com)
             recursing to pdns_recursor 2.9.20 (built from source, gcc 4.0.2)

Both systems have identical configuration files (except for IP address
binding), using the bind backend, and do not appear to exhibit any problems
with authoritative queries, only recursive.

Anyone have any suggestions as to what is happening here? Could there be a
bug somewhere in the recursion routines of pdns_server? Am I making some
completely stupid mistake somewhere? I'm out of answers - any help would be
greatly appreciated.

Thanks

Kirk


