[Pdns-users] Recursion issue--SERVFAIL then NOERROR totally at random

Todd Smith TSmith at ninestarconnect.com
Tue Sep 9 16:20:38 UTC 2014


Hey guys,

I've been having a problem with recursion. For some reason, certain domains seem to throw SERVFAIL errors when dug most of the time, but then NOERROR with a correct response at other random times. For example:

root at yoshi:/# dig toyotasupplier.com

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> toyotasupplier.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2636
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;toyotasupplier.com.            IN      A

;; Query time: 0 msec
;; SERVER: 208.88.248.25#53(208.88.248.25)
;; WHEN: Wed Sep  3 13:36:33 2014
;; MSG SIZE  rcvd: 36

And then, a few hours later:

root at yoshi:/# dig toyotasupplier.com

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> toyotasupplier.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56751
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;toyotasupplier.com.            IN      A

;; ANSWER SECTION:
toyotasupplier.com.     18296   IN      A       12.169.52.71

;; Query time: 1 msec
;; SERVER: 208.88.248.25#53(208.88.248.25)
;; WHEN: Thu Sep  4 10:39:38 2014
;; MSG SIZE  rcvd: 52

And then, a few hours later still:

root at yoshi:/# dig toyotasupplier.com

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> toyotasupplier.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5171
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;toyotasupplier.com.            IN      A

;; Query time: 3017 msec
;; SERVER: 208.88.248.25#53(208.88.248.25)
;; WHEN: Fri Sep  5 07:50:25 2014
;; MSG SIZE  rcvd: 36

All without making a single change.

I have been working on debugging this for two days now and absolutely cannot pinpoint a source for the issue. I've increased the max query lengths, the recursor's network and client TCP timeouts, restarted the service several times on several of our DNS servers, and nothing I do seems to fix it. It of course doesn't help that the bug is a bit of a gremlin and keeps mischievously disappearing at random (and in fact never, to my knowledge, happened before until about a week ago, when it started to occur for no apparent reason). Any idea on what could be causing this? FWIW, when I run dig toyotasupplier.com ns it consistently works fine:

root at yoshi:/# dig toyotasupplier.com ns

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> toyotasupplier.com ns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39522
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;toyotasupplier.com.            IN      NS

;; ANSWER SECTION:
toyotasupplier.com.     50741   IN      NS      gslb-ns2.toyota-na.com.
toyotasupplier.com.     50741   IN      NS      gslb-ns1.toyota-na.com.

;; Query time: 1 msec
;; SERVER: 208.88.248.25#53(208.88.248.25)
;; WHEN: Fri Sep  5 07:49:29 2014
;; MSG SIZE  rcvd: 92

Many thanks in advance,

Todd W. Smith
IP Services Technician
2331 East 600 North
Greenfield, IN 46140
(317) 323-2021
tsmith at ninestarconnect.com<mailto:tsmith at ninestarconnect.com>
www.ninestarconnect.com<http://www.ninestarconnect.com/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.powerdns.com/pipermail/pdns-users/attachments/20140909/159e4e3c/attachment.html>


More information about the Pdns-users mailing list