[Pdns-users] PowerDNS Recursor server stopped resolving about half of all domains last night; I built a new server and it's doing the same thing
Nicholas Williams
nicholas at nicholaswilliams.net
Sun Dec 29 01:47:06 UTC 2024
Hello,
I have an existing PowerDNS Recursor 4.0.4 server running on Debian 8 Jessie (I know, I know, out of date ... I'm getting to that). It handles all DNS requests for my home lab network. It has a fairly simple config and has worked without interruption for literally years at a time. It is also configured to validate DNSSEC, and validation has been working.
Last night, shortly after midnight, it stopped resolving about half of all domains worldwide, returning `SERVFAIL` for them. Sometimes it will resolve the primary domain (such as `athenahealth.com`) but not a subdomain (such as `20785-1.portal.athenahealth.com`). Sometimes it will not resolve the primary domain (such as `serverfault.com` or `askubuntu.com`). I haven't been able to find any pattern, and no matter how I've mucked with my config (including turning DNSSEC completely off), it doesn't fix the problem.
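To look for a pattern, the failures can be tallied by response code across a list of names. The following is a minimal Python sketch, assuming `dig` is installed and on the `PATH`; the helper names `parse_status`, `query_status`, and `tally` are made up for illustration and are not part of any PowerDNS tooling.

```python
import re
import subprocess
from collections import Counter

# dig prints the response code on its ->>HEADER<<- line, e.g. "status: SERVFAIL".
STATUS_RE = re.compile(r"status: ([A-Z0-9]+)")


def parse_status(dig_output: str) -> str:
    """Return the rcode name found in dig's output, or NO_RESPONSE if absent."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else "NO_RESPONSE"


def query_status(server: str, name: str, rtype: str = "A") -> str:
    """Run one dig query against `server` and return its rcode name."""
    result = subprocess.run(
        ["dig", f"@{server}", name, rtype, "+time=3", "+tries=1"],
        capture_output=True,
        text=True,
        check=False,  # a timeout makes dig exit non-zero; treat it as data
    )
    return parse_status(result.stdout)


def tally(server: str, names: list[str]) -> Counter:
    """Count how many of `names` come back with each rcode."""
    return Counter(query_status(server, name) for name in names)
```

Calling, for example, `tally("10.20.30.76", ["askubuntu.com", "serverfault.com", "athenahealth.com"])` would give a count per response code, which can then be compared between the recursor and a direct query to an authoritative server.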
My next thought was that I needed to upgrade PowerDNS Recursor, but I couldn't because of how old my DNS server was. So, I built out a brand new server running PowerDNS Recursor 5.1.3 on Ubuntu 24.04.1. Again, the config is simple. Here's the primary file:
```
$ cat /etc/powerdns/recursor.conf
dnssec:
  # validation: process # default
  trustanchorfile: /usr/share/dns/root.key
recursor:
  hint_file: /usr/share/dns/root.hints
  include_dir: /etc/powerdns/recursor.d
#incoming:
#  listen:
#    - 127.0.0.1 # default
#outgoing:
#  source_address:
#    - 0.0.0.0 # default
```
And here's a file in `recursor.d`:
```
$ cat /etc/powerdns/recursor.d/me.yml
dnssec:
  validation: off # validate
  # log_bogus: true
incoming:
  listen:
    - 10.20.30.76:53
logging:
  common_errors: true
  facility: 1
  loglevel: 6
  quiet: true
  trace: fail
recursor:
  auth_zones:
    - zone: my-domain-1.com
      file: /etc/powerdns/my-domain-1.com.zone
  forward_zones:
    - zone: my-domain-2.com
      forwarders:
        - 10.20.31.2
  setgid: pdns
  setuid: pdns
  socket_dir: /var/run
  write_pid: true
webservice:
  address: 10.20.30.76
  allow_from:
    - 10.20.30.0/24
    - 172.24.52.0/24
  api_key: loremipsum
  password: foobarbazqux
  port: 8080
```
This config is identical to my old PowerDNS Recursor config except that DNSSEC is disabled to try to get it to work. If I manually `dig` (I love `dig`) `askubuntu.com` from the root up, I easily find an answer:
```
# Using i.root-servers.net, which is 192.36.148.17
$ dig @192.36.148.17 com NS
; <<>> DiG 9.10.6 <<>> @192.36.148.17 com NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2217
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 21
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;com. IN NS
;; ANSWER SECTION:
com. 136670 IN NS d.gtld-servers.net.
com. 136670 IN NS c.gtld-servers.net.
com. 136670 IN NS k.gtld-servers.net.
com. 136670 IN NS f.gtld-servers.net.
com. 136670 IN NS i.gtld-servers.net.
com. 136670 IN NS b.gtld-servers.net.
com. 136670 IN NS l.gtld-servers.net.
com. 136670 IN NS a.gtld-servers.net.
com. 136670 IN NS e.gtld-servers.net.
com. 136670 IN NS m.gtld-servers.net.
com. 136670 IN NS j.gtld-servers.net.
com. 136670 IN NS h.gtld-servers.net.
com. 136670 IN NS g.gtld-servers.net.
;; ADDITIONAL SECTION:
b.gtld-servers.net. 43604 IN A 192.33.14.30
b.gtld-servers.net. 71837 IN AAAA 2001:503:231d::2:30
l.gtld-servers.net. 44115 IN A 192.41.162.30
l.gtld-servers.net. 74612 IN AAAA 2001:500:d937::30
a.gtld-servers.net. 59944 IN A 192.5.6.30
a.gtld-servers.net. 52029 IN AAAA 2001:503:a83e::2:30
e.gtld-servers.net. 11582 IN A 192.12.94.30
e.gtld-servers.net. 63219 IN AAAA 2001:502:1ca1::30
m.gtld-servers.net. 27782 IN A 192.55.83.30
m.gtld-servers.net. 50020 IN AAAA 2001:501:b1f9::30
j.gtld-servers.net. 39663 IN A 192.48.79.30
h.gtld-servers.net. 79936 IN A 192.54.112.30
g.gtld-servers.net. 57527 IN A 192.42.93.30
g.gtld-servers.net. 63219 IN AAAA 2001:503:eea3::30
d.gtld-servers.net. 44435 IN A 192.31.80.30
d.gtld-servers.net. 10633 IN AAAA 2001:500:856e::30
c.gtld-servers.net. 50185 IN A 192.26.92.30
k.gtld-servers.net. 32146 IN A 192.52.178.30
i.gtld-servers.net. 48002 IN A 192.43.172.30
i.gtld-servers.net. 27967 IN AAAA 2001:503:39c1::30
$ dig @192.33.14.30 askubuntu.com NS
; <<>> DiG 9.10.6 <<>> @192.33.14.30 askubuntu.com NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46168
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 13
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;askubuntu.com. IN NS
;; ANSWER SECTION:
askubuntu.com. 86400 IN NS sureena.ns.cloudflare.com.
askubuntu.com. 86400 IN NS damian.ns.cloudflare.com.
;; ADDITIONAL SECTION:
damian.ns.cloudflare.com. 48087 IN A 172.64.35.50
damian.ns.cloudflare.com. 48087 IN A 162.159.44.50
damian.ns.cloudflare.com. 48087 IN A 108.162.195.50
damian.ns.cloudflare.com. 13178 IN AAAA 2803:f800:50::6ca2:c332
damian.ns.cloudflare.com. 13178 IN AAAA 2606:4700:58::a29f:2c32
damian.ns.cloudflare.com. 13178 IN AAAA 2a06:98c1:50::ac40:2332
sureena.ns.cloudflare.com. 38809 IN A 108.162.194.126
sureena.ns.cloudflare.com. 38809 IN A 172.64.34.126
sureena.ns.cloudflare.com. 38809 IN A 162.159.38.126
sureena.ns.cloudflare.com. 32427 IN AAAA 2a06:98c1:50::ac40:227e
sureena.ns.cloudflare.com. 32427 IN AAAA 2803:f800:50::6ca2:c27e
sureena.ns.cloudflare.com. 32427 IN AAAA 2606:4700:50::a29f:267e
$ dig @172.64.35.50 askubuntu.com A
; <<>> DiG 9.10.6 <<>> @172.64.35.50 askubuntu.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35705
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;askubuntu.com. IN A
;; ANSWER SECTION:
askubuntu.com. 300 IN A 172.64.150.156
askubuntu.com. 300 IN A 104.18.37.100
```
Perfect. But if I ask either my existing PowerDNS Recursor 4.0.4 server or my new PowerDNS Recursor 5.1.3 server, I get `SERVFAIL`:
```
$ dig @10.20.30.76 askubuntu.com A
; <<>> DiG 9.10.6 <<>> @10.20.30.76 askubuntu.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58213
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; OPT=15: 00 16 64 65 6c 65 67 61 74 69 6f 6e 20 63 6f 6d ("..delegation com")
;; QUESTION SECTION:
;askubuntu.com. IN A
```
The `OPT=15` line, with what looks like some kind of signature plus `delegation com`, is interesting. It doesn't appear for every domain that fails to resolve, so it might be a red herring. It also changes between runs: running the same query again returned `OPT=15: 00 16 64 65 6c 65 67 61 74 69 6f 6e 20 61 73 6b 75 62 75 6e 74 75 2e 63 6f 6d ("..delegation askubuntu.com")`.
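One possible reading of that option: EDNS option code 15 is the Extended DNS Error option defined in RFC 8914, whose payload is a 2-byte info code followed by optional free-form text. Assuming that is what `dig` is printing here, the hex dump can be decoded with a short Python sketch. The function name `decode_ede` is hypothetical, and the lookup table lists only a handful of the registered info codes.

```python
import struct

# A few info codes from RFC 8914; the IANA registry defines more.
EDE_INFO_CODES = {
    0: "Other",
    6: "DNSSEC Bogus",
    9: "DNSKEY Missing",
    22: "No Reachable Authority",
    23: "Network Error",
}


def decode_ede(hex_dump: str) -> tuple[int, str, str]:
    """Decode an option 15 payload given as space-separated hex bytes (dig's format)."""
    payload = bytes.fromhex(hex_dump.replace(" ", ""))
    if len(payload) < 2:
        raise ValueError("payload must start with a 2-byte info code")
    # The info code is a 16-bit unsigned integer in network byte order.
    (info_code,) = struct.unpack("!H", payload[:2])
    # Everything after the info code is optional human-readable text.
    extra_text = payload[2:].decode("utf-8", errors="replace")
    return info_code, EDE_INFO_CODES.get(info_code, "unknown"), extra_text


print(decode_ede("00 16 64 65 6c 65 67 61 74 69 6f 6e 20 63 6f 6d"))
# → (22, 'No Reachable Authority', 'delegation com')
```

Under that assumption, the leading `00 16` is not a signature but info code 22, and the trailing text names the delegation the recursor was working on when it gave up.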
Here is the PowerDNS Recursor 5.1.3 fail trace for a failed lookup of `askubuntu.com`: https://gist.github.com/beamerblvd/d8fa24bdf1037e2a670f8e331b7e4905
FWIW, I'm on Comcast Business Class with a 5-address static IP delegation.
What am I doing wrong?
Thanks,
Nick